├── CONTRIBUTING.md ├── logo.png ├── testdata ├── TestRealWorld │ ├── snippets │ │ ├── price_em_in_a_p │ │ │ ├── output.default.golden │ │ │ ├── goldmark.golden │ │ │ └── input.html │ │ ├── square_brackets │ │ │ ├── output.default.golden │ │ │ ├── goldmark.golden │ │ │ └── input.html │ │ ├── code_design_heading_in_link │ │ │ ├── output.default.golden │ │ │ ├── goldmark.golden │ │ │ └── input.html │ │ ├── tweet │ │ │ ├── output.default.golden │ │ │ ├── goldmark.golden │ │ │ └── input.html │ │ ├── github_about │ │ │ ├── output.default.golden │ │ │ ├── goldmark.golden │ │ │ └── input.html │ │ ├── text_with_whitespace │ │ │ ├── output.default.golden │ │ │ ├── goldmark.golden │ │ │ └── input.html │ │ ├── turndown_demo │ │ │ ├── output.default.golden │ │ │ ├── goldmark.golden │ │ │ └── input.html │ │ └── nav_nested_list │ │ │ ├── output.default.golden │ │ │ ├── input.html │ │ │ └── goldmark.golden │ ├── golang.org │ │ ├── output.default.golden │ │ └── goldmark.golden │ ├── bonnerruderverein.de │ │ └── output.default.golden │ └── blog.golang.org │ │ ├── output.inlined.golden │ │ ├── output.emphasis_asterisks.golden │ │ └── output.emphasis_underscores.golden ├── TestCommonmark │ ├── hr │ │ ├── output.default.golden │ │ ├── goldmark.golden │ │ └── input.html │ ├── keep_remove_tag │ │ ├── output.default.golden │ │ ├── goldmark.golden │ │ └── input.html │ ├── br_element │ │ ├── output.default.golden │ │ ├── goldmark.golden │ │ └── input.html │ ├── italic │ │ ├── output.asterisks.golden │ │ ├── output.underscores.golden │ │ ├── goldmark.golden │ │ └── input.html │ ├── sup_element │ │ ├── output.default.golden │ │ ├── goldmark.golden │ │ └── input.html │ ├── heading │ │ ├── output.default.golden │ │ ├── output.atx.golden │ │ ├── goldmark.golden │ │ ├── output.setext.golden │ │ └── input.html │ ├── bold │ │ ├── output.asterisks.golden │ │ ├── output.underscores.golden │ │ ├── goldmark.golden │ │ └── input.html │ ├── blockquote │ │ ├── output.default.golden │ │ ├── 
goldmark.golden │ │ └── input.html │ ├── p_tag │ │ ├── output.default.golden │ │ ├── goldmark.golden │ │ └── input.html │ ├── image │ │ ├── output.default.golden │ │ ├── goldmark.golden │ │ └── input.html │ ├── list_nested │ │ ├── output.dash.golden │ │ ├── output.plus.golden │ │ ├── output.asterisks.golden │ │ ├── goldmark.golden │ │ └── input.html │ ├── list │ │ ├── output.dash.golden │ │ ├── output.plus.golden │ │ ├── output.asterisks.golden │ │ ├── goldmark.golden │ │ └── input.html │ ├── link │ │ ├── output.relative.golden │ │ ├── output.inlined.golden │ │ ├── output.referenced_full.golden │ │ ├── output.referenced_shortcut.golden │ │ ├── output.referenced_collapsed.golden │ │ └── input.html │ └── pre_code │ │ ├── output.indented.golden │ │ ├── output.fenced_backtick.golden │ │ ├── output.fenced_tilde.golden │ │ ├── goldmark.golden │ │ └── input.html └── TestPlugins │ ├── movefrontmatter │ ├── jekyll │ │ ├── input.html │ │ ├── output.default.golden │ │ └── goldmark.golden │ ├── not │ │ ├── input.html │ │ ├── goldmark.golden │ │ └── output.default.golden │ ├── simple │ │ ├── output.default.golden │ │ ├── input.html │ │ └── goldmark.golden │ └── blog │ │ ├── input.html │ │ ├── output.default.golden │ │ └── goldmark.golden │ ├── strikethrough │ ├── output.default.golden │ ├── goldmark.golden │ └── input.html │ ├── checkbox │ ├── output.default.golden │ ├── input.html │ └── goldmark.golden │ └── table │ ├── output.default.golden │ ├── output.tablecompat.golden │ ├── output.table.golden │ └── input.html ├── SECURITY.md ├── .gitignore ├── go.mod ├── plugin ├── gfm.go ├── confluence_attachment_block.go ├── task_list.go ├── strikethrough.go ├── youtube.go ├── confluence_code_block.go ├── frontmatter.go ├── plugin_test.go ├── movefrontmatter.go ├── vimeo.go └── table.go ├── examples ├── options │ └── main.go ├── goquery │ └── main.go ├── github_flavored │ └── main.go ├── custom_tag │ └── main.go └── add_rules │ └── main.go ├── .github ├── ISSUE_TEMPLATE │ └── 
bug_report.md └── workflows │ └── go.yml ├── LICENSE ├── markdown_test.go ├── plugin_test.go ├── escape └── escape.go ├── go.sum ├── markdown.go ├── README.md ├── commonmark_test.go └── utils_test.go /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /logo.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/JohannesKaufmann/html-to-markdown/master/logo.png -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/price_em_in_a_p/output.default.golden: -------------------------------------------------------------------------------- 1 | 首付 _19,8万_ 月供 -------------------------------------------------------------------------------- /testdata/TestCommonmark/hr/output.default.golden: -------------------------------------------------------------------------------- 1 | Some Content 2 | 3 | * * * 4 | 5 | Other Content -------------------------------------------------------------------------------- /testdata/TestCommonmark/keep_remove_tag/output.default.golden: -------------------------------------------------------------------------------- 1 |

Content

-------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/price_em_in_a_p/goldmark.golden: -------------------------------------------------------------------------------- 1 |

首付 19,8万 月供

2 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/hr/goldmark.golden: -------------------------------------------------------------------------------- 1 |

Some Content

2 |
3 |

Other Content

4 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/hr/input.html: -------------------------------------------------------------------------------- 1 | 2 |

Some Content

3 |
4 |

Other Content

5 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/price_em_in_a_p/input.html: -------------------------------------------------------------------------------- 1 |

首付19,8 月供

-------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/jekyll/input.html: -------------------------------------------------------------------------------- 1 | +++ 2 | food: Pizza 3 | +++ 4 | 5 |

{{ page.food }}

6 | -------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/jekyll/output.default.golden: -------------------------------------------------------------------------------- 1 | +++ 2 | food: Pizza 3 | +++ 4 | 5 | # {{ page.food }} -------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/jekyll/goldmark.golden: -------------------------------------------------------------------------------- 1 |

+++ 2 | food: Pizza 3 | +++

4 |

{{ page.food }}

5 | -------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/not/input.html: -------------------------------------------------------------------------------- 1 |

---title---

2 | 3 | --- 4 | type: page 5 | tags: 6 | - Berlin 7 | --- 8 | -------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/not/goldmark.golden: -------------------------------------------------------------------------------- 1 |

---title---

2 |

--- 3 | type: page 4 | tags: 5 | - Berlin 6 | ---

7 | -------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/not/output.default.golden: -------------------------------------------------------------------------------- 1 | # ---title--- 2 | 3 | \-\-\- 4 | type: page 5 | tags: 6 | \- Berlin 7 | \-\-\- -------------------------------------------------------------------------------- /testdata/TestPlugins/strikethrough/output.default.golden: -------------------------------------------------------------------------------- 1 | Some ~~Strikethrough~~ Text 2 | 3 | Only ~~blue ones~~ ~~left~~ 4 | 5 | Some ~~Strikethrough~~ Text -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/square_brackets/output.default.golden: -------------------------------------------------------------------------------- 1 | first [literal] brackets 2 | 3 | then [one] way to escape 4 | 5 | then [another] one -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/square_brackets/goldmark.golden: -------------------------------------------------------------------------------- 1 |

first [literal] brackets

2 |

then [one] way to escape

3 |

then [another] one

4 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/keep_remove_tag/goldmark.golden: -------------------------------------------------------------------------------- 1 |

Content

2 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/square_brackets/input.html: -------------------------------------------------------------------------------- 1 |

first [literal] brackets

2 |

then [one] way to escape

3 |

then [another] one

4 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/br_element/output.default.golden: -------------------------------------------------------------------------------- 1 | 1\. xxx 2 | 3 | 2\. xxxx 4 | 5 | 3\. xxx 6 | 7 | ![](http://example.com/xxx) 8 | 9 | 4\. golang 10 | 11 | a. xx 12 | 13 | b. xx -------------------------------------------------------------------------------- /testdata/TestPlugins/strikethrough/goldmark.golden: -------------------------------------------------------------------------------- 1 |

Some Strikethrough Text

2 |

Only blue ones left

3 |

Some Strikethrough Text

4 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/br_element/goldmark.golden: -------------------------------------------------------------------------------- 1 |

1. xxx

2 |

2. xxxx

3 |

3. xxx

4 |

5 |

4. golang

6 |

a. xx

7 |

b. xx

8 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/br_element/input.html: -------------------------------------------------------------------------------- 1 | 2 | 3 |

1. xxx
2. xxxx
3. xxx


4. golang
a. xx
b. xx

4 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/italic/output.asterisks.golden: -------------------------------------------------------------------------------- 1 | Some *Text* 2 | 3 | Some *Text* 4 | 5 | Some *Text* 6 | 7 | *DoubleItalic* 8 | 9 | Some *19,80€* Text 10 | 11 | *Content* and no space afterward. 12 | 13 | \_Not Italic\_ -------------------------------------------------------------------------------- /testdata/TestCommonmark/italic/output.underscores.golden: -------------------------------------------------------------------------------- 1 | Some _Text_ 2 | 3 | Some _Text_ 4 | 5 | Some _Text_ 6 | 7 | _DoubleItalic_ 8 | 9 | Some _19,80€_ Text 10 | 11 | _Content_ and no space afterward. 12 | 13 | \_Not Italic\_ -------------------------------------------------------------------------------- /SECURITY.md: -------------------------------------------------------------------------------- 1 | # Security Policy 2 | 3 | ## Reporting a Vulnerability 4 | 5 | Please report (suspected) security vulnerabilities to johannes@joina.de with the subject _"Security html-to-markdown"_ and you will receive a response within 48 hours. 6 | 7 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/sup_element/output.default.golden: -------------------------------------------------------------------------------- 1 | One of the most common equations in all of physics is 2 | E=mc2. 
3 | 4 | The ordinal number "fifth" can be abbreviated in various languages as follows: 5 | 6 | - English: 5th 7 | - French: 5ème -------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/simple/output.default.golden: -------------------------------------------------------------------------------- 1 | --- 2 | type: page 3 | tags: 4 | - Berlin 5 | --- 6 | 7 | some content 8 | \-\-\- 9 | not: frontmatter 10 | \-\-\- 11 | 12 | Start of the **HTML** Document. 13 | 14 | # Title -------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/simple/input.html: -------------------------------------------------------------------------------- 1 | --- 2 | type: page 3 | tags: 4 | - Berlin 5 | --- 6 | 7 | some content 8 | --- 9 | not: frontmatter 10 | --- 11 | 12 | 13 | Start of the HTML Document. 14 | 15 |

Title

16 | -------------------------------------------------------------------------------- /testdata/TestPlugins/checkbox/output.default.golden: -------------------------------------------------------------------------------- 1 | - [x] Checked! 2 | 3 | - [ ] Check Me! 4 | 5 | - [ ] Check Me B 6 | - [ ] Check Me C 7 | - [ ] Check Nested 1 8 | - [ ] Check Nested 2 9 | - [ ] Check Me D 10 | - [ ] Check Nested 1 11 | - [ ] Check Nested 2 -------------------------------------------------------------------------------- /testdata/TestCommonmark/italic/goldmark.golden: -------------------------------------------------------------------------------- 1 |

Some Text

2 |

Some Text

3 |

Some Text

4 |

DoubleItalic

5 |

Some 19,80€ Text

6 |

Content and no space afterward.

7 |

_Not Italic_

8 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Binaries for programs and plugins 2 | *.exe 3 | *.exe~ 4 | *.dll 5 | *.so 6 | *.dylib 7 | 8 | # Test binary, build with `go test -c` 9 | *.test 10 | 11 | # Output of the go coverage tool, specifically when used with LiteIDE 12 | *.out 13 | 14 | .DS_Store 15 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/sup_element/goldmark.golden: -------------------------------------------------------------------------------- 1 |

One of the most common equations in all of physics is 2 | E=mc2.

3 |

The ordinal number "fifth" can be abbreviated in various languages as follows:

4 | 8 | -------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/simple/goldmark.golden: -------------------------------------------------------------------------------- 1 |
2 |

type: page 3 | tags:

4 | 7 |
8 |

some content 9 | --- 10 | not: frontmatter 11 | ---

12 |

Start of the HTML Document.

13 |

Title

14 | -------------------------------------------------------------------------------- /testdata/TestPlugins/strikethrough/input.html: -------------------------------------------------------------------------------- 1 | 2 |

Some Strikethrough Text

3 | 4 | 5 | 6 |

Only blue ones left

7 | 8 | 9 | 10 |

Some Strikethrough Text

11 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/code_design_heading_in_link/output.default.golden: -------------------------------------------------------------------------------- 1 | [**\#zuhause\_jul20**\ 2 | Remote\ 3 | \ 4 | Datum\ 5 | \ 6 | 31.07. - 02.08.20\ 7 | \ 8 | \ 9 | Uhrzeit\ 10 | \ 11 | Fr 15 Uhr - So 19 Uhr\ 12 | \ 13 | \ 14 | Details anzeigen →](https://code.design/events/zuhause_jul20) -------------------------------------------------------------------------------- /testdata/TestCommonmark/heading/output.default.golden: -------------------------------------------------------------------------------- 1 | # Heading 1 2 | 3 | ## Heading 2 4 | 5 | ### Heading 3 6 | 7 | #### Heading 4 8 | 9 | ##### Heading 5 10 | 11 | ###### Heading 6 12 | 13 | ## Heading with Whitespace 14 | 15 | ## Header Containing Newlines 16 | 17 | # Heading One 18 | 19 | [**Heading 2**](/page.html) -------------------------------------------------------------------------------- /testdata/TestCommonmark/bold/output.asterisks.golden: -------------------------------------------------------------------------------- 1 | Some **Text** 2 | 3 | Some **Text** 4 | 5 | **Text** 6 | 7 | Some **Text** 8 | 9 | Some **Text.** 10 | 11 | Some **Text** Content 12 | 13 | Some **Text.** 14 | 15 | Some **Text** 16 | 17 | #### 首付 _19,8万_ / 月供 _6339元X24_ 18 | 19 | \*\*Not Strong\*\* 20 | \*\*Still Not 21 | Strong\*\* -------------------------------------------------------------------------------- /testdata/TestCommonmark/bold/output.underscores.golden: -------------------------------------------------------------------------------- 1 | Some __Text__ 2 | 3 | Some __Text__ 4 | 5 | __Text__ 6 | 7 | Some __Text__ 8 | 9 | Some __Text.__ 10 | 11 | Some __Text__ Content 12 | 13 | Some __Text.__ 14 | 15 | Some __Text__ 16 | 17 | #### 首付 _19,8万_ / 月供 _6339元X24_ 18 | 19 | \*\*Not Strong\*\* 20 | \*\*Still Not 21 | Strong\*\* 
-------------------------------------------------------------------------------- /go.mod: -------------------------------------------------------------------------------- 1 | module github.com/JohannesKaufmann/html-to-markdown 2 | 3 | go 1.13 4 | 5 | require ( 6 | github.com/PuerkitoBio/goquery v1.5.1 7 | github.com/sebdah/goldie/v2 v2.5.1 8 | github.com/sergi/go-diff v1.1.0 // indirect 9 | github.com/yuin/goldmark v1.2.0 10 | golang.org/x/net v0.0.0-20200320220750-118fecf932d8 11 | gopkg.in/yaml.v2 v2.2.8 12 | ) 13 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/code_design_heading_in_link/goldmark.golden: -------------------------------------------------------------------------------- 1 |

#zuhause_jul20
2 | Remote
3 |
4 | Datum
5 |
6 | 31.07. - 02.08.20
7 |
8 |
9 | Uhrzeit
10 |
11 | Fr 15 Uhr - So 19 Uhr
12 |
13 |
14 | Details anzeigen →

15 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/sup_element/input.html: -------------------------------------------------------------------------------- 1 | 2 | 3 |

One of the most common equations in all of physics is 4 | E=mc2.

5 | 6 | 7 | 8 |

The ordinal number "fifth" can be abbreviated in various languages as follows:

9 | 13 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/bold/goldmark.golden: -------------------------------------------------------------------------------- 1 |

Some Text

2 |

Some Text

3 |

Text

4 |

Some Text

5 |

Some Text.

6 |

Some Text Content

7 |

Some Text.

8 |

Some Text

9 |

首付 19,8万 / 月供 6339元X24

10 |

**Not Strong** 11 | **Still Not 12 | Strong**

13 | -------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/blog/input.html: -------------------------------------------------------------------------------- 1 | +++ 2 | type: page 3 | layout: reisebericht 4 | title: Hamar 5 | date: "2018-10-24 22:32:03 +0100" 6 | weight: 2 7 | tags: 8 | - Norwegen 9 | - Hamar 10 | url: /2018/10-norwegen/02-hamar/ 11 | description: Ein Spaziergang durch Hamar, Shoppen und ein Besuch im Norsk jernbanemusem, dem norwegischen Eisenbahnmuseum. 12 | image: files/2018/10-Norwegen/Hamar_Titel.jpg 13 | +++ 14 | 15 |

{{ title }}

16 | -------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/blog/output.default.golden: -------------------------------------------------------------------------------- 1 | +++ 2 | type: page 3 | layout: reisebericht 4 | title: Hamar 5 | date: "2018-10-24 22:32:03 +0100" 6 | weight: 2 7 | tags: 8 | - Norwegen 9 | - Hamar 10 | url: /2018/10-norwegen/02-hamar/ 11 | description: Ein Spaziergang durch Hamar, Shoppen und ein Besuch im Norsk jernbanemusem, dem norwegischen Eisenbahnmuseum. 12 | image: files/2018/10-Norwegen/Hamar_Titel.jpg 13 | +++ 14 | 15 | # {{ title }} -------------------------------------------------------------------------------- /testdata/TestPlugins/movefrontmatter/blog/goldmark.golden: -------------------------------------------------------------------------------- 1 |

+++ 2 | type: page 3 | layout: reisebericht 4 | title: Hamar 5 | date: "2018-10-24 22:32:03 +0100" 6 | weight: 2 7 | tags:

8 | 16 |

{{ title }}

17 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/italic/input.html: -------------------------------------------------------------------------------- 1 | 2 |

Some Text

3 | 4 | 5 | 6 |

Some Text

7 | 8 | 9 | 10 |

Some Text

11 | 12 | 13 | 14 |

DoubleItalic

15 | 16 |

Some 19,80 Text

17 | 18 | 19 | 20 |

Content and no space afterward.

21 | 22 | 23 | 24 |

_Not Italic_

-------------------------------------------------------------------------------- /plugin/gfm.go: -------------------------------------------------------------------------------- 1 | // Package plugin contains rules that are not part of 2 | // CommonMark, such as those for GitHub Flavored Markdown. 3 | package plugin 4 | 5 | import md "github.com/JohannesKaufmann/html-to-markdown" 6 | 7 | // GitHubFlavored bundles the GitHub Flavored Markdown rules: strikethrough, tables and task list items. 8 | func GitHubFlavored() md.Plugin { 9 | return func(c *md.Converter) (rules []md.Rule) { 10 | rules = append(rules, Strikethrough("")(c)...) 11 | rules = append(rules, Table()(c)...) 12 | rules = append(rules, TaskListItems()(c)...) 13 | return 14 | } 15 | } 16 | -------------------------------------------------------------------------------- /examples/options/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "fmt" 5 | "log" 6 | 7 | md "github.com/JohannesKaufmann/html-to-markdown" 8 | ) 9 | 10 | func main() { 11 | html := `<strong>Bold Text</strong>` 12 | // -> `__Bold Text__` 13 | // instead of `**Bold Text**` 14 | 15 | opt := &md.Options{ 16 | StrongDelimiter: "__", // default: ** 17 | } 18 | conv := md.NewConverter("", true, opt) 19 | 20 | markdown, err := conv.ConvertString(html) 21 | if err != nil { 22 | log.Fatal(err) 23 | } 24 | fmt.Println(markdown) 25 | } 26 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/heading/output.atx.golden: -------------------------------------------------------------------------------- 1 | # Heading 1 2 | 3 | ## Heading 2 4 | 5 | ### Heading 3 6 | 7 | #### Heading 4 8 | 9 | ##### Heading 5 10 | 11 | ###### Heading 6 12 | 13 | Heading 7 14 | 15 | ## Heading with Whitespace 16 | 17 | ## Header Containing Newlines 18 | 19 | # Heading One 20 | 21 | [**Heading 2**](http://example.com/page.html) 22 | 23 | # \#hashtag 24 | 25 | not title 26 | \-\-\---- 27 | 28 | not title 29 | - 30 | 31 | 
not title 32 | 33 | - 34 | 35 | not title 36 | = 37 | 38 | not title 39 | \\-\\-\\- -------------------------------------------------------------------------------- /examples/goquery/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "fmt" 5 | "log" 6 | 7 | md "github.com/JohannesKaufmann/html-to-markdown" 8 | 9 | "github.com/PuerkitoBio/goquery" 10 | ) 11 | 12 | func main() { 13 | url := "https://blog.golang.org/godoc-documenting-go-code" 14 | doc, err := goquery.NewDocument(url) 15 | if err != nil { 16 | log.Fatal(err) 17 | } 18 | content := doc.Find("#content") 19 | 20 | conv := md.NewConverter(md.DomainFromURL(url), true, nil) 21 | markdown := conv.Convert(content) 22 | 23 | fmt.Println(markdown) 24 | } 25 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/heading/goldmark.golden: -------------------------------------------------------------------------------- 1 |

Heading 1

2 |

Heading 2

3 |

Heading 3

4 |

Heading 4

5 |
Heading 5
6 |
Heading 6
7 |

Heading 7

8 |

Heading with Whitespace

9 |

Header Containing Newlines

10 |

Heading One

11 |

Heading 2

12 |

#hashtag

13 |

not title 14 | ------

15 |

not title

16 |

not title

17 | 20 |

not title

21 |

not title 22 | \-\-\-

23 | -------------------------------------------------------------------------------- /testdata/TestPlugins/table/output.default.golden: -------------------------------------------------------------------------------- 1 | FirstnameLastnameAgeJillSmith50EveJackson94EmptyEnd 2 | 3 | ### With \| Character 4 | 5 | FirstnameWith \| CharacterAgeJillSmith50 Eve 6 | Jackson 7 | 94 8 | 9 | ### Tabelle mit thead, tfoot, and tbody 10 | 11 | Header content 1Header content 2Footer content 1Footer content 2Body content 1Body content 2Unglaublich tolle BeschreibungUnglaublich tolle Daten 12 | 13 | #### Pegel DUISBURG-RUHRORT 14 | 15 | Quelle: … 16 | 17 | ABABCDABC**Strong**[Link](http://example.com/link.html)_Italic_`var`bc 18 | 19 | 1 20 | 21 | 2 22 | 23 | 3 24 | 25 | 1 26 | 27 | 2 28 | 29 | 3 -------------------------------------------------------------------------------- /testdata/TestCommonmark/blockquote/output.default.golden: -------------------------------------------------------------------------------- 1 | > Some Quote 2 | > Next Line 3 | 4 | > Allowing an unimportant mistake to pass without comment is a wonderful social grace. 5 | > 6 | > Ideological differences are no excuse for rudeness. 7 | 8 | > This is the first level of quoting. 9 | > 10 | > > This is a paragraph in a nested blockquote. 11 | > 12 | > Back to the first level. 13 | 14 | > ## This is a header. 15 | > 16 | > 1. This is the first list item. 17 | > 2. This is the second list item. 18 | > 19 | > A code block: 20 | > 21 | > ``` 22 | > return 1 < 2 ? 
shell_exec('echo $input | $markdown_script') : 0; 23 | > ``` -------------------------------------------------------------------------------- /testdata/TestCommonmark/p_tag/output.default.golden: -------------------------------------------------------------------------------- 1 | Some Content 2 | 3 | Text 4 | 5 | Some Text 6 | 7 | Some Content 8 | 9 | jmap –histo[:live] 10 | 11 | Sometimes a struct field, function, type, or even a whole package becomes 12 | 13 | redundant or unnecessary, but must be kept for compatibility with existing 14 | 15 | programs. 16 | 17 | To signal that an identifier should not be used, add a paragraph to its doc 18 | 19 | comment that begins with "Deprecated:" followed by some information about the 20 | 21 | deprecation. 22 | 23 | There are a few examples [in the standard library](https://golang.org/search?q=Deprecated:). -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/tweet/output.default.golden: -------------------------------------------------------------------------------- 1 | [![](https://cdn.substack.com/image/twitter_name/w_36/kroger.jpg)Kroger @kroger\ 2 | \ 3 | As a company, it’s our responsibility to better support our Black associates, customers and allies. We know there is more work to do and will keep you updated on our progress, this is only the beginning. Black Lives Matter. 
\ 4 | ![](https://cdn.substack.com/image/fetch/w_600,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fpbs.substack.com%2Fmedia%2FEaVVy4aXsAglkCk.jpg)\ 5 | \ 6 | June 12th 2020\ 7 | \ 8 | 17 Retweets93 Likes](https://twitter.com/kroger/status/1271516803756425218) -------------------------------------------------------------------------------- /testdata/TestCommonmark/heading/output.setext.golden: -------------------------------------------------------------------------------- 1 | Heading 1 2 | ========= 3 | 4 | Heading 2 5 | --------- 6 | 7 | ### Heading 3 8 | 9 | #### Heading 4 10 | 11 | ##### Heading 5 12 | 13 | ###### Heading 6 14 | 15 | Heading 7 16 | 17 | Heading with Whitespace 18 | ----------------------- 19 | 20 | Header Containing Newlines 21 | --------------------------- 22 | 23 | Heading One 24 | ============== 25 | 26 | [**Heading 2**](http://example.com/page.html) 27 | 28 | \#hashtag 29 | ========= 30 | 31 | not title 32 | \-\-\---- 33 | 34 | not title 35 | - 36 | 37 | not title 38 | 39 | - 40 | 41 | not title 42 | = 43 | 44 | not title 45 | \\-\\-\\- -------------------------------------------------------------------------------- /testdata/TestCommonmark/p_tag/goldmark.golden: -------------------------------------------------------------------------------- 1 |

Some Content

2 |

Text

3 |

Some Text

4 |

Some Content

5 |

jmap –histo[:live]

6 |

Sometimes a struct field, function, type, or even a whole package becomes

7 |

redundant or unnecessary, but must be kept for compatibility with existing

8 |

programs.

9 |

To signal that an identifier should not be used, add a paragraph to its doc

10 |

comment that begins with "Deprecated:" followed by some information about the

11 |

deprecation.

12 |

There are a few examples in the standard library.

13 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/github_about/output.default.golden: -------------------------------------------------------------------------------- 1 | ## About 2 | 3 | ⚙️ 4 | Convert HTML to Markdown. Even works with whole websites. 5 | 6 | ### Topics 7 | 8 | [go](http://example.com/topics/go "Topic: go") [golang](http://example.com/topics/golang "Topic: golang") [html](http://example.com/topics/html "Topic: html") [html-to-markdown](http://example.com/topics/html-to-markdown "Topic: html-to-markdown") [markdown](http://example.com/topics/markdown "Topic: markdown") 9 | 10 | ### Resources 11 | 12 | [Readme](http://example.com#readme) 13 | 14 | ### License 15 | 16 | [MIT License](http://example.com/JohannesKaufmann/html-to-markdown/blob/master/LICENSE) -------------------------------------------------------------------------------- /.github/ISSUE_TEMPLATE/bug_report.md: -------------------------------------------------------------------------------- 1 | --- 2 | name: Bug report 3 | about: Create a report to help us improve 4 | title: "\U0001F41B Bug" 5 | labels: bug 6 | assignees: '' 7 | 8 | --- 9 | 10 | **Describe the bug** 11 | A clear and concise description of what the bug is. 12 | 13 | **HTML Input** 14 | ```html 15 |

Title

16 | ``` 17 | 18 | 19 | **Generated Markdown** 20 | ````markdown 21 | # Title 22 | ```` 23 | 24 | **Expected Markdown** 25 | ````markdown 26 | # Title!!! 27 | ```` 28 | 29 | **Additional context** 30 | Add any other context about the problem here. For example, if you changed the default options or used a plugin. Also adding the version from the `go.mod` is helpful. 31 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/tweet/goldmark.golden: -------------------------------------------------------------------------------- 1 |

Kroger @kroger
2 |
3 | As a company, it’s our responsibility to better support our Black associates, customers and allies. We know there is more work to do and will keep you updated on our progress, this is only the beginning. Black Lives Matter.
4 |
5 |
6 | June 12th 2020
7 |
8 | 17 Retweets93 Likes

9 | -------------------------------------------------------------------------------- /testdata/TestPlugins/checkbox/input.html: -------------------------------------------------------------------------------- 1 | 24 | -------------------------------------------------------------------------------- /examples/github_flavored/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "fmt" 5 | "log" 6 | 7 | md "github.com/JohannesKaufmann/html-to-markdown" 8 | "github.com/JohannesKaufmann/html-to-markdown/plugin" 9 | ) 10 | 11 | func main() { 12 | html := ` 13 | <ul> 14 | <li><input type=checkbox checked>Checked!</li> 15 | <li><input type=checkbox>Check Me!</li> 16 | </ul> 17 | ` 18 | /* 19 | - [x] Checked! 20 | - [ ] Check Me! 21 | */ 22 | 23 | conv := md.NewConverter("", true, nil) 24 | 25 | // Use the `GitHubFlavored` plugin from the `plugin` package. 26 | conv.Use(plugin.GitHubFlavored()) 27 | 28 | markdown, err := conv.ConvertString(html) 29 | if err != nil { 30 | log.Fatal(err) 31 | } 32 | fmt.Println(markdown) 33 | } 34 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/image/output.default.golden: -------------------------------------------------------------------------------- 1 | ![website favicon](http://commonmark.org/help/images/favicon.png) 2 | 3 | ![alt "attribute"](http://commonmark.org/help/images/favicon.png) 4 | 5 | ![alt description](http://commonmark.org/help/images/favicon.png) 6 | 7 | ![](http://example.com/image.png) 8 | 9 | ![](invalid%zz 10 | 11 | zzz) 12 | 13 | ![star]() 14 | 15 | ![website favicon](http://commonmark.org/help/images/favicon.png) -------------------------------------------------------------------------------- /testdata/TestPlugins/checkbox/goldmark.golden: -------------------------------------------------------------------------------- 1 | 26 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/github_about/goldmark.golden: 
-------------------------------------------------------------------------------- 1 |

About

2 |

⚙️ 3 | Convert HTML to Markdown. Even works with whole websites.

4 |

Topics

5 |

go golang html html-to-markdown markdown

6 |

Resources

7 |

Readme

8 |

License

9 |

MIT License

10 |
--------------------------------------------------------------------------------
/plugin/confluence_attachment_block.go:
--------------------------------------------------------------------------------
package plugin

import (
	"fmt"

	md "github.com/JohannesKaufmann/html-to-markdown"
	"github.com/PuerkitoBio/goquery"
)

// ConfluenceAttachments converts `<ri:attachment>` elements
// [Contributed by @Skarlso]
func ConfluenceAttachments() md.Plugin {
	return func(c *md.Converter) []md.Rule {
		return []md.Rule{
			{
				Filter: []string{"ri:attachment"},
				Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
					if v, ok := selec.Attr("ri:filename"); ok {
						formatted := fmt.Sprintf("![][%s]", v)
						return md.String(formatted)
					}
					return md.String("")
				},
			},
		}
	}
}
--------------------------------------------------------------------------------
/plugin/task_list.go:
--------------------------------------------------------------------------------
package plugin

import (
	md "github.com/JohannesKaufmann/html-to-markdown"
	"github.com/PuerkitoBio/goquery"
)

// TaskListItems converts checkboxes into task list items.
func TaskListItems() md.Plugin {
	return func(c *md.Converter) []md.Rule {
		return []md.Rule{
			{
				Filter: []string{"input"},
				Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
					// Only checkboxes that are direct children of a list item
					// are treated as task list items.
					if !selec.Parent().Is("li") {
						return nil
					}
					if selec.AttrOr("type", "") != "checkbox" {
						return nil
					}

					_, ok := selec.Attr("checked")
					if ok {
						return md.String("[x] ")
					}
					return md.String("[ ] ")
				},
			},
		}
	}
}
--------------------------------------------------------------------------------
/testdata/TestCommonmark/blockquote/goldmark.golden:
--------------------------------------------------------------------------------
1 |
2 |

Some Quote 3 | Next Line

4 |
5 |
6 |

Allowing an unimportant mistake to pass without comment is a wonderful social grace.

7 |

Ideological differences are no excuse for rudeness.

8 |
9 |
10 |

This is the first level of quoting.

11 |
12 |

This is a paragraph in a nested blockquote.

13 |
14 |

Back to the first level.

15 |
16 |
17 |

This is a header.

18 |
    19 |
  1. This is the first list item.
  2. 20 |
  3. This is the second list item.
  4. 21 |
22 |

A code block:

23 |
return 1 < 2 ? shell_exec('echo $input | $markdown_script') : 0;
24 | 
25 |
26 | -------------------------------------------------------------------------------- /testdata/TestPlugins/table/output.tablecompat.golden: -------------------------------------------------------------------------------- 1 | Firstname · Lastname · Age 2 | 3 | Jill · Smith · 50 4 | 5 | Eve · Jackson · 94 6 | 7 | Empty 8 | 9 | End 10 | 11 | ### With \| Character 12 | 13 | Firstname · With \| Character · Age 14 | 15 | Jill · Smith · 50 16 | 17 | Eve · Jackson · 94 18 | 19 | ### Tabelle mit thead, tfoot, and tbody 20 | 21 | Header content 1 · Header content 2 22 | 23 | Footer content 1 · Footer content 2 24 | 25 | Body content 1 · Body content 2 26 | 27 | Unglaublich tolle BeschreibungUnglaublich tolle Daten 28 | 29 | #### Pegel DUISBURG-RUHRORT 30 | 31 | Quelle: … 32 | 33 | A · B 34 | 35 | A · B · C · D 36 | 37 | A · B · C 38 | 39 | **Strong** · [Link](http://example.com/link.html) · _Italic_ 40 | 41 | `var` · b · c 42 | 43 | 1 44 | 45 | 2 46 | 47 | 3 48 | 49 | 1 50 | 51 | 2 52 | 53 | 3 -------------------------------------------------------------------------------- /testdata/TestCommonmark/image/goldmark.golden: -------------------------------------------------------------------------------- 1 |

website favicon

2 |

alt "attribute"

3 |

alt  description

4 |

5 |

![](invalid%zz

6 |

zzz)

7 |

star

8 |

website favicon

9 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/text_with_whitespace/output.default.golden: -------------------------------------------------------------------------------- 1 | # Aktuelles 2 | 3 | [BRV-abend](http://www.bonnerruderverein.de/wp-content/uploads/2015/09/BRV-abend.jpg "BRV-abend") 4 | 5 | 25 Mai 6 | 7 | ![](http://www.bonnerruderverein.de/wp-content/uploads/2015/09/BRV-abend.jpg) 8 | 9 | ### [9\. Bonner Nachtlauf - Einschränkungen am Bootshaus](http://www.bonnerruderverein.de/bonner-nachtlauf/) 10 | 11 | am Mittwoch, dem 30. Mai 2018 findet am Bonner Rheinufer der 9. ... 12 | [More](http://www.bonnerruderverein.de/bonner-nachtlauf/) 13 | 14 | * * * 15 | 16 | # Aktuelles 17 | 18 | ### [Title](http://example.com/some_url) 19 | 20 | Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Vestibulum id ligula porta felis euismod semper. 21 | [More](http://example.com/other_url) -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/turndown_demo/output.default.golden: -------------------------------------------------------------------------------- 1 | # Turndown Demo 2 | 3 | This demonstrates [turndown](https://github.com/domchristie/turndown) – an HTML to Markdown converter in JavaScript. 4 | 5 | ## Usage 6 | 7 | ```js 8 | var turndownService = new TurndownService() 9 | console.log( 10 | turndownService.turndown('

Hello world

') 11 | ) 12 | ``` 13 | 14 | * * * 15 | 16 | It aims to be [CommonMark](http://commonmark.org/) 17 | compliant, and includes options to style the output. These options include: 18 | 19 | - headingStyle (setext or atx) 20 | - horizontalRule (\*, -, or \_) 21 | - bullet (\*, -, or +) 22 | - codeBlockStyle (indented or fenced) 23 | - fence 24 | - emDelimiter (\_ or \*) 25 | - strongDelimiter (\*\* or \_\_) 26 | - linkStyle (inlined or referenced) 27 | - linkReferenceStyle (full, collapsed, or shortcut) -------------------------------------------------------------------------------- /testdata/TestCommonmark/bold/input.html: -------------------------------------------------------------------------------- 1 | 2 |

Some Text

3 | 4 | 5 | 6 |

Some Text

7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 |

Text

15 | 16 | 17 | 18 |

SomeText

19 | 20 | 21 | 22 |

Some Text.

23 | 24 |

SomeTextContent

25 | 26 |

SomeText.

27 | 28 | 29 |

Some Text

30 | 31 | 32 | 33 |

首付19,8 / 34 | 月供6339元X24

35 | 36 | 37 | 38 |

**Not Strong** 39 | **Still Not 40 | Strong**

-------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/text_with_whitespace/goldmark.golden: -------------------------------------------------------------------------------- 1 |

Aktuelles

2 |

BRV-abend

3 |

25 Mai

4 |

5 |

9. Bonner Nachtlauf - Einschränkungen am Bootshaus

6 |

am Mittwoch, dem 30. Mai 2018 findet am Bonner Rheinufer der 9. ... 7 | More

8 |
9 |

Aktuelles

10 |

Title

11 |

Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Vestibulum id ligula porta felis euismod semper. 12 | More

13 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/list_nested/output.dash.golden: -------------------------------------------------------------------------------- 1 | - foo 2 | - bar 3 | - baz 4 | - boo 5 | 6 | - Coffee 7 | - Tea 8 | - Black tea 9 | - Green tea 10 | - Milk 11 | 12 | # header1 13 | 14 | - Bullet list 15 | - Nested bullet 16 | - Sub-nested bullet etc 17 | - Bullet list item 2 18 | 19 | - One 20 | - One point one 21 | - One point two 22 | 23 | 1. One 24 | 1. One point one 25 | 2. One point two 26 | 27 | - 1 28 | - 2 29 | - 2.1 30 | - 2.2 31 | - 2.2.1 32 | - 2.2.2 33 | - 2.2.3 34 | - 3 35 | 36 | 1. 1 37 | 2. 2 38 | 1. 2.1 39 | 2. 2.2 40 | 1. 2.2.1 41 | 2. 2.2.2 42 | 3. 2.2.3 43 | 4. 3 44 | 45 | - First Thing 46 | 47 | Second Thing 48 | - Nested First Thing 49 | 50 | Nested Thing 51 | 52 | 1. First Thing 53 | 54 | Second Thing 55 | 1. Nested First Thing 56 | 57 | Nested Thing 58 | 59 | 2. Date: 60 | 61 | 20.02.2021 62 | 63 | 26\. Mai - 3. Juni -------------------------------------------------------------------------------- /testdata/TestCommonmark/list_nested/output.plus.golden: -------------------------------------------------------------------------------- 1 | + foo 2 | + bar 3 | + baz 4 | + boo 5 | 6 | + Coffee 7 | + Tea 8 | + Black tea 9 | + Green tea 10 | + Milk 11 | 12 | # header1 13 | 14 | + Bullet list 15 | + Nested bullet 16 | + Sub-nested bullet etc 17 | + Bullet list item 2 18 | 19 | + One 20 | + One point one 21 | + One point two 22 | 23 | 1. One 24 | 1. One point one 25 | 2. One point two 26 | 27 | + 1 28 | + 2 29 | + 2.1 30 | + 2.2 31 | + 2.2.1 32 | + 2.2.2 33 | + 2.2.3 34 | + 3 35 | 36 | 1. 1 37 | 2. 2 38 | 1. 2.1 39 | 2. 2.2 40 | 1. 2.2.1 41 | 2. 2.2.2 42 | 3. 2.2.3 43 | 4. 3 44 | 45 | + First Thing 46 | 47 | Second Thing 48 | + Nested First Thing 49 | 50 | Nested Thing 51 | 52 | 1. First Thing 53 | 54 | Second Thing 55 | 1. 
Nested First Thing 56 | 57 | Nested Thing 58 | 59 | 2. Date: 60 | 61 | 20.02.2021 62 | 63 | 26\. Mai - 3. Juni -------------------------------------------------------------------------------- /testdata/TestCommonmark/list_nested/output.asterisks.golden: -------------------------------------------------------------------------------- 1 | * foo 2 | * bar 3 | * baz 4 | * boo 5 | 6 | * Coffee 7 | * Tea 8 | * Black tea 9 | * Green tea 10 | * Milk 11 | 12 | # header1 13 | 14 | * Bullet list 15 | * Nested bullet 16 | * Sub-nested bullet etc 17 | * Bullet list item 2 18 | 19 | * One 20 | * One point one 21 | * One point two 22 | 23 | 1. One 24 | 1. One point one 25 | 2. One point two 26 | 27 | * 1 28 | * 2 29 | * 2.1 30 | * 2.2 31 | * 2.2.1 32 | * 2.2.2 33 | * 2.2.3 34 | * 3 35 | 36 | 1. 1 37 | 2. 2 38 | 1. 2.1 39 | 2. 2.2 40 | 1. 2.2.1 41 | 2. 2.2.2 42 | 3. 2.2.3 43 | 4. 3 44 | 45 | * First Thing 46 | 47 | Second Thing 48 | * Nested First Thing 49 | 50 | Nested Thing 51 | 52 | 1. First Thing 53 | 54 | Second Thing 55 | 1. Nested First Thing 56 | 57 | Nested Thing 58 | 59 | 2. Date: 60 | 61 | 20.02.2021 62 | 63 | 26\. Mai - 3. Juni -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/turndown_demo/goldmark.golden: -------------------------------------------------------------------------------- 1 |

Turndown Demo

2 |

This demonstrates turndown – an HTML to Markdown converter in JavaScript.

3 |

Usage

4 |
var turndownService = new TurndownService()
 5 | console.log(
 6 |   turndownService.turndown('<h1>Hello world</h1>')
 7 | )
 8 | 
9 |
10 |

It aims to be CommonMark 11 | compliant, and includes options to style the output. These options include:

12 | 23 |
--------------------------------------------------------------------------------
/plugin/strikethrough.go:
--------------------------------------------------------------------------------
package plugin

import (
	"strings"

	md "github.com/JohannesKaufmann/html-to-markdown"
	"github.com/PuerkitoBio/goquery"
)

// Strikethrough converts `<strike>`, `<s>`, and `<del>` elements
func Strikethrough(character string) md.Plugin {
	return func(c *md.Converter) []md.Rule {
		if character == "" {
			character = "~~"
		}

		return []md.Rule{
			{
				Filter: []string{"del", "s", "strike"},
				Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
					// trim spaces so that the following does NOT happen: `~ and cake~`
					content = strings.TrimSpace(content)

					content = character + content + character

					// always have a space to the side to recognize the delimiter
					content = md.AddSpaceIfNessesary(selec, content)

					return &content
				},
			},
		}
	}
}
--------------------------------------------------------------------------------
/testdata/TestCommonmark/keep_remove_tag/input.html:
--------------------------------------------------------------------------------
1 | 2 | 3 |

Content

4 | 5 | 6 | 7 |

Content

8 | 9 | 10 | 11 | 12 | 15 | 16 | 17 | 18 | 21 | 22 | 23 | 24 | 25 | 28 | 29 | 30 | 31 | 35 | 36 | 37 | 38 | 41 | 42 | 43 | 44 | 45 | 46 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/list/output.dash.golden: -------------------------------------------------------------------------------- 1 | - Some Thing 2 | - Another Thing 3 | 4 | 1. First Thing 5 | 2. Second Thing 6 | 7 | 01. 1 8 | 02. 2 9 | 03. 3 10 | 04. 4 11 | 05. 5 12 | 06. 6 13 | 07. 7 14 | 08. 8 15 | 09. 9 16 | 10. 10 17 | 11. 11 18 | 12. 12 19 | 13. 13 20 | 14. 14 21 | 15. 15 22 | 19. 16 23 | 20. 17 24 | 21. 18 25 | 22. 19 26 | 23. 20 27 | 24. ![](http://example.com/example.png) 28 | 25. 22 29 | 30 | - Link:[example](https://example.com) works 31 | - Link: 32 | [example](https://example.com) 33 | works 34 | 35 | 36 | 1. First Thing 37 | - Some Thing 38 | - Another Thing 39 | 2. Second Thing 40 | 41 | - foo 42 | - bar 43 | 44 | - List items 45 | - Ending with 46 | - A space 47 | 48 | - Indent First Thing 49 | 50 | Second Thing 51 | 52 | - Third Thing 53 | 54 | \- Not List 55 | 56 | 1\. Not List 1. Not List 57 | 1\. Not List 58 | 59 | 1. A paragraph 60 | with two lines. 61 | 62 | 63 | ``` 64 | indented code 65 | ``` 66 | 67 | 68 | 69 | > A block quote. -------------------------------------------------------------------------------- /testdata/TestCommonmark/list/output.plus.golden: -------------------------------------------------------------------------------- 1 | + Some Thing 2 | + Another Thing 3 | 4 | 1. First Thing 5 | 2. Second Thing 6 | 7 | 01. 1 8 | 02. 2 9 | 03. 3 10 | 04. 4 11 | 05. 5 12 | 06. 6 13 | 07. 7 14 | 08. 8 15 | 09. 9 16 | 10. 10 17 | 11. 11 18 | 12. 12 19 | 13. 13 20 | 14. 14 21 | 15. 15 22 | 19. 16 23 | 20. 17 24 | 21. 18 25 | 22. 19 26 | 23. 20 27 | 24. ![](http://example.com/example.png) 28 | 25. 
22 29 | 30 | + Link:[example](https://example.com) works 31 | + Link: 32 | [example](https://example.com) 33 | works 34 | 35 | 36 | 1. First Thing 37 | + Some Thing 38 | + Another Thing 39 | 2. Second Thing 40 | 41 | + foo 42 | + bar 43 | 44 | + List items 45 | + Ending with 46 | + A space 47 | 48 | + Indent First Thing 49 | 50 | Second Thing 51 | 52 | + Third Thing 53 | 54 | \- Not List 55 | 56 | 1\. Not List 1. Not List 57 | 1\. Not List 58 | 59 | 1. A paragraph 60 | with two lines. 61 | 62 | 63 | ``` 64 | indented code 65 | ``` 66 | 67 | 68 | 69 | > A block quote. -------------------------------------------------------------------------------- /testdata/TestCommonmark/list/output.asterisks.golden: -------------------------------------------------------------------------------- 1 | * Some Thing 2 | * Another Thing 3 | 4 | 1. First Thing 5 | 2. Second Thing 6 | 7 | 01. 1 8 | 02. 2 9 | 03. 3 10 | 04. 4 11 | 05. 5 12 | 06. 6 13 | 07. 7 14 | 08. 8 15 | 09. 9 16 | 10. 10 17 | 11. 11 18 | 12. 12 19 | 13. 13 20 | 14. 14 21 | 15. 15 22 | 19. 16 23 | 20. 17 24 | 21. 18 25 | 22. 19 26 | 23. 20 27 | 24. ![](http://example.com/example.png) 28 | 25. 22 29 | 30 | * Link:[example](https://example.com) works 31 | * Link: 32 | [example](https://example.com) 33 | works 34 | 35 | 36 | 1. First Thing 37 | * Some Thing 38 | * Another Thing 39 | 2. Second Thing 40 | 41 | * foo 42 | * bar 43 | 44 | * List items 45 | * Ending with 46 | * A space 47 | 48 | * Indent First Thing 49 | 50 | Second Thing 51 | 52 | * Third Thing 53 | 54 | \- Not List 55 | 56 | 1\. Not List 1. Not List 57 | 1\. Not List 58 | 59 | 1. A paragraph 60 | with two lines. 61 | 62 | 63 | ``` 64 | indented code 65 | ``` 66 | 67 | 68 | 69 | > A block quote. -------------------------------------------------------------------------------- /testdata/TestCommonmark/p_tag/input.html: -------------------------------------------------------------------------------- 1 | 2 | 3 |

Some Content

4 | 5 | 6 | 7 |
8 |

Text

9 |

Some Text

10 |
11 | 12 | 13 | 14 |

Some Content

15 | 16 | 17 | 18 | 19 |

jmap –histo[:live]

20 | 21 | 22 | 23 |

24 | Sometimes a struct field, function, type, or even a whole package becomes 25 | 26 | 27 | redundant or unnecessary, but must be kept for compatibility with existing 28 | 29 | 30 | programs. 31 | 32 | 33 | To signal that an identifier should not be used, add a paragraph to its doc 34 | 35 | 36 | comment that begins with "Deprecated:" followed by some information about the 37 | 38 | 39 | deprecation. 40 | 41 | 42 | There are a few examples in the standard library. 43 |

-------------------------------------------------------------------------------- /testdata/TestCommonmark/blockquote/input.html: -------------------------------------------------------------------------------- 1 | 2 |
3 | Some Quote 4 | Next Line 5 |
6 | 7 | 8 |
9 | 10 | 11 | 12 |
13 |

Allowing an unimportant mistake to pass without comment is a wonderful social grace.

14 |

Ideological differences are no excuse for rudeness.

15 |
16 | 17 | 18 | 19 |
20 |

This is the first level of quoting.

21 |
22 |

This is a paragraph in a nested blockquote.

23 |
24 |

Back to the first level.

25 |
26 | 27 | 28 | 29 |
30 |

This is a header.

31 |
    32 |
  1. This is the first list item.
  2. 33 |
  3. This is the second list item.
  4. 34 |
35 |

A code block:

36 |
return 1 < 2 ? shell_exec('echo $input | $markdown_script') : 0;
37 |
--------------------------------------------------------------------------------
/plugin/youtube.go:
--------------------------------------------------------------------------------
package plugin

import (
	"fmt"
	"regexp"
	"strings"

	md "github.com/JohannesKaufmann/html-to-markdown"
	"github.com/PuerkitoBio/goquery"
)

var youtubeID = regexp.MustCompile(`youtube\.com\/embed\/([^\&\?\/]+)`)

// EXPERIMENTALYoutubeEmbed registers a rule (for iframes) and
// returns a markdown compatible representation (link to video, ...).
var EXPERIMENTALYoutubeEmbed = []md.Rule{
	{
		Filter: []string{"iframe"},
		Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
			src := selec.AttrOr("src", "")
			if !strings.Contains(src, "www.youtube.com") {
				return nil
			}
			alt := selec.AttrOr("title", "")

			parts := youtubeID.FindStringSubmatch(src)
			if len(parts) != 2 {
				return nil
			}
			id := parts[1]

			text := fmt.Sprintf("[![%s](https://img.youtube.com/vi/%s/0.jpg)](https://www.youtube.com/watch?v=%s)", alt, id, id)
			return &text
		},
	},
}
--------------------------------------------------------------------------------
/testdata/TestRealWorld/snippets/turndown_demo/input.html:
--------------------------------------------------------------------------------
1 | 2 |

Turndown Demo

3 | 4 |

This demonstrates turndown – an HTML to Markdown converter in JavaScript.

5 | 6 |

Usage

7 | 8 |
var turndownService = new TurndownService()
 9 | console.log(
10 |   turndownService.turndown('<h1>Hello world</h1>')
11 | )
12 | 13 |
14 | 15 |

It aims to be CommonMark 16 | compliant, and includes options to style the output. These options include:

17 | 18 |
    19 |
  • headingStyle (setext or atx)
  • 20 |
  • horizontalRule (*, -, or _)
  • 21 |
  • bullet (*, -, or +)
  • 22 |
  • codeBlockStyle (indented or fenced)
  • 23 |
  • fence
  • 24 |
  • emDelimiter (_ or *)
  • 25 |
  • strongDelimiter (** or __)
  • 26 |
  • linkStyle (inlined or referenced)
  • 27 |
  • linkReferenceStyle (full, collapsed, or shortcut)
  • 28 |
29 |
--------------------------------------------------------------------------------
/.github/workflows/go.yml:
--------------------------------------------------------------------------------
name: Go

on:
  push:
    branches: [ master ]
  pull_request:
    branches: [ master ]

jobs:

  build:
    name: Build
    runs-on: ubuntu-latest
    steps:

    - name: Set up Go 1.x
      uses: actions/setup-go@v2
      with:
        go-version: ^1.13
      id: go

    - name: Check out code into the Go module directory
      uses: actions/checkout@v2

    - name: Get dependencies
      run: |
        go get -v -t -d ./...
        if [ -f Gopkg.toml ]; then
            curl https://raw.githubusercontent.com/golang/dep/master/install.sh | sh
            dep ensure
        fi

    - name: Build
      run: go build -v .

    - name: Test
      run: go test -v -race -coverprofile=coverage.txt -covermode=atomic .

    - name: Upload Coverage report to CodeCov
      uses: codecov/codecov-action@v1.0.0
      with:
        token: ${{secrets.CODECOV_TOKEN}}
        file: ./coverage.txt
--------------------------------------------------------------------------------
/testdata/TestPlugins/table/output.table.golden:
--------------------------------------------------------------------------------
1 | | Firstname | Lastname | Age | 2 | | --- | --- | --- | 3 | | Jill | Smith | 50 | 4 | | Eve | Jackson | 94 | 5 | | Empty | | | 6 | | End | 7 | 8 | ### With \| Character 9 | 10 | | Firstname | With \| Character | Age | 11 | | --- | --- | --- | 12 | | Jill | Smith | 50 | 13 | | Eve | Jackson | 94 | 14 | 15 | ### Tabelle mit thead, tfoot, and tbody 16 | 17 | | Header content 1 | Header content 2 | 18 | | --- | --- | 19 | | Footer content 1 | Footer content 2 | 20 | | Body content 1 | Body content 2 | 21 | 22 | | | 23 | | --- | 24 | | Unglaublich tolle Daten | 25 | 26 | Unglaublich tolle Beschreibung 27 | 28 | | | | | | 29 | | --- | --- | --- | --- | 30 | | A | B | 
31 | | A | B | C | D | 32 | | A | B | C | 33 | 34 | #### Pegel DUISBURG-RUHRORT 35 | 36 | Quelle: … 37 | 38 | | | | | 39 | | --- | --- | --- | 40 | | **Strong** | [Link](http://example.com/link.html) | _Italic_ | 41 | | `var` | b | c | 42 | 43 | | | 44 | | --- | 45 | | 1
2
3 | 46 | 47 | | | 48 | | --- | 49 | | | | 50 | | --- | 51 | | 1
2
3 | | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 Johannes Kaufmann 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 
22 |
--------------------------------------------------------------------------------
/markdown_test.go:
--------------------------------------------------------------------------------
package md

import "testing"

func TestDefaultGetAbsoluteURL_NoDomain(t *testing.T) {
	input := "/page.html?key=val#hash"
	expected := input

	res := DefaultGetAbsoluteURL(nil, input, "")
	if res != expected {
		t.Errorf("expected '%s' but got '%s'", expected, res)
	}
}

func TestDefaultGetAbsoluteURL_Domain(t *testing.T) {
	input := "/page.html?key=val#hash"
	expected := "http://test.com" + input

	res := DefaultGetAbsoluteURL(nil, input, "test.com")
	if res != expected {
		t.Errorf("expected '%s' but got '%s'", expected, res)
	}
}

func TestDefaultGetAbsoluteURL_DataURI(t *testing.T) {
	input := ""
	expected := input

	res := DefaultGetAbsoluteURL(nil, input, "test.com")
	if res != expected {
		t.Errorf("expected '%s' but got '%s'", expected, res)
	}
}
--------------------------------------------------------------------------------
/examples/custom_tag/main.go:
--------------------------------------------------------------------------------
package main

import (
	"fmt"
	"log"
	"strings"

	md "github.com/JohannesKaufmann/html-to-markdown"
	"github.com/PuerkitoBio/goquery"
)

func main() {
	html := `<my_video>https://youtu.be/1SoMeViD</my_video>
<my_video>https://youtu.be/2SoMeViD</my_video>
<my_video>https://youtu.be/3SoMeViD</my_video><my_video>https://youtu.be/4SoMeViD</my_video>

<my_video>https://youtu.be/5SoMeViD</my_video>
`

	videoRule := md.Rule{
		// We want to add a rule for a `my_video` tag.
		Filter: []string{"my_video"},
		Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string {
			text := "click to watch video"

			// in this case, the content inside the tag is the url
			href := strings.TrimSpace(content)

			// format it, so that it's `[click to watch video](https://youtu.be/1SoMeViD)\n\n`
			// (named `link` so that it does not shadow the imported `md` package)
			link := fmt.Sprintf("[%s](%s)\n\n", text, href)
			return &link
		},
	}

	conv := md.NewConverter("", true, nil)
	conv.AddRules(videoRule)
	// -> add 1+ rules to the converter. the last added will be used first.

	markdown, err := conv.ConvertString(html)
	if err != nil {
		log.Fatal(err)
	}

	fmt.Printf("\n\nresult:'%s'\n", markdown)
}
--------------------------------------------------------------------------------
/testdata/TestCommonmark/link/output.relative.golden:
--------------------------------------------------------------------------------
1 | [Simple Absolute Link](http://simple.org/) 2 | 3 | [Simple Relative Link](/page.html) 4 | 5 | [Link with Space](http://space.org/) 6 | 7 | [Link with Title](http://title.org/ "Some Title") 8 | 9 | [Link with multiline Title](http://title.org/ "Some Long Title") 10 | 11 | [Broken Link](http://broken.com/test.html\r) 12 | 13 | [First Text\ 14 | ![](xxx)\ 15 | Second Text](http://multi.org/) 16 | 17 | - [First Text\ 18 | \ 19 | Second Text](http://list.org/) 20 | 21 | [GitHub](https://github.com "GitHub") 22 | 23 | [first top](http://first_under.com) [second below](http://second_under.com) 24 | 25 | [first left](http://first_next.com) [second right](http://second_next.com) 26 | 27 | Before [close](http://example_close.com) After 28 | 29 | [**Heading A** **Heading B**](/page.html) 30 | 31 | [DIW-Chef zum Grünen-Programm "Vermögenssteuer ist aus wirtschaftlicher Sicht klug"](/page.html "\"Vermögenssteuer ist aus wirtschaftlicher Sicht klug\"") 32 | 33 | [**Die App WDR aktuell begleitet Sie durch den Tag**\ 
34 | \ 35 | Sie möchten eine App, die Sie so durch den Tag in NRW begleitet, dass Sie jederzeit mitreden können? Die App WDR aktuell bietet Ihnen dafür immer die passenden Nachrichten.\ 36 |  \| \ 37 | **mehr**](/nachrichten/wdr-aktuell-app-stores-100.html "Die App WDR aktuell begleitet Sie durch den Tag ") -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/text_with_whitespace/input.html: -------------------------------------------------------------------------------- 1 | 2 |
3 |

Aktuelles

4 | 5 | 6 |
25 Mai
7 | 8 |

9. Bonner Nachtlauf - Einschränkungen am Bootshaus

9 | 10 | 11 | am Mittwoch, dem 30. Mai 2018 findet am Bonner Rheinufer der 9. ... 12 | More 13 | 14 | 15 | 16 |
17 | 18 |
19 | 20 |
21 |

Aktuelles

22 |

Title

23 | 24 | 25 | Fusce dapibus, tellus ac cursus commodo, tortor mauris condimentum nibh, ut fermentum massa justo sit amet risus. Vestibulum id ligula porta felis euismod semper. 26 | More 27 | 28 |
29 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/image/input.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | website favicon 4 |
5 | 6 | 7 | 8 | alt "attribute" 9 |
10 | 11 | 12 | 13 | alt
14 | 
15 | description 16 |
17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 30 |
31 | 32 | 33 | 34 | 37 |
38 | 39 | 40 | 41 | star 42 |
43 | 44 | 45 | 46 | 47 | website favicon 48 | 49 | 50 | 51 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/link/output.inlined.golden: -------------------------------------------------------------------------------- 1 | [Simple Absolute Link](http://simple.org/) 2 | 3 | [Simple Relative Link](http://example.com/page.html) 4 | 5 | [Link with Space](http://space.org/) 6 | 7 | [Link with Title](http://title.org/ "Some Title") 8 | 9 | [Link with multiline Title](http://title.org/ "Some Long Title") 10 | 11 | [Broken Link](http://broken.com/test.html%5Cr) 12 | 13 | [First Text\ 14 | ![](http://example.com/xxx)\ 15 | Second Text](http://multi.org/) 16 | 17 | - [First Text\ 18 | \ 19 | Second Text](http://list.org/) 20 | 21 | [GitHub](https://github.com "GitHub") 22 | 23 | [first top](http://first_under.com) [second below](http://second_under.com) 24 | 25 | [first left](http://first_next.com) [second right](http://second_next.com) 26 | 27 | Before [close](http://example_close.com) After 28 | 29 | [**Heading A** **Heading B**](http://example.com/page.html) 30 | 31 | [DIW-Chef zum Grünen-Programm "Vermögenssteuer ist aus wirtschaftlicher Sicht klug"](http://example.com/page.html "\"Vermögenssteuer ist aus wirtschaftlicher Sicht klug\"") 32 | 33 | [**Die App WDR aktuell begleitet Sie durch den Tag**\ 34 | \ 35 | Sie möchten eine App, die Sie so durch den Tag in NRW begleitet, dass Sie jederzeit mitreden können? Die App WDR aktuell bietet Ihnen dafür immer die passenden Nachrichten.\ 36 |  \| \ 37 | **mehr**](http://example.com/nachrichten/wdr-aktuell-app-stores-100.html "Die App WDR aktuell begleitet Sie durch den Tag ") -------------------------------------------------------------------------------- /testdata/TestCommonmark/heading/input.html: -------------------------------------------------------------------------------- 1 | 2 |

Heading 1

3 | 4 |

Heading 2

5 | 6 |

Heading 3

7 | 8 |

Heading 4

9 | 10 |
Heading 5
11 | 12 |
Heading 6
13 | 14 | Heading 7 15 | 16 | 17 | 18 |

Heading with Whitespace

19 | 20 | 21 | 29 |

30 | 31 | Header 32 | Containing 33 | 34 | Newlines 35 | 36 |

37 | 38 | 39 | 40 |

Heading

One

41 | 42 | 43 | 44 | 45 |

Heading 2

46 |
47 | 48 | 49 | 50 |

#hashtag

51 | 52 |

53 | not title 54 | ------ 55 |

56 | 57 | 58 |

59 | not title 60 | - 61 |

62 | 63 | 64 |

not title

65 |

-

66 | 67 |

68 | not title 69 | = 70 |

71 | 72 | 73 |

74 | not title 75 | \-\-\- 76 |

-------------------------------------------------------------------------------- /testdata/TestCommonmark/pre_code/output.indented.golden: -------------------------------------------------------------------------------- 1 | `last_30_days` 2 | 3 | ``with backtick (`)`` 4 | 5 | ```with backtick (``)``` 6 | 7 | `````here are three ``` here are four ```` here is one ` that's it````` 8 | 9 | `` `starting & ending with a backtick` `` 10 | 11 | Who ate the most donuts this week? 12 | 13 | ```foo+bar 14 | Jeff 15 15 | Sam 11 16 | Robin 6 17 | ``` 18 | 19 | ``` 20 | // Fprint formats using the default formats for its operands and writes to w. 21 | // Spaces are added between operands when neither is a string. 22 | // It returns the number of bytes written and any write error encountered. 23 | func Fprint(w io.Writer, a ...interface{}) (n int, err error) { 24 | ``` 25 | 26 | When `x = 3`, that means `x + 2 = 5` 27 | 28 | The `` tag is used to embed an image. 29 | 30 | The `` tag is used to embed an image. 31 | 32 | Two variables `A` `B` 33 | 34 | CSS: ` 35 | body { 36 | color: yellow; 37 | font-size: 16px; 38 | } 39 | ` 40 | 41 | CSS: ` 42 | body { 43 | color: yellow; 44 | font-size: 16px; 45 | } 46 | ` 47 | 48 | ```` 49 | ``` 50 | ```` 51 | 52 | ``` 53 | ~~~ 54 | ``` 55 | 56 | ``` 57 | 58 | Some ~~~ 59 | totally ~~~~~~ normal 60 | ~ code 61 | 62 | ``` 63 | 64 | ``` 65 | 66 | The tag is used to embed an image. 67 | 68 | The tag is used to embed an image. 
69 | 70 | ``` 71 | 72 | ``` 73 | 74 | 75 | 76 | 77 | 78 | ``` -------------------------------------------------------------------------------- /testdata/TestCommonmark/pre_code/output.fenced_backtick.golden: -------------------------------------------------------------------------------- 1 | `last_30_days` 2 | 3 | ``with backtick (`)`` 4 | 5 | ```with backtick (``)``` 6 | 7 | `````here are three ``` here are four ```` here is one ` that's it````` 8 | 9 | `` `starting & ending with a backtick` `` 10 | 11 | Who ate the most donuts this week? 12 | 13 | ```foo+bar 14 | Jeff 15 15 | Sam 11 16 | Robin 6 17 | ``` 18 | 19 | ``` 20 | // Fprint formats using the default formats for its operands and writes to w. 21 | // Spaces are added between operands when neither is a string. 22 | // It returns the number of bytes written and any write error encountered. 23 | func Fprint(w io.Writer, a ...interface{}) (n int, err error) { 24 | ``` 25 | 26 | When `x = 3`, that means `x + 2 = 5` 27 | 28 | The `` tag is used to embed an image. 29 | 30 | The `` tag is used to embed an image. 31 | 32 | Two variables `A` `B` 33 | 34 | CSS: ` 35 | body { 36 | color: yellow; 37 | font-size: 16px; 38 | } 39 | ` 40 | 41 | CSS: ` 42 | body { 43 | color: yellow; 44 | font-size: 16px; 45 | } 46 | ` 47 | 48 | ```` 49 | ``` 50 | ```` 51 | 52 | ``` 53 | ~~~ 54 | ``` 55 | 56 | ``` 57 | 58 | Some ~~~ 59 | totally ~~~~~~ normal 60 | ~ code 61 | 62 | ``` 63 | 64 | ``` 65 | 66 | The tag is used to embed an image. 67 | 68 | The tag is used to embed an image. 
69 | 70 | ``` 71 | 72 | ``` 73 | 74 | 75 | 76 | 77 | 78 | ``` -------------------------------------------------------------------------------- /testdata/TestCommonmark/pre_code/output.fenced_tilde.golden: -------------------------------------------------------------------------------- 1 | `last_30_days` 2 | 3 | ``with backtick (`)`` 4 | 5 | ```with backtick (``)``` 6 | 7 | `````here are three ``` here are four ```` here is one ` that's it````` 8 | 9 | `` `starting & ending with a backtick` `` 10 | 11 | Who ate the most donuts this week? 12 | 13 | ~~~foo+bar 14 | Jeff 15 15 | Sam 11 16 | Robin 6 17 | ~~~ 18 | 19 | ~~~ 20 | // Fprint formats using the default formats for its operands and writes to w. 21 | // Spaces are added between operands when neither is a string. 22 | // It returns the number of bytes written and any write error encountered. 23 | func Fprint(w io.Writer, a ...interface{}) (n int, err error) { 24 | ~~~ 25 | 26 | When `x = 3`, that means `x + 2 = 5` 27 | 28 | The `` tag is used to embed an image. 29 | 30 | The `` tag is used to embed an image. 31 | 32 | Two variables `A` `B` 33 | 34 | CSS: ` 35 | body { 36 | color: yellow; 37 | font-size: 16px; 38 | } 39 | ` 40 | 41 | CSS: ` 42 | body { 43 | color: yellow; 44 | font-size: 16px; 45 | } 46 | ` 47 | 48 | ~~~ 49 | ``` 50 | ~~~ 51 | 52 | ~~~~ 53 | ~~~ 54 | ~~~~ 55 | 56 | ~~~~~~~ 57 | 58 | Some ~~~ 59 | totally ~~~~~~ normal 60 | ~ code 61 | 62 | ~~~~~~~ 63 | 64 | ~~~ 65 | 66 | The tag is used to embed an image. 67 | 68 | The tag is used to embed an image. 69 | 70 | ~~~ 71 | 72 | ~~~ 73 | 74 | 75 | 76 | 77 | 78 | ~~~ -------------------------------------------------------------------------------- /testdata/TestRealWorld/golang.org/output.default.golden: -------------------------------------------------------------------------------- 1 | The Go Programming Language 2 | 3 | ... 
4 | 5 | [The Go Programming Language](http://golang.org/) 6 | 7 | [Go](http://golang.org/) 8 | 9 | ▽ 10 | 11 | [Documents](http://golang.org/doc/) [Packages](http://golang.org/pkg/) [The Project](http://golang.org/project/) [Help](http://golang.org/help/) [Blog](http://golang.org/blog/) [Play](http://play.golang.org/ "Show Go Playground") submit search 12 | 13 | RunFormatShare 14 | 15 | Pop-out 16 | 17 | Try Go 18 | 19 | ``` 20 | Hello, 世界 21 | 22 | ``` 23 | 24 | RunShare [Tour](http://tour.golang.org/ "Learn Go from your browser") 25 | 26 | Hello, World!Conway's Game of LifeFibonacci ClosurePeano IntegersConcurrent piConcurrent Prime SievePeg Solitaire SolverTree Comparison 27 | 28 | Go is an open source programming language that makes it easy to build 29 | simple, reliable, and efficient software. 30 | 31 | [Download Go\ 32 | Binary distributions available for\ 33 | \ 34 | Linux, Mac OS X, Windows, and more.](http://golang.org/dl/) 35 | 36 | Featured video 37 | 38 | Featured articles 39 | 40 | [Read more](http://blog.golang.org/) 41 | 42 | Build version go1.10.2. 43 | 44 | Except as [noted](https://developers.google.com/site-policies#restrictions), 45 | the content of this page is licensed under the 46 | Creative Commons Attribution 3.0 License, 47 | and code is licensed under a [BSD license](http://golang.org/LICENSE). 48 | 49 | [Terms of Service](http://golang.org/doc/tos.html) \| 50 | [Privacy Policy](http://www.google.com/intl/en/policies/privacy/) -------------------------------------------------------------------------------- /plugin/confluence_code_block.go: -------------------------------------------------------------------------------- 1 | package plugin 2 | 3 | import ( 4 | "fmt" 5 | "strings" 6 | 7 | md "github.com/JohannesKaufmann/html-to-markdown" 8 | "github.com/PuerkitoBio/goquery" 9 | ) 10 | 11 | // ConfluenceCodeBlock converts `` elements that are used in Atlassian’s Wiki “Confluence”. 
12 | // [Contributed by @Skarlso] 13 | func ConfluenceCodeBlock() md.Plugin { 14 | return func(c *md.Converter) []md.Rule { 15 | character := "```" 16 | return []md.Rule{ 17 | { 18 | Filter: []string{"ac:structured-macro"}, 19 | Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string { 20 | for _, node := range selec.Nodes { 21 | if node.Data == "ac:structured-macro" { 22 | // node's last child -> <ac:plain-text-body>. We don't want to filter on that 23 | // because we would end up with structured-macro around us. 24 | // ac:plain-text-body's last child is [CDATA which has the actual content we are looking for. 25 | data := strings.TrimPrefix(node.LastChild.LastChild.Data, "[CDATA[") 26 | data = strings.TrimSuffix(data, "]]") 27 | // content, if set, will contain the language that has been set in the field. 28 | var language string 29 | if content != "" { 30 | language = content 31 | } 32 | formatted := fmt.Sprintf("%s%s\n%s\n%s", character, language, data, character) 33 | return md.String(formatted) 34 | } 35 | } 36 | return md.String(character + content + character) 37 | }, 38 | }, 39 | } 40 | } 41 | } 42 | -------------------------------------------------------------------------------- /examples/add_rules/main.go: -------------------------------------------------------------------------------- 1 | package main 2 | 3 | import ( 4 | "fmt" 5 | "log" 6 | "strings" 7 | 8 | md "github.com/JohannesKaufmann/html-to-markdown" 9 | "github.com/PuerkitoBio/goquery" 10 | ) 11 | 12 | func main() { 13 | html := `Good soundtrack <span class="bb_strike">and cake</span>.` 14 | // -> `Good soundtrack ~~and cake~~.` 15 | 16 | /* 17 | We want to add a rule when a `span` tag has a class of `bb_strike`. 18 | Have a look at `plugin/strikethrough.go` to see how it is implemented normally.
19 | */ 20 | strikethrough := md.Rule{ 21 | Filter: []string{"span"}, 22 | Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string { 23 | // If the span element does not have the class name `bb_strike`, return nil. 24 | // That way the next rules will apply. In this case the commonmark rules. 25 | // -> return nil -> next rule applies 26 | if !selec.HasClass("bb_strike") { 27 | return nil 28 | } 29 | 30 | // Trim spaces so that the following does NOT happen: `~ and cake~`. 31 | // Because of the space it is not recognized as strikethrough. 32 | // -> trim spaces at the beginning and end of the string when inside strong/italic/... 33 | content = strings.TrimSpace(content) 34 | return md.String("~~" + content + "~~") 35 | }, 36 | } 37 | 38 | conv := md.NewConverter("", true, nil) 39 | conv.AddRules(strikethrough) 40 | // -> add 1+ rules to the converter. The rule added last is used first. 41 | 42 | markdown, err := conv.ConvertString(html) 43 | if err != nil { 44 | log.Fatal(err) 45 | } 46 | 47 | fmt.Printf("\n\nmarkdown:'%s'\n", markdown) 48 | } 49 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/nav_nested_list/output.default.golden: -------------------------------------------------------------------------------- 1 | - [Startseite](http://example.com/ "Startseite") 2 | - [Die Gruppe](http://example.com/die-gruppe/unsere-unternehmen/ "Die Gruppe") - [Unsere Unternehmen](http://example.com/die-gruppe/unsere-unternehmen/ "Unsere Unternehmen") 3 | - [Unternehmenshistorie](http://example.com/die-gruppe/unternehmenshistorie/ "Unternehmenshistorie") 4 | - [Standortportraits](http://example.com/die-gruppe/standortportraits/ "Standortportraits") 5 | - [Unsere Marken](http://example.com/die-gruppe/unsere-marken/ "Unsere Marken") 6 | - [Kontakt](http://example.com/die-gruppe/kontakt/ "Kontakt") 7 | - [Medien](http://example.com/medien/aktuelle-meldungen/ "Medien") - [Aktuelle
Meldungen](http://example.com/medien/aktuelle-meldungen/ "Aktuelle Meldungen") 8 | - [Pressearchiv](http://example.com/medien/pressearchiv/ "Pressearchiv") 9 | - [Pressekontakt](http://example.com/medien/pressekontakt/ "Pressekontakt") 10 | - [Einblicke](http://example.com/medien/einblicke/ "Einblicke") 11 | - [Karriere](http://example.com/karriere/ "Karriere") - [Video-Einblicke](http://example.com/karriere/video-einblicke/ "Video-Einblicke") 12 | - [Stellenangebote](https://career5.successfactors.eu/career?company=mllerservi&site=VjItNGdGZlNGSEJEYTVJSVRUaXp4N1E4Zz09 "Stellenangebote") 13 | - [Erlebnisberichte](http://example.com/karriere/erlebnisberichte/ "Erlebnisberichte") 14 | - [Traineeprogramm](http://example.com/karriere/traineeprogramm/ "Traineeprogramm") 15 | - [Termine](http://example.com/karriere/termine/ "Termine") 16 | - [News](http://example.com/karriere/news/ "News") -------------------------------------------------------------------------------- /plugin/frontmatter.go: -------------------------------------------------------------------------------- 1 | package plugin 2 | 3 | import ( 4 | "fmt" 5 | 6 | md "github.com/JohannesKaufmann/html-to-markdown" 7 | yaml "gopkg.in/yaml.v2" 8 | ) 9 | 10 | // type frontMatterCallback func(selec *goquery.Selection) map[string]interface{} 11 | // TODO: automatically convert to formats (look at hugo) 12 | 13 | // EXPERIMENTALFrontMatter was an experiment to add certain data 14 | // from a callback function into the beginning of the file as frontmatter. 15 | // It is not really working right now. 16 | // 17 | // If someone has a need for it, let me know what your use-case is. Then 18 | // I can create a plugin with a good interface.
19 | func EXPERIMENTALFrontMatter(format string) md.Plugin { 20 | return func(c *md.Converter) []md.Rule { 21 | data := make(map[string]interface{}) 22 | 23 | d, err := yaml.Marshal(data) 24 | if err != nil { 25 | panic(err) 26 | } 27 | 28 | fmt.Println(string(d)) 29 | /* 30 | add rule for `head` 31 | - get title 32 | - return AdvancedResult{ Header: formated_yaml }, skip 33 | -> added to head 34 | -> others rules can apply 35 | 36 | */ 37 | 38 | title := "" // c.Find("head title").Text() 39 | 40 | var text string 41 | switch format { 42 | case "toml": // +++ 43 | text = fmt.Sprintf(` 44 | +++ 45 | title = "%s" 46 | +++ 47 | `, title) 48 | case "yaml": // --- 49 | text = fmt.Sprintf(` 50 | --- 51 | title: %s 52 | --- 53 | `, title) 54 | case "json": // { } 55 | text = fmt.Sprintf(` 56 | { 57 | "title": "%s" 58 | } 59 | `, title) 60 | default: 61 | panic("unknown format") 62 | } 63 | 64 | _ = text 65 | // c.AddLeading(text) 66 | return nil 67 | } 68 | } 69 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/link/output.referenced_full.golden: -------------------------------------------------------------------------------- 1 | [Simple Absolute Link][1] 2 | 3 | [Simple Relative Link][2] 4 | 5 | [Link with Space][3] 6 | 7 | [Link with Title][4] 8 | 9 | [Link with multiline Title][5] 10 | 11 | [Broken Link][6] 12 | 13 | [First Text\ 14 | ![](http://example.com/xxx)\ 15 | Second Text][7] 16 | 17 | - [First Text\ 18 | \ 19 | Second Text][8] 20 | 21 | [GitHub][9] 22 | 23 | [first top][10] [second below][11] 24 | 25 | [first left][12] [second right][13] 26 | 27 | Before [close][14] After 28 | 29 | [**Heading A** **Heading B**][15] 30 | 31 | [DIW-Chef zum Grünen-Programm "Vermögenssteuer ist aus wirtschaftlicher Sicht klug"][16] 32 | 33 | [**Die App WDR aktuell begleitet Sie durch den Tag**\ 34 | \ 35 | Sie möchten eine App, die Sie so durch den Tag in NRW begleitet, dass Sie jederzeit mitreden können? 
Die App WDR aktuell bietet Ihnen dafür immer die passenden Nachrichten.\ 36 |  \| \ 37 | **mehr**][17] 38 | 39 | [1]: http://simple.org/ 40 | [2]: http://example.com/page.html 41 | [3]: http://space.org/ 42 | [4]: http://title.org/ "Some Title" 43 | [5]: http://title.org/ "Some Long Title" 44 | [6]: http://broken.com/test.html%5Cr 45 | [7]: http://multi.org/ 46 | [8]: http://list.org/ 47 | [9]: https://github.com "GitHub" 48 | [10]: http://first_under.com 49 | [11]: http://second_under.com 50 | [12]: http://first_next.com 51 | [13]: http://second_next.com 52 | [14]: http://example_close.com 53 | [15]: http://example.com/page.html 54 | [16]: http://example.com/page.html "\"Vermögenssteuer ist aus wirtschaftlicher Sicht klug\"" 55 | [17]: http://example.com/nachrichten/wdr-aktuell-app-stores-100.html "Die App WDR aktuell begleitet Sie durch den Tag " -------------------------------------------------------------------------------- /testdata/TestCommonmark/pre_code/goldmark.golden: -------------------------------------------------------------------------------- 1 |

last_30_days

2 |

with backtick (`)

3 |

with backtick (``)

4 |

here are three ``` here are four ```` here is one ` that's it

5 |

`starting & ending with a backtick`

6 |

Who ate the most donuts this week?

7 |
Jeff  15
 8 | Sam   11
 9 | Robin  6
10 | 
11 |
// Fprint formats using the default formats for its operands and writes to w.
12 | // Spaces are added between operands when neither is a string.
13 | // It returns the number of bytes written and any write error encountered.
14 | func Fprint(w io.Writer, a ...interface{}) (n int, err error) {
15 | 
16 |

When x = 3, that means x + 2 = 5

17 |

The <img> tag is used to embed an image.

18 |

The <img/> tag is used to embed an image.

19 |

Two variables A B

20 |

CSS: body { color: yellow; font-size: 16px; }

21 |

CSS: body { color: yellow; font-size: 16px; }

22 |
```
23 | 
24 |
~~~
25 | 
26 |

27 | Some ~~~
28 | totally ~~~~~~ normal
29 | ~ code
30 | 
31 | 
32 |

33 | The <img> tag is used to embed an image.
34 | 
35 | The <img/> tag is used to embed an image.
36 | 
37 | 
38 |

39 | <a href="#Blabla" data-index="1">
40 |     <img src="http://bla.bla/img/img.svg" style="height:auto" width="200px"/>
41 | </a>
42 | 
43 | 
44 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/tweet/input.html: -------------------------------------------------------------------------------- 1 | 2 | 6 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/golang.org/goldmark.golden: -------------------------------------------------------------------------------- 1 |

The Go Programming Language

2 |

...

3 |

The Go Programming Language

4 |

Go

5 |

6 |

Documents Packages The Project Help Blog Play submit search

7 |

RunFormatShare

8 |

Pop-out

9 |

Try Go

10 |
Hello, 世界
11 | 
12 | 
13 |

RunShare Tour

14 |

Hello, World!Conway's Game of LifeFibonacci ClosurePeano IntegersConcurrent piConcurrent Prime SievePeg Solitaire SolverTree Comparison

15 |

Go is an open source programming language that makes it easy to build 16 | simple, reliable, and efficient software.

17 |

Download Go
18 | Binary distributions available for
19 |
20 | Linux, Mac OS X, Windows, and more.

21 |

Featured video

22 |

Featured articles

23 |

Read more

24 |

Build version go1.10.2.

25 |

Except as noted, 26 | the content of this page is licensed under the 27 | Creative Commons Attribution 3.0 License, 28 | and code is licensed under a BSD license.

29 |

Terms of Service | 30 | Privacy Policy

31 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/list_nested/goldmark.golden: -------------------------------------------------------------------------------- 1 |
    2 |
  • 3 |

    foo

    4 |
      5 |
    • bar 6 |
        7 |
      • baz 8 |
          9 |
        • boo
        • 10 |
        11 |
      • 12 |
      13 |
    • 14 |
    15 |
  • 16 |
  • 17 |

    Coffee

    18 |
  • 19 |
  • 20 |

    Tea

    21 |
      22 |
    • Black tea
    • 23 |
    • Green tea
    • 24 |
    25 |
  • 26 |
  • 27 |

    Milk

    28 |
  • 29 |
30 |

header1

31 |
    32 |
  • 33 |

    Bullet list

    34 |
      35 |
    • Nested bullet 36 |
        37 |
      • Sub-nested bullet etc
      • 38 |
      39 |
    • 40 |
    41 |
  • 42 |
  • 43 |

    Bullet list item 2

    44 |
  • 45 |
  • 46 |

    One

    47 |
      48 |
    • One point one
    • 49 |
    • One point two
    • 50 |
    51 |
  • 52 |
53 |
    54 |
  1. One 55 |
      56 |
    1. One point one
    2. 57 |
    3. One point two
    4. 58 |
    59 |
  2. 60 |
61 |
    62 |
  • 1
  • 63 |
  • 2 64 |
      65 |
    • 2.1
    • 66 |
    • 2.2 67 |
        68 |
      • 2.2.1
      • 69 |
      • 2.2.2
      • 70 |
      • 2.2.3
      • 71 |
      72 |
    • 73 |
    74 |
  • 75 |
  • 3
  • 76 |
77 |
    78 |
  1. 1
  2. 79 |
  3. 2 80 |
      81 |
    1. 2.1
    2. 82 |
    3. 2.2 83 |
        84 |
      1. 2.2.1
      2. 85 |
      3. 2.2.2
      4. 86 |
      5. 2.2.3
      6. 87 |
      88 |
    4. 89 |
    90 |
  4. 91 |
  5. 3
  6. 92 |
93 |
    94 |
  • 95 |

    First Thing

    96 |

    Second Thing

    97 |
      98 |
    • 99 |

      Nested First Thing

      100 |

      Nested Thing

      101 |
    • 102 |
    103 |
  • 104 |
105 |
    106 |
  1. 107 |

    First Thing

    108 |

    Second Thing

    109 |
      110 |
    1. 111 |

      Nested First Thing

      112 |

      Nested Thing

      113 |
    2. 114 |
    3. 115 |

      Date:

      116 |

      20.02.2021

      117 |

      26. Mai - 3. Juni

      118 |
    4. 119 |
    120 |
  2. 121 |
122 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/nav_nested_list/input.html: -------------------------------------------------------------------------------- 1 | 2 | 35 | 36 | -------------------------------------------------------------------------------- /plugin_test.go: -------------------------------------------------------------------------------- 1 | package md_test 2 | 3 | import ( 4 | "testing" 5 | 6 | md "github.com/JohannesKaufmann/html-to-markdown" 7 | "github.com/JohannesKaufmann/html-to-markdown/plugin" 8 | ) 9 | 10 | func TestPlugins(t *testing.T) { 11 | var tests = []GoldenTest{ 12 | { 13 | Name: "strikethrough", 14 | Variations: map[string]Variation{ 15 | "default": { 16 | Plugins: []md.Plugin{ 17 | plugin.Strikethrough(""), 18 | }, 19 | }, 20 | }, 21 | }, 22 | { 23 | Name: "checkbox", 24 | Variations: map[string]Variation{ 25 | "default": { 26 | Plugins: []md.Plugin{ 27 | plugin.TaskListItems(), 28 | }, 29 | }, 30 | }, 31 | }, 32 | { 33 | Name: "table", 34 | DisableGoldmark: true, 35 | Variations: map[string]Variation{ 36 | "default": {}, 37 | "table": { 38 | Plugins: []md.Plugin{ 39 | plugin.Table(), 40 | }, 41 | }, 42 | "tablecompat": { 43 | Plugins: []md.Plugin{ 44 | plugin.TableCompat(), 45 | }, 46 | }, 47 | }, 48 | }, 49 | { 50 | Name: "movefrontmatter/simple", 51 | Variations: map[string]Variation{ 52 | "default": { 53 | Plugins: []md.Plugin{ 54 | plugin.EXPERIMENTALMoveFrontMatter(), 55 | }, 56 | }, 57 | }, 58 | }, 59 | { 60 | Name: "movefrontmatter/not", 61 | Variations: map[string]Variation{ 62 | "default": { 63 | Plugins: []md.Plugin{ 64 | plugin.EXPERIMENTALMoveFrontMatter(), 65 | }, 66 | }, 67 | }, 68 | }, 69 | { 70 | Name: "movefrontmatter/jekyll", 71 | Variations: map[string]Variation{ 72 | "default": { 73 | Plugins: []md.Plugin{ 74 | plugin.EXPERIMENTALMoveFrontMatter('-', '+'), 75 | }, 76 | }, 77 | }, 78 | }, 79 | { 80 | Name: "movefrontmatter/blog", 81 | 
Variations: map[string]Variation{ 82 | "default": { 83 | Plugins: []md.Plugin{ 84 | plugin.EXPERIMENTALMoveFrontMatter(), 85 | }, 86 | }, 87 | }, 88 | }, 89 | } 90 | 91 | RunGoldenTest(t, tests) 92 | } 93 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/list/goldmark.golden: -------------------------------------------------------------------------------- 1 |
    2 |
  • Some Thing
  • 3 |
  • Another Thing
  • 4 |
5 |
    6 |
  1. 7 |

    First Thing

    8 |
  2. 9 |
  3. 10 |

    Second Thing

    11 |
  4. 12 |
  5. 13 |

    1

    14 |
  6. 15 |
  7. 16 |

    2

    17 |
  8. 18 |
  9. 19 |

    3

    20 |
  10. 21 |
  11. 22 |

    4

    23 |
  12. 24 |
  13. 25 |

    5

    26 |
  14. 27 |
  15. 28 |

    6

    29 |
  16. 30 |
  17. 31 |

    7

    32 |
  18. 33 |
  19. 34 |

    8

    35 |
  20. 36 |
  21. 37 |

    9

    38 |
  22. 39 |
  23. 40 |

    10

    41 |
  24. 42 |
  25. 43 |

    11

    44 |
  26. 45 |
  27. 46 |

    12

    47 |
  28. 48 |
  29. 49 |

    13

    50 |
  30. 51 |
  31. 52 |

    14

    53 |
  32. 54 |
  33. 55 |

    15

    56 |
  34. 57 |
  35. 58 |

    16

    59 |
  36. 60 |
  37. 61 |

    17

    62 |
  38. 63 |
  39. 64 |

    18

    65 |
  40. 66 |
  41. 67 |

    19

    68 |
  42. 69 |
  43. 70 |

    20

    71 |
  44. 72 |
  45. 73 |

    74 |
  46. 75 |
  47. 76 |

    22

    77 |
  48. 78 |
79 | 85 |
    86 |
  1. First Thing 87 |
      88 |
    • Some Thing
    • 89 |
    • Another Thing
    • 90 |
    91 |
  2. 92 |
  3. Second Thing
  4. 93 |
94 |
    95 |
  • 96 |

    foo

    97 |
  • 98 |
  • 99 |

    bar

    100 |
  • 101 |
  • 102 |

    List items

    103 |
  • 104 |
  • 105 |

    Ending with

    106 |
  • 107 |
  • 108 |

    A space

    109 |
  • 110 |
  • 111 |

    Indent First Thing

    112 |

    Second Thing

    113 |
  • 114 |
  • 115 |

    Third Thing

    116 |
  • 117 |
118 |

- Not List

119 |

1. Not List 1. Not List 120 | 1. Not List

121 |
    122 |
  1. 123 |

    A paragraph 124 | with two lines.

    125 |
    indented code
    126 | 
    127 |
    128 |

    A block quote.

    129 |
    130 |
  2. 131 |
132 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/nav_nested_list/goldmark.golden: -------------------------------------------------------------------------------- 1 | 28 | -------------------------------------------------------------------------------- /escape/escape.go: -------------------------------------------------------------------------------- 1 | // Package escape escapes characters that are commonly used in 2 | // markdown like the * for strong/italic. 3 | package escape 4 | 5 | import ( 6 | "regexp" 7 | "strings" 8 | ) 9 | 10 | var backslash = regexp.MustCompile(`\\(\S)`) 11 | var heading = regexp.MustCompile(`(?m)^(#{1,6} )`) 12 | var orderedList = regexp.MustCompile(`(?m)^(\W* {0,3})(\d+)\. `) 13 | var unorderedList = regexp.MustCompile(`(?m)^([^\\\w]*)[*+-] `) 14 | var horizontalDivider = regexp.MustCompile(`(?m)^([-*_] *){3,}$`) 15 | var blockquote = regexp.MustCompile(`(?m)^(\W* {0,3})> `) 16 | 17 | var replacer = strings.NewReplacer( 18 | `*`, `\*`, 19 | `_`, `\_`, 20 | "`", "\\`", 21 | `|`, `\|`, 22 | ) 23 | 24 | // MarkdownCharacters escapes common markdown characters so that 25 | // `<p>**Not Bold**</p>` ends up as correct markdown `\*\*Not Bold\*\*`. 26 | // No worry, the escaped characters will display fine, just without the formatting. 27 | func MarkdownCharacters(text string) string { 28 | // Escape backslash escapes! 29 | text = backslash.ReplaceAllString(text, `\\$1`) 30 | 31 | // Escape headings 32 | text = heading.ReplaceAllString(text, `\$1`) 33 | 34 | // Escape hr 35 | text = horizontalDivider.ReplaceAllStringFunc(text, func(t string) string { 36 | if strings.Contains(t, "-") { 37 | return strings.Replace(t, "-", `\-`, 3) 38 | } else if strings.Contains(t, "_") { 39 | return strings.Replace(t, "_", `\_`, 3) 40 | } 41 | return strings.Replace(t, "*", `\*`, 3) 42 | }) 43 | 44 | // Escape ol bullet points 45 | text = orderedList.ReplaceAllString(text, `$1$2\. `) 46 | 47 | // Escape ul bullet points 48 | text = unorderedList.ReplaceAllStringFunc(text, func(t string) string { 49 | return regexp.MustCompile(`([*+-])`).ReplaceAllString(t, `\$1`) 50 | }) 51 | 52 | // Escape blockquote indents 53 | text = blockquote.ReplaceAllString(text, `$1\> `) 54 | 55 | // Escape em/strong * 56 | // Escape em/strong _ 57 | // Escape code ` 58 | text = replacer.Replace(text) 59 | 60 | // Escape link brackets 61 | // (disabled) 62 | // var link = regexp.MustCompile(`[\[\]]`) 63 | // text = link.ReplaceAllString(text, `\$&`) 64 | 65 | return text 66 | } 67 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/link/output.referenced_shortcut.golden: -------------------------------------------------------------------------------- 1 | [Simple Absolute Link] 2 | 3 | [Simple Relative Link] 4 | 5 | [Link with Space] 6 | 7 | [Link with Title] 8 | 9 | [Link with multiline Title] 10 | 11 | [Broken Link] 12 | 13 | [First Text\ 14 | ![](http://example.com/xxx)\ 15 | Second Text] 16 | 17 | - [First Text\ 18 | \ 19 | Second Text] 20 | 21 | [GitHub] 22 | 23 | [first top] [second below] 24 | 25 | [first left] [second right] 26 | 27 | 
Before [close] After 28 | 29 | [**Heading A** **Heading B**] 30 | 31 | [DIW-Chef zum Grünen-Programm "Vermögenssteuer ist aus wirtschaftlicher Sicht klug"] 32 | 33 | [**Die App WDR aktuell begleitet Sie durch den Tag**\ 34 | \ 35 | Sie möchten eine App, die Sie so durch den Tag in NRW begleitet, dass Sie jederzeit mitreden können? Die App WDR aktuell bietet Ihnen dafür immer die passenden Nachrichten.\ 36 |  \| \ 37 | **mehr**] 38 | 39 | [Simple Absolute Link]: http://simple.org/ 40 | [Simple Relative Link]: http://example.com/page.html 41 | [Link with Space]: http://space.org/ 42 | [Link with Title]: http://title.org/ "Some Title" 43 | [Link with multiline Title]: http://title.org/ "Some Long Title" 44 | [Broken Link]: http://broken.com/test.html%5Cr 45 | [First Text\ 46 | ![](http://example.com/xxx)\ 47 | Second Text]: http://multi.org/ 48 | [First Text\ 49 | \ 50 | Second Text]: http://list.org/ 51 | [GitHub]: https://github.com "GitHub" 52 | [first top]: http://first_under.com 53 | [second below]: http://second_under.com 54 | [first left]: http://first_next.com 55 | [second right]: http://second_next.com 56 | [close]: http://example_close.com 57 | [**Heading A** **Heading B**]: http://example.com/page.html 58 | [DIW-Chef zum Grünen-Programm "Vermögenssteuer ist aus wirtschaftlicher Sicht klug"]: http://example.com/page.html "\"Vermögenssteuer ist aus wirtschaftlicher Sicht klug\"" 59 | [**Die App WDR aktuell begleitet Sie durch den Tag**\ 60 | \ 61 | Sie möchten eine App, die Sie so durch den Tag in NRW begleitet, dass Sie jederzeit mitreden können? 
Die App WDR aktuell bietet Ihnen dafür immer die passenden Nachrichten.\ 62 |  \| \ 63 | **mehr**]: http://example.com/nachrichten/wdr-aktuell-app-stores-100.html "Die App WDR aktuell begleitet Sie durch den Tag " -------------------------------------------------------------------------------- /testdata/TestCommonmark/link/output.referenced_collapsed.golden: -------------------------------------------------------------------------------- 1 | [Simple Absolute Link][] 2 | 3 | [Simple Relative Link][] 4 | 5 | [Link with Space][] 6 | 7 | [Link with Title][] 8 | 9 | [Link with multiline Title][] 10 | 11 | [Broken Link][] 12 | 13 | [First Text\ 14 | ![](http://example.com/xxx)\ 15 | Second Text][] 16 | 17 | - [First Text\ 18 | \ 19 | Second Text][] 20 | 21 | [GitHub][] 22 | 23 | [first top][] [second below][] 24 | 25 | [first left][] [second right][] 26 | 27 | Before [close][] After 28 | 29 | [**Heading A** **Heading B**][] 30 | 31 | [DIW-Chef zum Grünen-Programm "Vermögenssteuer ist aus wirtschaftlicher Sicht klug"][] 32 | 33 | [**Die App WDR aktuell begleitet Sie durch den Tag**\ 34 | \ 35 | Sie möchten eine App, die Sie so durch den Tag in NRW begleitet, dass Sie jederzeit mitreden können? 
Die App WDR aktuell bietet Ihnen dafür immer die passenden Nachrichten.\ 36 |  \| \ 37 | **mehr**][] 38 | 39 | [Simple Absolute Link]: http://simple.org/ 40 | [Simple Relative Link]: http://example.com/page.html 41 | [Link with Space]: http://space.org/ 42 | [Link with Title]: http://title.org/ "Some Title" 43 | [Link with multiline Title]: http://title.org/ "Some Long Title" 44 | [Broken Link]: http://broken.com/test.html%5Cr 45 | [First Text\ 46 | ![](http://example.com/xxx)\ 47 | Second Text]: http://multi.org/ 48 | [First Text\ 49 | \ 50 | Second Text]: http://list.org/ 51 | [GitHub]: https://github.com "GitHub" 52 | [first top]: http://first_under.com 53 | [second below]: http://second_under.com 54 | [first left]: http://first_next.com 55 | [second right]: http://second_next.com 56 | [close]: http://example_close.com 57 | [**Heading A** **Heading B**]: http://example.com/page.html 58 | [DIW-Chef zum Grünen-Programm "Vermögenssteuer ist aus wirtschaftlicher Sicht klug"]: http://example.com/page.html "\"Vermögenssteuer ist aus wirtschaftlicher Sicht klug\"" 59 | [**Die App WDR aktuell begleitet Sie durch den Tag**\ 60 | \ 61 | Sie möchten eine App, die Sie so durch den Tag in NRW begleitet, dass Sie jederzeit mitreden können? 
Die App WDR aktuell bietet Ihnen dafür immer die passenden Nachrichten.\ 62 |  \| \ 63 | **mehr**]: http://example.com/nachrichten/wdr-aktuell-app-stores-100.html "Die App WDR aktuell begleitet Sie durch den Tag " -------------------------------------------------------------------------------- /plugin/plugin_test.go: -------------------------------------------------------------------------------- 1 | package plugin 2 | 3 | import ( 4 | "testing" 5 | 6 | md "github.com/JohannesKaufmann/html-to-markdown" 7 | ) 8 | 9 | func TestConfluenceCodeBlock(t *testing.T) { 10 | conv := md.NewConverter("", true, nil) 11 | conv.Use(ConfluenceCodeBlock()) 12 | 13 | input := ` 16 | some other stuff 17 | sql` 20 | expected := "```" + ` 21 | FOR stuff IN imdb_vertices 22 | FILTER LIKE(stuff.description, "%good%vs%evil%", true) 23 | RETURN stuff.description 24 | ` + "```" + ` 25 | some other stuff 26 | ` + "```sql" + ` 27 | FOR stuff IN imdb_vertices 28 | FILTER LIKE(stuff.description, "%good%vs%evil%", true) 29 | RETURN stuff.description 30 | ` + "```" 31 | markdown, err := conv.ConvertString(input) 32 | if err != nil { 33 | t.Error(err) 34 | } 35 | 36 | if markdown != expected { 37 | t.Errorf("got '%s' but wanted '%s'", markdown, expected) 38 | } 39 | } 40 | 41 | func TestConfluenceAttachments(t *testing.T) { 42 | conv := md.NewConverter("", true, nil) 43 | conv.Use(ConfluenceAttachments()) 44 | 45 | input := `

Here’s an image:

Another one

` 46 | expected := `Here’s an image: 47 | 48 | ![][image.png] 49 | 50 | Another one 51 | 52 | ![][image.jpg]` 53 | markdown, err := conv.ConvertString(input) 54 | if err != nil { 55 | t.Error(err) 56 | } 57 | 58 | if markdown != expected { 59 | t.Errorf("got '%s' but wanted '%s'", markdown, expected) 60 | } 61 | } 62 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/list/input.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 |

    5 |
  • Some Thing
  • 6 |
  • Another Thing
  • 7 |
8 | 9 | 10 | 11 |
    12 |
  1. First Thing
  2. 13 |
  3. Second Thing
  4. 14 |
15 | 16 | 17 | 18 |
    19 |
  1. 1
  2. 20 |
  3. 2
  4. 21 |
  5. 3
  6. 22 |
  7. 4
  8. 23 |
  9. 5
  10. 24 |
  11. 6
  12. 25 |
  13. 7
  14. 26 |
  15. 8
  16. 27 |
  17. 9
  18. 28 |
  19. 10
  20. 29 |
  21. 11
  22. 30 |
  23. 12
  24. 31 |
  25. 13
  26. 32 |
  27. 14
  28. 33 |
  29. 15
  30. 34 |
  31. 35 |
  32. 36 |
  33. 37 |
  34. 16
  35. 38 |
  36. 17
  37. 39 |
  38. 18
  39. 40 |
  40. 19
  41. 41 |
  42. 20
  43. 42 |
  44. 43 |
  45. 22
  46. 44 |
45 | 46 | 47 | 48 |
    49 |
  • Link: example works
  • 50 |
  • 51 | Link: 52 | example 53 | works 54 |
  • 55 |
56 | 57 | 58 |
    59 |
  1. 60 |

    First Thing

    61 |
      62 |
    • Some Thing
    • 63 |
    • Another Thing
    • 64 |
    65 |
  2. 66 |
  3. Second Thing
  4. 67 |
68 | 69 | 70 | 71 |
    72 |
  • foo
  • 73 |
  • 74 |
  • bar
  • 75 |
76 | 77 | 78 | 79 |
    80 |
  • List items
  • 81 |
  • Ending with
  • 82 |
  • A space
  • 83 |
84 | 85 | 86 | 87 |
    88 |
  • 89 | Indent First Thing 90 |

    Second Thing

    91 |
  • 92 |
  • Third Thing
  • 93 |
94 | 95 | 96 | 97 |

- Not List

98 | 99 |

1. Not List 1. Not List 100 | 1. Not List

101 | 102 | 103 | 104 |
    105 |
  1. 106 |

    A paragraph 107 | with two lines.

    108 | 109 |
    indented code
    110 | 111 |
    112 |

    A block quote.

    113 |
    114 |
  2. 115 |
-------------------------------------------------------------------------------- /testdata/TestCommonmark/list_nested/input.html: -------------------------------------------------------------------------------- 1 | 2 | 3 |
    4 |
  • foo 5 |
      6 |
    • bar 7 |
        8 |
      • baz 9 |
          10 |
        • boo
        • 11 |
        12 |
      • 13 |
      14 |
    • 15 |
    16 |
  • 17 |
18 | 19 | 20 | 21 |
    22 |
  • Coffee
  • 23 |
  • Tea
      24 |
    • Black tea
    • 25 |
    • Green tea
    • 26 |
    27 |
  • 28 |
  • Milk
  • 29 |

header1

30 | 31 | 32 | 33 |
    34 |
  • Bullet list 35 |
      36 |
    • Nested bullet 37 |
        38 |
      • Sub-nested bullet etc
      • 39 |
      40 |
    • 41 |
    42 |
  • 43 |
  • Bullet list item 2
  • 44 |
45 | 46 | 47 | 48 |
    49 |
  • One
  • 50 |
      51 |
    • One point one
    • 52 |
    • One point two
    • 53 |
    54 |
55 | 56 | 57 |
    58 |
  1. One
  2. 59 |
      60 |
    1. One point one
    2. 61 |
    3. One point two
    4. 62 |
    63 |
64 | 65 | 66 | 67 |
    68 |
  • 1
  • 69 |
  • 2
  • 70 |
  • 71 |
      72 |
    • 2.1
    • 73 |
    • 2.2
    • 74 |
    • 75 |
        76 |
      • 2.2.1
      • 77 |
      • 2.2.2
      • 78 |
      • 2.2.3
      • 79 |
      80 |
    • 81 |
    82 |
  • 83 |
  • 3
  • 84 |
85 | 86 | 87 | 88 |
    89 |
  1. 1
  2. 90 |
  3. 2
  4. 91 |
  5. 92 |
      93 |
    1. 2.1
    2. 94 |
    3. 2.2
    4. 95 |
    5. 96 |
        97 |
      1. 2.2.1
      2. 98 |
      3. 2.2.2
      4. 99 |
      5. 2.2.3
      6. 100 |
      101 |
    6. 102 |
    103 |
  6. 104 |
  7. 3
  8. 105 |
106 | 107 | 108 | 109 |
    110 |
  • 111 |

    First Thing

    112 |

    Second Thing

    113 |
      114 |
    • 115 |

      Nested First Thing

      116 |

      Nested Thing

      117 |
    • 118 |
    119 |
  • 120 |
121 | 122 | 123 | 124 |
    125 |
  1. 126 |

    First Thing

    127 |

    Second Thing

    128 |
      129 |
    1. 130 |

      Nested First Thing

      131 |

      Nested Thing

      132 |
    2. 133 |
    3. 134 |

      Date:

      135 |

      20.02.2021

      136 |

      26. Mai - 3. Juni

      137 |
    4. 138 |
    139 |
  2. 140 |
141 | 142 | -------------------------------------------------------------------------------- /plugin/movefrontmatter.go: -------------------------------------------------------------------------------- 1 | package plugin 2 | 3 | import ( 4 | "strings" 5 | 6 | md "github.com/JohannesKaufmann/html-to-markdown" 7 | "github.com/PuerkitoBio/goquery" 8 | ) 9 | 10 | const moveFrontmatterAttr = "movefrontmatter" 11 | 12 | // EXPERIMENTALMoveFrontMatter moves a frontmatter block at the beginning 13 | // of the document to the top of the generated markdown block, without touching (and escaping) it. 14 | func EXPERIMENTALMoveFrontMatter(delimiters ...rune) md.Plugin { 15 | return func(c *md.Converter) []md.Rule { 16 | if len(delimiters) == 0 { 17 | delimiters = []rune{'+', '$', '-', '%'} 18 | } 19 | 20 | var delimitersList []string 21 | for _, c := range delimiters { 22 | delimitersList = append(delimitersList, strings.Repeat(string(c), 3)) 23 | } 24 | 25 | isDelimiter := func(line string) bool { 26 | for _, delimiter := range delimitersList { 27 | if strings.HasPrefix(line, delimiter) { 28 | return true 29 | } 30 | } 31 | return false 32 | } 33 | 34 | c.Before(func(selec *goquery.Selection) { 35 | selec.Find("body").Contents().EachWithBreak(func(i int, s *goquery.Selection) bool { 36 | text := s.Text() 37 | 38 | // skip empty strings 39 | if strings.TrimSpace(text) == "" { 40 | return true 41 | } 42 | 43 | var frontmatter string 44 | var html string = text // if there is no frontmatter, keep the text 45 | 46 | lines := strings.Split(text, "\n") 47 | for i := 0; i < len(lines); i++ { 48 | if isDelimiter(lines[i]) { 49 | if i == 0 { 50 | continue 51 | } 52 | 53 | // split the frontmatter 54 | f := lines[:i+1] 55 | frontmatter = strings.Join(f, "\n") 56 | 57 | // and the html content AFTER the frontmatter 58 | h := lines[i+1:] 59 | html = strings.Join(h, "\n") 60 | break 61 | } 62 | } 63 | 64 | s.SetAttr(moveFrontmatterAttr, frontmatter) 65 | s.SetText(html) 66 | 67 | // the 
front matter must be the first thing in the file. So we break out of the loop 68 | return false 69 | }) 70 | }) 71 | 72 | return []md.Rule{ 73 | { 74 | Filter: []string{"#text"}, 75 | AdvancedReplacement: func(content string, selec *goquery.Selection, opt *md.Options) (md.AdvancedResult, bool) { 76 | frontmatter, exists := selec.Attr(moveFrontmatterAttr) 77 | 78 | if !exists { 79 | return md.AdvancedResult{}, true 80 | } 81 | 82 | return md.AdvancedResult{ 83 | Header: frontmatter, 84 | Markdown: content, 85 | }, false 86 | }, 87 | }, 88 | } 89 | } 90 | } 91 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/pre_code/input.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | last_30_days 4 |
5 | 6 | 7 | 8 | with backtick (`) 9 |
10 | 11 | 12 | 13 | with backtick (``) 14 |
15 | 16 | 17 | 18 | here are three ``` here are four ```` here is one ` that's it 19 |
20 | 21 | 22 | 23 | `starting & ending with a backtick` 24 |
25 | 26 | 27 | 28 |
29 |

Who ate the most donuts this week?

30 |
Jeff  15
 31 | Sam   11
 32 | Robin  6
33 |
34 | 35 | 36 | 37 |
// Fprint formats using the default formats for its operands and writes to w.
 38 | // Spaces are added between operands when neither is a string.
 39 | // It returns the number of bytes written and any write error encountered.
 40 | func Fprint(w io.Writer, a ...interface{}) (n int, err error) {
41 | 42 | 43 | 44 |

When x = 3, that means x + 2 = 5

45 | 46 | 47 | 48 |

The <img> tag is used to embed an image.

49 |

The tag is used to embed an image.

50 | 51 | 52 | 53 |

Two variables A B

54 | 55 | 56 | 57 |

CSS: 58 | body { 59 | color: yellow; 60 | font-size: 16px; 61 | } 62 |

63 | 64 | 65 |

CSS: 66 | 67 | 68 | body { 69 | color: yellow; 70 | font-size: 16px; 71 | } 72 | 73 | 74 |

75 | 76 | 77 | 78 |
```
79 | 80 |
~~~
81 | 82 |

 83 | Some ~~~
 84 | totally ~~~~~~ normal
 85 | ~ code
 86 | 
87 | 88 | 89 | 90 |

 91 | The <img> tag is used to embed an image.
 92 | 
 93 | The  tag is used to embed an image.
 94 | 
95 | 96 | 97 |

 98 | 
 99 |     
100 | 
101 | 
102 | -------------------------------------------------------------------------------- /testdata/TestCommonmark/link/input.html: -------------------------------------------------------------------------------- 1 | 2 | Simple Absolute Link 3 |
4 | 5 | 6 | 7 | Simple Relative Link 8 |
9 | 10 | 11 | 12 | Link with Space 13 |
14 | 15 | 16 | 17 | Link with Title 18 |
19 | 20 | 21 | 22 | Link with multiline Title 26 |
27 | 28 | 29 | Broken Link 30 |
31 | 32 | 33 | 34 | 35 |

First Text

36 | 37 |

Second Text

38 |
39 |
40 | 41 | 42 | 43 | 59 | 60 | 61 | 62 | 63 | 64 | 65 |
66 | 67 | 68 | 69 |

70 | first top 71 | 72 | second below 73 |

74 |
75 | 76 | 77 | 78 |

first left second right

79 |
80 | 81 | 82 | 83 |

BeforecloseAfter

84 |
85 | 86 | 87 | 88 | 89 |

Heading A

90 |

Heading B

91 |
92 |
93 | 94 | 95 | 96 | 100 | DIW-Chef zum Grünen-Programm 101 | "Vermögenssteuer ist aus wirtschaftlicher Sicht klug" 102 | 103 |
104 | 105 | 106 | 107 |

108 | Die App WDR aktuell begleitet Sie durch den Tag 109 |

110 |

111 | Sie möchten eine App, die Sie so durch den Tag in NRW begleitet, dass Sie jederzeit mitreden können? Die App WDR aktuell bietet Ihnen dafür immer die passenden Nachrichten. 112 |  |  113 | mehr 114 |

115 |
116 |
-------------------------------------------------------------------------------- /testdata/TestPlugins/table/input.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 |
FirstnameLastnameAge
JillSmith50
EveJackson94
Empty
End
28 | 29 | 30 | 31 | 32 |

With | Character

33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 49 | 50 | 51 |
FirstnameWith | CharacterAge
JillSmith50
Eve 47 | Jackson 48 | 94
52 | 53 | 54 | 55 |

Tabelle mit thead, tfoot, and tbody

56 | 57 | 58 | 59 | 60 | 61 | 62 | 63 | 64 | 65 | 66 | 67 | 68 | 69 | 70 | 71 | 72 | 73 | 74 | 75 |
Header content 1Header content 2
Footer content 1Footer content 2
Body content 1Body content 2
76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 |
Unglaublich tolle Beschreibung
Unglaublich tolle Daten
85 | 86 | 87 | 88 | 89 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 |
90 |

Pegel DUISBURG-RUHRORT

91 | logo 92 |

Quelle:

93 |
AB
ABCD
ABC
110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 |
StrongLinkItalic
varbc
125 | 126 | 127 | 128 | 129 | 130 | 135 | 136 |
131 |

1

132 |

2

133 |

3

134 |
137 | 138 | 139 | 140 | 141 | 142 | 153 | 154 |
143 | 144 | 145 | 150 | 151 |
146 |

1

147 |

2

148 |

3

149 |
152 |
155 | -------------------------------------------------------------------------------- /go.sum: -------------------------------------------------------------------------------- 1 | github.com/PuerkitoBio/goquery v1.5.1 h1:PSPBGne8NIUWw+/7vFBV+kG2J/5MOjbzc7154OaKCSE= 2 | github.com/PuerkitoBio/goquery v1.5.1/go.mod h1:GsLWisAFVj4WgDibEWF4pvYnkVQBpKBKeU+7zCJoLcc= 3 | github.com/andybalholm/cascadia v1.1.0 h1:BuuO6sSfQNFRu1LppgbD25Hr2vLYW25JvxHs5zzsLTo= 4 | github.com/andybalholm/cascadia v1.1.0/go.mod h1:GsXiBklL0woXo1j/WYWtSYYC4ouU9PqHO0sqidkEA4Y= 5 | github.com/davecgh/go-spew v1.1.0/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 6 | github.com/davecgh/go-spew v1.1.1 h1:vj9j/u1bqnvCEfJOwUhtlOARqs3+rkHYY13jYWTU97c= 7 | github.com/davecgh/go-spew v1.1.1/go.mod h1:J7Y8YcW2NihsgmVo/mv3lAwl/skON4iLHjSsI+c5H38= 8 | github.com/kr/pretty v0.1.0 h1:L/CwN0zerZDmRFUapSPitk6f+Q3+0za1rQkzVuMiMFI= 9 | github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo= 10 | github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ= 11 | github.com/kr/text v0.1.0 h1:45sCR5RtlFHMR4UwH9sdQ5TC8v0qDQCHnXt+kaKSTVE= 12 | github.com/kr/text v0.1.0/go.mod h1:4Jbv+DJW3UT/LiOwJeYQe1efqtUx/iVham/4vfdArNI= 13 | github.com/pkg/errors v0.8.1/go.mod h1:bwawxfHBFNV+L2hUp1rHADufV3IMtnDRdf1r5NINEl0= 14 | github.com/pmezard/go-difflib v1.0.0 h1:4DBwDE0NGyQoBHbLQYPwSUPoCMWR5BEzIk/f1lZbAQM= 15 | github.com/pmezard/go-difflib v1.0.0/go.mod h1:iKH77koFhYxTK1pcRnkKkqfTogsbg7gZNVY4sRDYZ/4= 16 | github.com/sebdah/goldie/v2 v2.5.1 h1:hh70HvG4n3T3MNRJN2z/baxPR8xutxo7JVxyi2svl+s= 17 | github.com/sebdah/goldie/v2 v2.5.1/go.mod h1:oZ9fp0+se1eapSRjfYbsV/0Hqhbuu3bJVvKI/NNtssI= 18 | github.com/sergi/go-diff v1.0.0/go.mod h1:0CfEIISq7TuYL3j771MWULgwwjU+GofnZX9QAmXWZgo= 19 | github.com/sergi/go-diff v1.1.0 h1:we8PVUC3FE2uYfodKH/nBHMSetSfHDR6scGdBi+erh0= 20 | github.com/sergi/go-diff v1.1.0/go.mod h1:STckp+ISIX8hZLjrqAeVduY0gWCT9IjLuqbuNXdaHfM= 21 | github.com/stretchr/objx 
v0.1.0/go.mod h1:HFkY916IF+rwdDfMAkV7OtwuqBVzrE8GR6GFx+wExME= 22 | github.com/stretchr/testify v1.3.0/go.mod h1:M5WIy9Dh21IEIfnGCwXGc5bZfKNJtfHm1UVUgZn+9EI= 23 | github.com/stretchr/testify v1.4.0 h1:2E4SXV/wtOkTonXsotYi4li6zVWxYlZuYNCXe9XRJyk= 24 | github.com/stretchr/testify v1.4.0/go.mod h1:j7eGeouHqKxXV5pUuKE4zz7dFj8WfuZ+81PSLYec5m4= 25 | github.com/yuin/goldmark v1.2.0 h1:WOOcyaJPlzb8fZ8TloxFe8QZkhOOJx87leDa9MIT9dc= 26 | github.com/yuin/goldmark v1.2.0/go.mod h1:3hX8gzYuyVAZsxl0MRgGTJEmQBFcNTphYh9decYSb74= 27 | golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w= 28 | golang.org/x/net v0.0.0-20180218175443-cbe0f9307d01/go.mod h1:mL1N/T3taQHkDXs73rZJwtUhF3w3ftmwwsq0BUmARs4= 29 | golang.org/x/net v0.0.0-20200202094626-16171245cfb2/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= 30 | golang.org/x/net v0.0.0-20200320220750-118fecf932d8 h1:1+zQlQqEEhUeStBTi653GZAnAuivZq/2hz+Iz+OP7rg= 31 | golang.org/x/net v0.0.0-20200320220750-118fecf932d8/go.mod h1:z5CRVTTTmAJ677TzLLGU+0bjPO0LkuOLi4/5GtJWs/s= 32 | golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= 33 | golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ= 34 | gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= 35 | gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15 h1:YR8cESwS4TdDjEe65xsg0ogRM/Nc3DYOhEAlW+xobZo= 36 | gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0= 37 | gopkg.in/yaml.v2 v2.2.2/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= 38 | gopkg.in/yaml.v2 v2.2.4/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= 39 | gopkg.in/yaml.v2 v2.2.8 h1:obN1ZagJSUGI0Ek/LBmuj4SNLPfIny3KsKFopxRdj10= 40 | gopkg.in/yaml.v2 v2.2.8/go.mod h1:hI93XBmqTisBFMUTm0b8Fm+jr3Dg1NNxqwp+5A1VGuI= 41 | 
-------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/github_about/input.html: -------------------------------------------------------------------------------- 1 | 2 | 3 |

About

4 |

5 | ⚙️ 6 | Convert HTML to Markdown. Even works with whole websites. 7 |

8 |

Topics

9 | 28 |

Resources

29 | 37 |

License

38 | 46 | 47 | -------------------------------------------------------------------------------- /plugin/vimeo.go: -------------------------------------------------------------------------------- 1 | package plugin 2 | 3 | import ( 4 | "encoding/json" 5 | "fmt" 6 | "net/http" 7 | "regexp" 8 | "strings" 9 | "time" 10 | "unicode/utf8" 11 | 12 | md "github.com/JohannesKaufmann/html-to-markdown" 13 | "github.com/PuerkitoBio/goquery" 14 | ) 15 | 16 | // Timeout for the http client 17 | var Timeout = time.Second * 10 18 | var netClient = &http.Client{ 19 | Timeout: Timeout, 20 | } 21 | 22 | type vimeoVideo struct { 23 | Type string `json:"type"` 24 | Version string `json:"version"` 25 | ProviderName string `json:"provider_name"` 26 | ProviderURL string `json:"provider_url"` 27 | Title string `json:"title"` 28 | AuthorName string `json:"author_name"` 29 | AuthorURL string `json:"author_url"` 30 | IsPlus string `json:"is_plus"` 31 | AccountType string `json:"account_type"` 32 | HTML string `json:"html"` 33 | Width int `json:"width"` 34 | Height int `json:"height"` 35 | Duration int `json:"duration"` 36 | Description string `json:"description"` 37 | ThumbnailURL string `json:"thumbnail_url"` 38 | ThumbnailWidth int `json:"thumbnail_width"` 39 | ThumbnailHeight int `json:"thumbnail_height"` 40 | ThumbnailURLWithPlayButton string `json:"thumbnail_url_with_play_button"` 41 | UploadDate string `json:"upload_date"` 42 | VideoID int `json:"video_id"` 43 | URI string `json:"uri"` 44 | } 45 | 46 | var vimeoID = regexp.MustCompile(`video\/(\d*)`) 47 | 48 | type vimeoVariation int 49 | 50 | // Configure how the Vimeo Plugin should display the video in markdown. 51 | const ( 52 | VimeoOnlyThumbnail vimeoVariation = iota 53 | VimeoWithTitle 54 | VimeoWithDescription 55 | ) 56 | 57 | // EXPERIMENTALVimeoEmbed registers a rule (for iframes) and 58 | // returns a markdown compatible representation (link to video, ...). 
59 | func EXPERIMENTALVimeoEmbed(variation vimeoVariation) md.Plugin { 60 | return func(c *md.Converter) []md.Rule { 61 | getVimeoData := func(id string) (*vimeoVideo, error) { 62 | u := fmt.Sprintf("http://vimeo.com/api/oembed.json?url=https://vimeo.com/%s", id) 63 | 64 | resp, err := netClient.Get(u) 65 | if err != nil { 66 | return nil, err 67 | } 68 | 69 | defer resp.Body.Close() 70 | 71 | var res vimeoVideo 72 | err = json.NewDecoder(resp.Body).Decode(&res) 73 | if err != nil { 74 | return nil, err 75 | } 76 | return &res, nil 77 | } 78 | cleanDescription := func(html string) (string, error) { 79 | text, err := c.ConvertString(html) 80 | if err != nil { 81 | return "", err 82 | } 83 | 84 | text = strings.Replace(text, "\n", " ", -1) 85 | text = strings.Replace(text, "\t", " ", -1) 86 | before := utf8.RuneCountInString(text) 87 | text = summary(text, 70) 88 | after := utf8.RuneCountInString(text) 89 | if after != before { 90 | text += "..." 91 | } 92 | return text, nil 93 | } 94 | 95 | return []md.Rule{ 96 | { 97 | Filter: []string{"iframe"}, 98 | Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string { 99 | src := selec.AttrOr("src", "") 100 | if !strings.Contains(src, "vimeo.com") { 101 | return nil 102 | } 103 | parts := vimeoID.FindStringSubmatch(src) 104 | if len(parts) != 2 { 105 | return nil 106 | } 107 | id := parts[1] 108 | 109 | video, err := getVimeoData(id) 110 | if err != nil { 111 | panic(err) 112 | } 113 | 114 | // desc, err := cleanDescription(video.Description) 115 | // if err != nil { 116 | // panic(err) 117 | // } 118 | 119 | // [![Little red ridning hood](http://i.imgur.com/7YTMFQp.png)](https://vimeo.com/3514904 "Little red riding hood - Click to Watch!") 120 | // text := fmt.Sprintf("[![%s](%s) ](%s)", desc, video.ThumbnailURLWithPlayButton, "https://vimeo.com/"+video.URI) 121 | text := fmt.Sprintf(`[![](%s)](https://vimeo.com/%d)`, video.ThumbnailURLWithPlayButton, video.VideoID) 122 | 123 | switch variation 
{ 124 | case VimeoOnlyThumbnail: 125 | // do nothing 126 | case VimeoWithTitle: 127 | duration := time.Duration(video.Duration) * time.Second 128 | text += fmt.Sprintf("\n\n'%s' by ['%s'](%s) (%s)", video.Title, video.AuthorName, video.AuthorURL, duration.String()) 129 | case VimeoWithDescription: 130 | desc, err := cleanDescription(video.Description) 131 | if err != nil { 132 | panic(err) 133 | } 134 | text += "\n\n" + desc 135 | } 136 | 137 | return &text 138 | }, 139 | }, 140 | } 141 | } 142 | } 143 | 144 | // summary truncates text to at most limit runes. 145 | func summary(text string, limit int) string { 146 | result := text 147 | chars := 0 148 | for i := range text { 149 | if chars >= limit { 150 | result = text[:i] 151 | break 152 | } 153 | chars++ 154 | } 155 | return result 156 | } 157 | -------------------------------------------------------------------------------- /plugin/table.go: -------------------------------------------------------------------------------- 1 | package plugin 2 | 3 | import ( 4 | "regexp" 5 | "strings" 6 | 7 | md "github.com/JohannesKaufmann/html-to-markdown" 8 | "github.com/PuerkitoBio/goquery" 9 | ) 10 | 11 | // TableCompat is a compatibility plugin for environments where 12 | // only CommonMark markdown (without Tables) is supported. 13 | // 14 | // Note: In an environment that supports "real" Tables, like GitHub's Flavored Markdown 15 | // use `plugin.Table()` instead. 
16 | func TableCompat() md.Plugin { 17 | return func(c *md.Converter) []md.Rule { 18 | return []md.Rule{ 19 | { 20 | Filter: []string{"td", "th"}, 21 | Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string { 22 | content = strings.TrimSpace(content) 23 | 24 | if content == "" { 25 | return &content 26 | } 27 | 28 | next := selec.Next() 29 | nextIsEmpty := strings.TrimSpace(next.Text()) == "" 30 | if (next.Is("td") || next.Is("th")) && !nextIsEmpty { 31 | content = content + " · " 32 | } 33 | 34 | return &content 35 | }, 36 | }, 37 | { 38 | Filter: []string{"tr"}, 39 | Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string { 40 | content = content + "\n\n" 41 | 42 | return &content 43 | }, 44 | }, 45 | } 46 | } 47 | } 48 | 49 | // Table converts an HTML table (using hyphens and pipe characters) to a 50 | // visual representation in markdown. 51 | // 52 | // Note: This Plugin overrides the default compatibility rules from `commonmark.go`. 53 | // Only use this Plugin in an environment that has extended the normal syntax, 54 | // like GitHub's Flavored Markdown. 
55 | func Table() md.Plugin { 56 | return func(c *md.Converter) []md.Rule { 57 | c.Before(func(selec *goquery.Selection) { 58 | selec.Find("caption").Each(func(i int, s *goquery.Selection) { 59 | parent := s.Parent() 60 | if !parent.Is("table") { 61 | return 62 | } 63 | 64 | // move the caption from inside the table to after the table 65 | parent.AfterSelection(s) 66 | }) 67 | }) 68 | 69 | return []md.Rule{ 70 | { 71 | Filter: []string{"table"}, 72 | Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string { 73 | noHeader := selec.Find("thead").Length() == 0 && selec.Find("th").Length() == 0 74 | if noHeader { 75 | var maxCount int 76 | selec.Find("tr").Each(func(i int, s *goquery.Selection) { 77 | count := s.Children().Length() 78 | if count > maxCount { 79 | maxCount = count 80 | } 81 | }) 82 | 83 | // add an empty header, so that the table is recognized. 84 | header := "|" + strings.Repeat(" |", maxCount) 85 | divider := "|" + strings.Repeat(" --- |", maxCount) 86 | 87 | content = header + "\n" + divider + content 88 | } 89 | 90 | content = "\n\n" + content + "\n\n" 91 | return &content 92 | }, 93 | }, 94 | { // TableCell 95 | Filter: []string{"th", "td"}, 96 | Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string { 97 | return md.String(getCellContent(content, selec)) 98 | }, 99 | }, 100 | { // TableRow 101 | Filter: []string{"tr"}, 102 | Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string { 103 | borderCells := "" 104 | 105 | if isHeadingRow(selec) { 106 | selec.Children().Each(func(i int, s *goquery.Selection) { 107 | border := "---" 108 | if align, ok := s.Attr("align"); ok { 109 | switch align { 110 | case "left": 111 | border = ":--" 112 | case "right": 113 | border = "--:" 114 | case "center": 115 | border = ":-:" 116 | } 117 | } 118 | 119 | borderCells += getCellContent(border, s) 120 | }) 121 | } 122 | 123 | text := "\n" + content 124 | if borderCells != "" { 125 
| text += "\n" + borderCells 126 | } 127 | return &text 128 | }, 129 | }, 130 | } 131 | } 132 | } 133 | 134 | // A tr is a heading row if: 135 | // - the parent is a THEAD 136 | // - or if it's the first child of the TABLE or the first TBODY (possibly 137 | // following a blank THEAD) 138 | // - and every cell is a TH 139 | func isHeadingRow(s *goquery.Selection) bool { 140 | parent := s.Parent() 141 | 142 | if goquery.NodeName(parent) == "thead" { 143 | return true 144 | } 145 | 146 | isTableOrBody := parent.Is("table") || isFirstTbody(parent) 147 | 148 | everyTH := true 149 | s.Children().Each(func(i int, s *goquery.Selection) { 150 | if goquery.NodeName(s) != "th" { 151 | everyTH = false 152 | } 153 | }) 154 | 155 | if parent.Children().First().IsSelection(s) && isTableOrBody && everyTH { 156 | return true 157 | } 158 | 159 | return false 160 | } 161 | func isFirstTbody(s *goquery.Selection) bool { 162 | firstSibling := s.Siblings().Eq(0) // TODO: previousSibling 163 | if s.Is("tbody") && firstSibling.Length() == 0 { 164 | return true 165 | } 166 | 167 | return false 168 | } 169 | 170 | var newLineRe = regexp.MustCompile(`(\r?\n)+`) 171 | 172 | func getCellContent(content string, s *goquery.Selection) string { 173 | content = strings.TrimSpace(content) 174 | if s.Find("table").Length() == 0 { 175 | // nested tables not found 176 | content = newLineRe.ReplaceAllString(content, "<br />") 177 | } 178 | index := -1 179 | for i, node := range s.Parent().Children().Nodes { 180 | if s.IsNodes(node) { 181 | index = i 182 | break 183 | } 184 | } 185 | prefix := " " 186 | if index == 0 { 187 | prefix = "| " 188 | } 189 | return prefix + content + " |" 190 | } 191 | -------------------------------------------------------------------------------- /markdown.go: -------------------------------------------------------------------------------- 1 | package md 2 | 3 | import ( 4 | "bytes" 5 | "log" 6 | "net/url" 7 | "regexp" 8 | "strings" 9 | 10 | "github.com/PuerkitoBio/goquery" 11 | "golang.org/x/net/html" 12 | ) 13 | 14 | var ( 15 | ruleDefault = func(content string, selec *goquery.Selection, opt *Options) *string { 16 | return &content 17 | } 18 | ruleKeep = func(content string, selec *goquery.Selection, opt *Options) *string { 19 | element := selec.Get(0) 20 | 21 | var buf bytes.Buffer 22 | err := html.Render(&buf, element) 23 | if err != nil { 24 | log.Println("[JohannesKaufmann/html-to-markdown] ruleKeep: error while rendering the element to html:", err) 25 | return String("") 26 | } 27 | 28 | return String(buf.String()) 29 | } 30 | ) 31 | 32 | var inlineElements = []string{ // -> https://developer.mozilla.org/de/docs/Web/HTML/Inline_elemente 33 | "b", "big", "i", "small", "tt", 34 | "abbr", "acronym", "cite", "code", "dfn", "em", "kbd", "strong", "samp", "var", 35 | "a", "bdo", "br", "img", "map", "object", "q", "script", "span", "sub", "sup", 36 | "button", "input", "label", "select", "textarea", 37 | } 38 | 39 | // IsInlineElement can be used to check whether a node name (goquery.NodeName) is 40 | // an html inline element and not a block element. Used in the rule for the 41 | // p tag to check whether the text is inside a block element. 
42 | func IsInlineElement(e string) bool { 43 | for _, element := range inlineElements { 44 | if element == e { 45 | return true 46 | } 47 | } 48 | return false 49 | } 50 | 51 | // String is a helper function to return a pointer. 52 | func String(text string) *string { 53 | return &text 54 | } 55 | 56 | // Options to customize the output. You can change stuff like 57 | // the character that is used for strong text. 58 | type Options struct { 59 | // "setext" or "atx" 60 | // default: "atx" 61 | HeadingStyle string 62 | 63 | // Any Thematic break 64 | // default: "* * *" 65 | HorizontalRule string 66 | 67 | // "-", "+", or "*" 68 | // default: "-" 69 | BulletListMarker string 70 | 71 | // "indented" or "fenced" 72 | // default: "indented" 73 | CodeBlockStyle string 74 | 75 | // ``` or ~~~ 76 | // default: ``` 77 | Fence string 78 | 79 | // _ or * 80 | // default: _ 81 | EmDelimiter string 82 | 83 | // ** or __ 84 | // default: ** 85 | StrongDelimiter string 86 | 87 | // inlined or referenced 88 | // default: inlined 89 | LinkStyle string 90 | 91 | // full, collapsed, or shortcut 92 | // default: full 93 | LinkReferenceStyle string 94 | 95 | domain string 96 | 97 | // GetAbsoluteURL parses the `rawURL` and adds the `domain` to convert relative (/page.html) 98 | // urls to absolute urls (http://domain.com/page.html). 99 | // 100 | // The default is `DefaultGetAbsoluteURL`, unless you override it. That can also 101 | // be useful if you want to proxy the images. 102 | GetAbsoluteURL func(selec *goquery.Selection, rawURL string, domain string) string 103 | 104 | // GetCodeBlockLanguage identifies the language for syntax highlighting 105 | // of a code block. The default is `DefaultGetCodeBlockLanguage`, which 106 | // only gets the attribute x from the selection. 
107 | // 108 | // You can override it if you want more results, for example by using 109 | // lexers.Analyse(content) from github.com/alecthomas/chroma 110 | // TODO: implement 111 | // GetCodeBlockLanguage func(s *goquery.Selection, content string) string 112 | } 113 | 114 | // DefaultGetAbsoluteURL is the default function and can be overridden through `GetAbsoluteURL` in the options. 115 | func DefaultGetAbsoluteURL(selec *goquery.Selection, rawURL string, domain string) string { 116 | if domain == "" { 117 | return rawURL 118 | } 119 | 120 | u, err := url.Parse(rawURL) 121 | if err != nil { 122 | // we can't do anything with this url because it is invalid 123 | return rawURL 124 | } 125 | 126 | if u.Scheme == "data" { 127 | // this is a data uri (for example an inline base64 image) 128 | return rawURL 129 | } 130 | 131 | if u.Scheme == "" { 132 | u.Scheme = "http" 133 | } 134 | if u.Host == "" { 135 | u.Host = domain 136 | } 137 | 138 | return u.String() 139 | } 140 | 141 | // AdvancedResult is used for example for links. If you use LinkStyle:referenced 142 | // the link href is placed at the bottom of the generated markdown (Footer). 143 | type AdvancedResult struct { 144 | Header string 145 | Markdown string 146 | Footer string 147 | } 148 | 149 | // Rule to convert certain html tags to markdown. 150 | // md.Rule{ 151 | // Filter: []string{"del", "s", "strike"}, 152 | // Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string { 153 | // // You need to return a pointer to a string (md.String is just a helper function). 154 | // // If you return nil the next function for that html element 155 | // // will be picked. For example you could only convert an element 156 | // // if it has a certain class name and fallback if not. 
157 | // return md.String("~" + content + "~") 158 | // }, 159 | // } 160 | type Rule struct { 161 | Filter []string 162 | Replacement func(content string, selec *goquery.Selection, options *Options) *string 163 | AdvancedReplacement func(content string, selec *goquery.Selection, options *Options) (res AdvancedResult, skip bool) 164 | } 165 | 166 | var leadingNewlinesR = regexp.MustCompile(`^\n+`) 167 | var trailingNewlinesR = regexp.MustCompile(`\n+$`) 168 | 169 | var newlinesR = regexp.MustCompile(`\n+`) 170 | var tabR = regexp.MustCompile(`\t+`) 171 | var indentR = regexp.MustCompile(`(?m)\n`) 172 | 173 | func (conv *Converter) selecToMD(domain string, selec *goquery.Selection, opt *Options) AdvancedResult { 174 | var result AdvancedResult 175 | 176 | var builder strings.Builder 177 | selec.Contents().Each(func(i int, s *goquery.Selection) { 178 | name := goquery.NodeName(s) 179 | rules := conv.getRuleFuncs(name) 180 | 181 | for i := len(rules) - 1; i >= 0; i-- { 182 | rule := rules[i] 183 | 184 | content := conv.selecToMD(domain, s, opt) 185 | if content.Header != "" { 186 | result.Header += content.Header 187 | } 188 | if content.Footer != "" { 189 | result.Footer += content.Footer 190 | } 191 | 192 | res, skip := rule(content.Markdown, s, opt) 193 | if res.Header != "" { 194 | result.Header += res.Header + "\n" 195 | } 196 | if res.Footer != "" { 197 | result.Footer += res.Footer + "\n" 198 | } 199 | 200 | if !skip { 201 | builder.WriteString(res.Markdown) 202 | return 203 | } 204 | } 205 | }) 206 | result.Markdown = builder.String() 207 | return result 208 | } 209 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # html-to-markdown 2 | 3 | [![Go Report Card](https://goreportcard.com/badge/github.com/JohannesKaufmann/html-to-markdown)](https://goreportcard.com/report/github.com/JohannesKaufmann/html-to-markdown) 4 | 
[![codecov](https://codecov.io/gh/JohannesKaufmann/html-to-markdown/branch/master/graph/badge.svg)](https://codecov.io/gh/JohannesKaufmann/html-to-markdown) 5 | ![GitHub MIT License](https://img.shields.io/github/license/JohannesKaufmann/html-to-markdown) 6 | [![GoDoc](https://godoc.org/github.com/JohannesKaufmann/html-to-markdown?status.png)](http://godoc.org/github.com/JohannesKaufmann/html-to-markdown) 7 | 8 | ![gopher standing on top of a machine that converts a box of html to blocks of markdown](/logo.png) 9 | 10 | Convert HTML into Markdown with Go. It uses an [HTML parser](https://github.com/PuerkitoBio/goquery) to avoid `regexp` as much as possible. That should prevent some [weird cases](https://stackoverflow.com/a/1732454) and allows it to be used where the input is totally unknown. 11 | 12 | ## Installation 13 | 14 | ``` 15 | go get github.com/JohannesKaufmann/html-to-markdown 16 | ``` 17 | 18 | ## Usage 19 | 20 | ```go 21 | import md "github.com/JohannesKaufmann/html-to-markdown" 22 | 23 | converter := md.NewConverter("", true, nil) 24 | 25 | html := `<strong>Important</strong>` 26 | 27 | markdown, err := converter.ConvertString(html) 28 | if err != nil { 29 | log.Fatal(err) 30 | } 31 | fmt.Println("md ->", markdown) 32 | ``` 33 | 34 | If you are already using [goquery](https://github.com/PuerkitoBio/goquery) you can pass a selection to `Convert`. 35 | 36 | ```go 37 | markdown, err := converter.Convert(selec) 38 | ``` 39 | 40 | ### Using it on the command line 41 | 42 | If you want to use `html-to-markdown` on the command line without writing any Go, check out [`html2md`](https://github.com/suntong/html2md#usage), a CLI wrapper for `html-to-markdown` that has all the following options and plugins built in. 43 | 44 | ## Options 45 | 46 | The third parameter to `md.NewConverter` is `*md.Options`.
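
One of these options, `GetAbsoluteURL`, controls how link and image URLs are made absolute using the domain passed as the first argument to `md.NewConverter`. The sketch below is a stdlib-only illustration of that resolution logic, modelled on `DefaultGetAbsoluteURL` from the source. The name `resolveURL` is hypothetical and is not part of the library; the real function also receives a `*goquery.Selection`, which is omitted here so the example runs without third-party imports.

```go
package main

import (
	"fmt"
	"net/url"
)

// resolveURL is a hypothetical helper (not part of html-to-markdown) that
// mirrors the steps of DefaultGetAbsoluteURL: fill in a missing scheme and
// host from the converter's domain, and leave everything else untouched.
func resolveURL(rawURL, domain string) string {
	// Without a domain there is nothing to resolve against.
	if domain == "" {
		return rawURL
	}

	u, err := url.Parse(rawURL)
	if err != nil {
		// An invalid URL is passed through unchanged.
		return rawURL
	}

	// Data URIs (for example inline base64 images) are kept as-is.
	if u.Scheme == "data" {
		return rawURL
	}

	if u.Scheme == "" {
		u.Scheme = "http"
	}
	if u.Host == "" {
		u.Host = domain
	}
	return u.String()
}

func main() {
	fmt.Println(resolveURL("/about", "example.com"))                     // http://example.com/about
	fmt.Println(resolveURL("https://golang.org/doc/", "example.com"))    // https://golang.org/doc/
	fmt.Println(resolveURL("data:image/png;base64,AAAA", "example.com")) // data:image/png;base64,AAAA
	fmt.Println(resolveURL("/about", ""))                                // /about
}
```

To keep relative links untouched, a function with the library's signature that simply returns `rawURL` can be assigned to `GetAbsoluteURL`, as the `relative` variation in `commonmark_test.go` does.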
47 | 48 | For example, you can change the delimiter that surrounds bold text ("`**`") to a different one (for example "`__`") by setting `StrongDelimiter`. 49 | 50 | ```go 51 | opt := &md.Options{ 52 | StrongDelimiter: "__", // default: ** 53 | // ... 54 | } 55 | converter := md.NewConverter("", true, opt) 56 | ``` 57 | 58 | For all the available options, see the [godocs](https://godoc.org/github.com/JohannesKaufmann/html-to-markdown/#Options); for a complete example, see the [options example](/examples/options/main.go). 59 | 60 | ## Adding Rules 61 | 62 | ```go 63 | converter.AddRules( 64 | md.Rule{ 65 | Filter: []string{"del", "s", "strike"}, 66 | Replacement: func(content string, selec *goquery.Selection, opt *md.Options) *string { 67 | // You need to return a pointer to a string (md.String is just a helper function). 68 | // If you return nil the next function for that html element 69 | // will be picked. For example you could only convert an element 70 | // if it has a certain class name and fallback if not. 71 | content = strings.TrimSpace(content) 72 | return md.String("~" + content + "~") 73 | }, 74 | }, 75 | // more rules 76 | ) 77 | ``` 78 | 79 | For more information have a look at the example [add_rules](/examples/add_rules/main.go). 80 | 81 | ## Using Plugins 82 | 83 | If you want plugins (GitHub Flavored Markdown features like strikethrough, tables, ...) you can pass them to `Use`. 84 | 85 | ```go 86 | import "github.com/JohannesKaufmann/html-to-markdown/plugin" 87 | 88 | // Use the `GitHubFlavored` plugin from the `plugin` package. 89 | converter.Use(plugin.GitHubFlavored()) 90 | ``` 91 | 92 | Or, if you only want the `Strikethrough` plugin, pass it on its own. You can change the delimiter that marks 93 | crossed-out text by setting the first argument to a different value (for example "~~" instead of "~").
94 | 95 | ```go 96 | converter.Use(plugin.Strikethrough("")) 97 | ``` 98 | 99 | For more information have a look at the example [github_flavored](/examples/github_flavored/main.go). 100 | 101 | ## Writing Plugins 102 | 103 | Have a look at the [plugin folder](/plugin) for a reference implementation. The most basic one is [Strikethrough](/plugin/strikethrough.go). 104 | 105 | ## Security 106 | 107 | This library produces markdown that is readable and can be changed by humans. 108 | 109 | Once you convert this markdown back to HTML (e.g. using [goldmark](https://github.com/yuin/goldmark) or [blackfriday](https://github.com/russross/blackfriday)) you need to be careful of malicious content. 110 | 111 | This library does NOT sanitize untrusted content. Use an HTML sanitizer such as [bluemonday](https://github.com/microcosm-cc/bluemonday) before displaying the HTML in the browser. 112 | 113 | ## Other Methods 114 | 115 | [Godoc](https://godoc.org/github.com/JohannesKaufmann/html-to-markdown) 116 | 117 | ### `func (c *Converter) Keep(tags ...string) *Converter` 118 | 119 | Determines which elements are to be kept and rendered as HTML. 120 | 121 | ### `func (c *Converter) Remove(tags ...string) *Converter` 122 | 123 | Determines which elements are to be removed altogether i.e. converted to an empty string. 124 | 125 | ## Issues 126 | 127 | If you find HTML snippets (or even full websites) that don't produce the expected results, please open an issue! 128 | 129 | ## Contributing & Testing 130 | 131 | Please first discuss the change you wish to make, by opening an issue. I'm also happy to guide you to where a change is most likely needed. 132 | 133 | _Note: The outside API should not change because of backwards compatibility..._ 134 | 135 | You don't have to be afraid of breaking the converter, since there are many "Golden File Tests": 136 | 137 | Add your problematic HTML snippet to one of the `input.html` files in the `testdata` folder. 
Then run `go test -update` and have a look at which `.golden` files changed in Git. 138 | 139 | You can now change the internal logic and inspect what impact your change has by running `go test -update` again. 140 | 141 | _Note: Before submitting your change as a PR, make sure that you run those tests and check the files into Git..._ 142 | 143 | ## Related Projects 144 | 145 | - [turndown (js)](https://github.com/domchristie/turndown), a very good library written in JavaScript. 146 | - [lunny/html2md](https://github.com/lunny/html2md), which uses [regex instead of goquery](https://stackoverflow.com/a/1732454). I ran into a few edge cases when using it (leaving some HTML comments, ...) so I wrote my own. 147 | 148 | ## License 149 | 150 | This project is licensed under the terms of the MIT license. 151 | -------------------------------------------------------------------------------- /commonmark_test.go: -------------------------------------------------------------------------------- 1 | package md_test 2 | 3 | import ( 4 | "bytes" 5 | "fmt" 6 | "io/ioutil" 7 | "os" 8 | "path" 9 | "path/filepath" 10 | "strings" 11 | "testing" 12 | 13 | md "github.com/JohannesKaufmann/html-to-markdown" 14 | "github.com/PuerkitoBio/goquery" 15 | "github.com/sebdah/goldie/v2" 16 | "github.com/yuin/goldmark" 17 | "github.com/yuin/goldmark/extension" 18 | ) 19 | 20 | type Variation struct { 21 | Options *md.Options 22 | Plugins []md.Plugin 23 | } 24 | type GoldenTest struct { 25 | Name string 26 | Domain string 27 | 28 | DisableGoldmark bool 29 | Variations map[string]Variation 30 | } 31 | 32 | func runGoldenTest(t *testing.T, test GoldenTest, variationKey string) { 33 | variation := test.Variations[variationKey] 34 | 35 | g := goldie.New(t) 36 | 37 | // testdata/TestCommonmark/name/input.html 38 | p := path.Join(t.Name(), "input.html") 39 | 40 | // get the input html from a file 41 | input, err := ioutil.ReadFile(path.Join("testdata", p)) 42 | if err != nil { 43 | t.Error(err) 44
| return 45 | } 46 | 47 | if test.Domain == "" { 48 | test.Domain = "example.com" 49 | } 50 | 51 | conv := md.NewConverter(test.Domain, true, variation.Options) 52 | conv.Keep("keep-tag").Remove("remove-tag") 53 | for _, plugin := range variation.Plugins { 54 | conv.Use(plugin) 55 | } 56 | markdown, err := conv.ConvertBytes(input) 57 | if err != nil { 58 | t.Error(err) 59 | } 60 | 61 | // testdata/TestCommonmark/name/output.default.golden 62 | p = path.Join(t.Name(), "output."+variationKey) 63 | g.Assert(t, p, markdown) 64 | 65 | gold := goldmark.New(goldmark.WithExtensions(extension.GFM)) 66 | var buf bytes.Buffer 67 | if err := gold.Convert(markdown, &buf); err != nil { 68 | t.Error(err) 69 | } 70 | 71 | if !test.DisableGoldmark { 72 | // testdata/TestCommonmark/name/goldmark.golden 73 | p = path.Join(t.Name(), "goldmark") 74 | g.Assert(t, p, buf.Bytes()) 75 | } 76 | } 77 | 78 | func RunGoldenTest(t *testing.T, tests []GoldenTest) { 79 | // loop through all test cases that were added manually 80 | dirs := make(map[string]struct{}) 81 | for _, test := range tests { 82 | name := test.Name 83 | name = strings.Replace(name, " ", "_", -1) 84 | dirs[name] = struct{}{} 85 | } 86 | 87 | // now add all tests that were found on disk to the tests slice 88 | err := filepath.Walk(path.Join("testdata", t.Name()), 89 | func(p string, info os.FileInfo, err error) error { 90 | if err != nil { 91 | return err 92 | } 93 | if !info.IsDir() { 94 | return nil 95 | } 96 | 97 | // skip folders that don't contain an input.html file 98 | if _, err := os.Stat(path.Join(p, "input.html")); os.IsNotExist(err) { 99 | return nil 100 | } 101 | 102 | parts := strings.SplitN(p, string(os.PathSeparator), 3) 103 | p = parts[2] // remove "testdata/TestCommonmark/" from "testdata/TestCommonmark/..." 
104 | 105 | _, ok := dirs[p] 106 | if ok { 107 | return nil 108 | } 109 | 110 | // add the folder from disk to the tests slice, since it's not in there yet 111 | tests = append(tests, GoldenTest{ 112 | Name: p, 113 | }) 114 | return nil 115 | }) 116 | if err != nil { 117 | t.Error(err) 118 | return 119 | } 120 | 121 | for _, test := range tests { 122 | if len(test.Variations) == 0 { 123 | test.Variations = map[string]Variation{ 124 | "default": {}, 125 | } 126 | } 127 | 128 | t.Run(test.Name, func(t *testing.T) { 129 | if strings.Contains(t.Name(), "#") { 130 | fmt.Println("the name", test.Name, t.Name(), "seems to be used for multiple tests") 131 | return 132 | } 133 | 134 | for variationKey := range test.Variations { 135 | runGoldenTest(t, test, variationKey) 136 | } 137 | }) 138 | } 139 | } 140 | 141 | func TestCommonmark(t *testing.T) { 142 | var tests = []GoldenTest{ 143 | { 144 | Name: "link", 145 | DisableGoldmark: true, 146 | Variations: map[string]Variation{ 147 | "relative": { 148 | Options: &md.Options{ 149 | GetAbsoluteURL: func(selec *goquery.Selection, rawURL string, domain string) string { 150 | return rawURL 151 | }, 152 | }, 153 | }, 154 | 155 | "inlined": { 156 | Options: &md.Options{LinkStyle: "inlined"}, 157 | }, 158 | "referenced_full": { 159 | Options: &md.Options{LinkStyle: "referenced", LinkReferenceStyle: "full"}, 160 | }, 161 | "referenced_collapsed": { 162 | Options: &md.Options{LinkStyle: "referenced", LinkReferenceStyle: "collapsed"}, 163 | }, 164 | "referenced_shortcut": { 165 | Options: &md.Options{LinkStyle: "referenced", LinkReferenceStyle: "shortcut"}, 166 | }, 167 | }, 168 | }, 169 | { 170 | Name: "heading", 171 | Variations: map[string]Variation{ 172 | "atx": { 173 | Options: &md.Options{HeadingStyle: "atx"}, 174 | }, 175 | "setext": { 176 | Options: &md.Options{HeadingStyle: "setext"}, 177 | }, 178 | }, 179 | }, 180 | { 181 | Name: "italic", 182 | Variations: map[string]Variation{ 183 | "asterisks": { 184 | Options:
&md.Options{EmDelimiter: "*"}, 185 | }, 186 | "underscores": { 187 | Options: &md.Options{EmDelimiter: "_"}, 188 | }, 189 | }, 190 | }, 191 | { 192 | Name: "bold", 193 | Variations: map[string]Variation{ 194 | "asterisks": { 195 | Options: &md.Options{StrongDelimiter: "**"}, 196 | }, 197 | "underscores": { 198 | Options: &md.Options{StrongDelimiter: "__"}, 199 | }, 200 | }, 201 | }, 202 | { 203 | Name: "pre_code", 204 | Variations: map[string]Variation{ 205 | "indented": { 206 | Options: &md.Options{CodeBlockStyle: "indented"}, 207 | }, 208 | "fenced_backtick": { 209 | Options: &md.Options{CodeBlockStyle: "fenced", Fence: "```"}, 210 | }, 211 | "fenced_tilde": { 212 | Options: &md.Options{CodeBlockStyle: "fenced", Fence: "~~~"}, 213 | }, 214 | }, 215 | }, 216 | { 217 | Name: "list", 218 | Variations: map[string]Variation{ 219 | "asterisks": { 220 | Options: &md.Options{BulletListMarker: "*"}, 221 | }, 222 | "dash": { 223 | Options: &md.Options{BulletListMarker: "-"}, 224 | }, 225 | "plus": { 226 | Options: &md.Options{BulletListMarker: "+"}, 227 | }, 228 | }, 229 | }, 230 | { 231 | Name: "list_nested", 232 | Variations: map[string]Variation{ 233 | "asterisks": { 234 | Options: &md.Options{BulletListMarker: "*"}, 235 | }, 236 | "dash": { 237 | Options: &md.Options{BulletListMarker: "-"}, 238 | }, 239 | "plus": { 240 | Options: &md.Options{BulletListMarker: "+"}, 241 | }, 242 | }, 243 | }, 244 | // + all the test on disk that are added automatically 245 | } 246 | 247 | RunGoldenTest(t, tests) 248 | } 249 | 250 | func TestRealWorld(t *testing.T) { 251 | var tests = []GoldenTest{ 252 | { 253 | Name: "blog.golang.org", 254 | Domain: "blog.golang.org", 255 | Variations: map[string]Variation{ 256 | "inlined": { 257 | Options: &md.Options{LinkStyle: "inlined"}, 258 | }, 259 | "referenced_full": { 260 | Options: &md.Options{LinkStyle: "referenced", LinkReferenceStyle: "full"}, 261 | }, 262 | "referenced_collapsed": { 263 | Options: &md.Options{LinkStyle: "referenced", 
LinkReferenceStyle: "collapsed"}, 264 | }, 265 | "referenced_shortcut": { 266 | Options: &md.Options{LinkStyle: "referenced", LinkReferenceStyle: "shortcut"}, 267 | }, 268 | 269 | "emphasis_asterisks": { 270 | Options: &md.Options{EmDelimiter: "*", StrongDelimiter: "**"}, 271 | }, 272 | "emphasis_underscores": { 273 | Options: &md.Options{EmDelimiter: "_", StrongDelimiter: "__"}, 274 | }, 275 | }, 276 | }, 277 | { 278 | Name: "golang.org", 279 | Domain: "golang.org", 280 | }, 281 | { 282 | Name: "bonnerruderverein.de", 283 | Domain: "bonnerruderverein.de", 284 | }, 285 | // + all the test on disk that are added automatically 286 | } 287 | RunGoldenTest(t, tests) 288 | } 289 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/bonnerruderverein.de/output.default.golden: -------------------------------------------------------------------------------- 1 | Bonner Ruder-Verein 1882 e.V. 2 | 3 | [![Logo](https://www.bonnerruderverein.de/wp-content/uploads/2014/12/Logo-BRV_120.png)](http://www.bonnerruderverein.de) 4 | 5 | # Bonner Ruder-Verein 1882 e.V. 
6 | 7 | Menu 8 | 9 | -  10 | 11 | - [Startseite](http://www.bonnerruderverein.de/) 12 | - [Über uns](http://www.bonnerruderverein.de/ueber-uns/informationen/) 13 | - [Allgemeine Informationen](http://www.bonnerruderverein.de/ueber-uns/informationen/) 14 | - [Der Vorstand](http://www.bonnerruderverein.de/ueber-uns/der-vorstand-des-bonner-rudervereins-1882-e-v/) 15 | - [Mitgliedschaft](http://www.bonnerruderverein.de/ueber-uns/mitgliedschaft/) 16 | - [Geschichte des BRV](http://www.bonnerruderverein.de/bootshaus/raeumlichkeiten/) 17 | - [Gegenwärtige Geschichte](http://www.bonnerruderverein.de/die-gegenwaertige-geschichte/) 18 | - [Die 60er und 70er Jahre](http://www.bonnerruderverein.de/die-sechziger-und-siebziger-jahre/) 19 | - [Nachkriegszeit](http://www.bonnerruderverein.de/die-20er-und-30er-jahre/) 20 | - [1882-1934](http://www.bonnerruderverein.de/bootshaus/veranstaltungsort/) 21 | - [Alles zum Rudern](http://www.bonnerruderverein.de/alles-zum-rudern/ruderbetrieb/) 22 | - [Allgemeiner Ruderbetrieb](http://www.bonnerruderverein.de/alles-zum-rudern/ruderbetrieb/) 23 | - [Wanderrudern](http://www.bonnerruderverein.de/alles-zum-rudern/wanderrudern/) 24 | - [Langstreckenrudern](http://www.bonnerruderverein.de/termine/kategorie/langstreckenrudern/) 25 | - [Das Blaue Band](http://www.bonnerruderverein.de/das-blaue-band/) 26 | - [Jugendriege](http://www.bonnerruderverein.de/alles-zum-rudern/jugendriege/) 27 | - [Anfängerausbildung](http://www.bonnerruderverein.de/alles-zum-rudern/anfaengerausbildung/) 28 | - [Unser Bootshaus](http://www.bonnerruderverein.de/bootshaus/unser-bootshaus/) 29 | - [Aktuelles](http://www.bonnerruderverein.de/aktuelles/) 30 | - [Termine/Veranstaltungen](http://www.bonnerruderverein.de/termine) 31 | - [Nützliche Links](http://www.bonnerruderverein.de/partner/) 32 | - [BRV-Intern](http://www.bonnerruderverein.de/mitglieder/) 33 | 34 | * * * 35 | 36 | - ![Anrudern18](http://www.bonnerruderverein.de/wp-content/uploads/2018/04/Anrudern18.jpg) 37 | 
- ![Bootshaus-01.06](http://www.bonnerruderverein.de/wp-content/uploads/2017/06/Bootshaus-01.06.jpg) 38 | - ![K50_4233-LR](http://www.bonnerruderverein.de/wp-content/uploads/2015/11/K50_4233-LR.jpg) 39 | - ![Rhein-Drachenfels-Boot-F-Stender](http://www.bonnerruderverein.de/wp-content/uploads/2016/01/Rhein-Drachenfels-Boot-F-Stender.jpg) 40 | - ![Eurega_2016](http://www.bonnerruderverein.de/wp-content/uploads/2016/05/Eurega_2016.jpg) 41 | 42 | # Rudern in unserem Verein bedeutet… 43 | 44 | ein unvergleichlich abwechslungsreiches Ruderrevier, ein Bootshaus in einer traumhaften Lage, eine liebevoll geführte Vereinsgastronomie, ein Top-Bootspark und – das Wichtigste – eine bunte Mischung aus interessanten, interessierten und engagierten Mitgliedern. 45 | 46 | Im Mittelpunkt der Aktivitäten steht der Breitensport mit nahezu täglichen Ruderangeboten sowie regelmäßigen Wanderfahrten auf den unterschiedlichsten Flüssen Deutschlands und Europas. Rudern im BRV heißt, Ausgleichssport mit netten Leuten, ein, zwei Stunden Bewegung an der frischen Luft, und zum Abschluss ein Getränk in netter Runde. Wir trainieren nicht für Kurzstrecken-Rennen oder Meisterschaften. Wer jedoch Wanderfahrten liebt, also mehrtägige Ruder-Reisen mit Tages-Etappen von 30 bis 40 Kilometern, ist beim BRV gut aufgehoben. 47 | 48 | # Aktuelles 49 | 50 | [BRV-abend](http://www.bonnerruderverein.de/wp-content/uploads/2015/09/BRV-abend.jpg "BRV-abend") 51 | 52 | 25 Mai 53 | 54 | ![](http://www.bonnerruderverein.de/wp-content/uploads/2015/09/BRV-abend.jpg) 55 | 56 | ### [9\. Bonner Nachtlauf - Einschränkungen am Bootshaus](http://www.bonnerruderverein.de/bonner-nachtlauf/) 57 | 58 | am Mittwoch, dem 30. Mai 2018 findet am Bonner Rheinufer der 9. ... 59 | [More](http://www.bonnerruderverein.de/bonner-nachtlauf/) 60 | 61 | # Kommende Veranstaltungen… 62 | 63 | 1. #### [Fronleichnamsfahrt](http://www.bonnerruderverein.de/termin/fronleichnamsfahrt-2/) 64 | 65 | 66 | 26\. Mai \- 3\. Juni 67 | 68 | 2. 
#### [“Anfängerfahrt” auf der Lahn](http://www.bonnerruderverein.de/termin/anfaenger-wanderfahrt-2/) 69 | 70 | 71 | 8\. Juni \- 10\. Juni 72 | 73 | 3. #### [Oste-Marathon](http://www.bonnerruderverein.de/termin/oste-marathon-2/) 74 | 75 | 76 | 15\. Juni \- 17\. Juni 77 | 78 | 4. #### [Vorstandssitzung Juni](http://www.bonnerruderverein.de/termin/vorstandssitzung/) 79 | 80 | 81 | 15\. Juni @ 19:00 \- 22:00 82 | 83 | 84 | [Alle Veranstaltungen anzeigen](http://www.bonnerruderverein.de/termine/) 85 | 86 | > “Wir kommen rückwärts vorwärts, wie die Ruderer.” 87 | > \- Michel de Montaigne (1533-1592), französischer Philosoph und Essayist 88 | 89 | ##### Eindrücke 90 | 91 | [Rudern auf der Vilaine in Frankreich](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Vilaine.jpg "Rudern auf der Vilaine in Frankreich") 92 | 93 | ![](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Vilaine-900x600.jpg) 94 | 95 | [Abends am Strand in Oberkassel](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Oberkassel-2.jpg "Abends am Strand in Oberkassel") 96 | 97 | ![](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Oberkassel-2-900x600.jpg) 98 | 99 | [Neckarschleuse in Heilbronn](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Neckarschleuse.jpg "Neckarschleuse in Heilbronn") 100 | 101 | ![](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Neckarschleuse-900x600.jpg) 102 | 103 | [Auf dem Main](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Main.jpg "Auf dem Main") 104 | 105 | ![](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Main-900x600.jpg) 106 | 107 | [Hochwasser](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Hochwasser.jpg "Hochwasser") 108 | 109 | ![](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Hochwasser-900x600.jpg) 110 | 111 | [Rudern in Friesland](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Heeg.jpg "Rudern in Friesland") 112 | 113 | 
![](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Heeg-900x600.jpg) 114 | 115 | [Gepackt für die Wanderfahrt](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Haenger.jpg "Gepackt für die Wanderfahrt") 116 | 117 | ![](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Haenger-900x600.jpg) 118 | 119 | [Karo Dame in Leeuwarden](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Elfsteden.jpg "Karo Dame in Leeuwarden") 120 | 121 | ![](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Elfsteden-900x600.jpg) 122 | 123 | [Rudern auf dem Comer See](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Comer-see.jpg "Rudern auf dem Comer See") 124 | 125 | ![](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Comer-see-900x600.jpg) 126 | 127 | [Abends auf dem Rhein](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Abend.jpg "Abends auf dem Rhein") 128 | 129 | ![](http://www.bonnerruderverein.de/wp-content/uploads/2015/08/Abend-900x600.jpg) 130 | 131 | ### Anfahrt 132 | 133 | [![Klicken, um eine größere Karte zu öffnen](http://maps.googleapis.com/maps/api/staticmap?key=AIzaSyBgMsrXWkCAcRhjd7DVQ9PQQB1ZqSjRhI4&scale=1&format=png&zoom=13&size=250x250&language=en&maptype=roadmap&markers=size%3Adefault%7Ccolor%3A0xff0000%7Clabel%3AA%7CBonner+Ruder-Verein+1882+e.V.+%2C+Wilhelm-Spiritusufer+2%2C+53113+Bonn+¢er=Bonner+Ruder-Verein+1882+e.V.+%2C+Wilhelm-Spiritusufer+2%2C+53113+Bonn+)](http://bonnerruderverein.de#gmw-dialog-googlemapswidget-2 "Klicken, um eine größere Karte zu öffnen") 134 | 135 | ### Kontakt 136 | 137 | Bonner Ruder-Verein 1882 e.V. 138 | 139 | Wilhelm-Spiritusufer 2 140 | 141 | 53113 Bonn 142 | 143 | Tel. 
0176 322 888 97 144 | 145 | Email: [info@bonnerruderverein.de](mailto:info@bonnerruderverein.de) 146 | 147 | Bankverbindung: 148 | 149 | IBAN DE15370501980031027535 150 | 151 | BIC COLSDE33 152 | 153 | Sparkasse Köln/Bonn 154 | 155 | ### Informationen 156 | 157 | [Rhein Pegel Bonn vom WSA Köln](http://www.bafg.de/php/BONNRHEINW.htm) 158 | 159 | [alle Rhein Pegel auf ELWIS](https://www.elwis.de/DE/dynamisch/gewaesserkunde/wasserstaende/index.php?target=2&gw=RHEIN) 160 | 161 | Für uns ist die Hochwassermarke II am Pegel Oberwinter maßgeblich 162 | 163 | [Siegpegel Eitorf](http://luadb.it.nrw.de/LUA/hygon/pegel.php?stationsname=Eitorf&yAchse=Standard&nachSuche=&hoehe=468&breite=724&datum=2016-07-17&progn=&meindatum=17.07.2016&yAchse=Standard&ersteWoche=7-Tageslinie&meifocus=&neuname=) 164 | 165 | [Impressum & Datenschutzerklärung](http://www.bonnerruderverein.de/impressum/) 166 | 167 | [Sitemap](http://www.bonnerruderverein.de/sitemap/) -------------------------------------------------------------------------------- /utils_test.go: -------------------------------------------------------------------------------- 1 | package md 2 | 3 | import ( 4 | "strings" 5 | "testing" 6 | 7 | "github.com/PuerkitoBio/goquery" 8 | "golang.org/x/net/html" 9 | ) 10 | 11 | func getNodeFromString(t *testing.T, rawHTML string) *html.Node { 12 | docNode, err := html.Parse(strings.NewReader(rawHTML)) 13 | if err != nil { 14 | t.Error(err) 15 | return nil 16 | } 17 | 18 | // -> #document -> body -> actuall content 19 | return docNode.FirstChild.LastChild.FirstChild 20 | } 21 | 22 | func TestAddSpaceIfNessesary(t *testing.T) { 23 | var tests = []struct { 24 | Name string 25 | 26 | Prev string 27 | Next string 28 | Markdown string 29 | 30 | Expect string 31 | }{ 32 | { 33 | Name: "dont count comment", 34 | Prev: ` 35 | 36 | `, 37 | Next: ` 38 | 39 | `, 40 | Markdown: `_Comment Content_`, 41 | Expect: `_Comment Content_`, 42 | }, 43 | { 44 | 45 | Name: "bold with break", 46 | Prev: `
`, 47 | Next: `
`, 48 | Markdown: `**Bold**`, 49 | Expect: `**Bold**`, 50 | }, 51 | { 52 | Name: "italic with no space", 53 | Prev: ``, 54 | Next: `and no space afterward.`, // #text 55 | Markdown: `_Content_`, 56 | Expect: `_Content_ `, 57 | }, 58 | { 59 | Name: "bold with no space", 60 | Prev: `Some`, 61 | Next: `Text`, 62 | Markdown: `**Bold**`, 63 | Expect: ` **Bold** `, 64 | }, 65 | { 66 | Name: "bold with no space in span", 67 | Prev: `Some`, 68 | Next: `Text`, 69 | Markdown: `**Bold**`, 70 | Expect: ` **Bold** `, 71 | }, 72 | { 73 | Name: "italic with no space", 74 | Prev: ``, 75 | Next: `and no space afterward.`, 76 | Markdown: `_Content_`, 77 | Expect: `_Content_ `, 78 | }, 79 | { 80 | Name: "github example without new lines", 81 | Prev: `go`, 82 | Next: `html`, 83 | Markdown: `[golang](http://example.com/topics/golang "Topic: golang")`, 84 | Expect: ` [golang](http://example.com/topics/golang "Topic: golang")`, 85 | }, 86 | { 87 | Name: "github example", 88 | Prev: ` 89 | go 90 | `, 91 | Next: ` 92 | html 93 | `, 94 | Markdown: `[golang](http://example.com/topics/golang "Topic: golang")`, 95 | Expect: ` [golang](http://example.com/topics/golang "Topic: golang")`, 96 | }, 97 | } 98 | 99 | for _, test := range tests { 100 | t.Run(test.Name, func(t *testing.T) { 101 | 102 | // build a selection for goquery with siblings 103 | selec := &goquery.Selection{ 104 | Nodes: []*html.Node{ 105 | { 106 | Data: "a", 107 | PrevSibling: getNodeFromString(t, test.Prev), 108 | NextSibling: getNodeFromString(t, test.Next), 109 | }, 110 | }, 111 | } 112 | output := AddSpaceIfNessesary(selec, test.Markdown) 113 | 114 | if output != test.Expect { 115 | t.Errorf("expected '%s' but got '%s'", test.Expect, output) 116 | } 117 | }) 118 | } 119 | } 120 | 121 | func TestTrimpLeadingSpaces(t *testing.T) { 122 | var tests = []struct { 123 | Name string 124 | Text string 125 | Expect string 126 | }{ 127 | { 128 | Name: "trim normal text", 129 | Text: ` 130 | This is a normal paragraph 131 | this as 
well 132 | just with some spaces before 133 | `, 134 | Expect: ` 135 | This is a normal paragraph 136 | this as well 137 | just with some spaces before 138 | `, 139 | }, 140 | { 141 | Name: "dont trim nested lists", 142 | Text: ` 143 | - Home 144 | - About 145 | - People 146 | - History 147 | - 2019 148 | - 2020 149 | `, 150 | Expect: ` 151 | - Home 152 | - About 153 | - People 154 | - History 155 | - 2019 156 | - 2020 157 | `, 158 | }, 159 | { 160 | Name: "dont trim list with multiple paragraphs", 161 | Text: ` 162 | 1. This is a list item with two paragraphs. Lorem ipsum dolor 163 | sit amet, consectetuer adipiscing elit. Aliquam hendrerit 164 | mi posuere lectus. 165 | 166 | Vestibulum enim wisi, viverra nec, fringilla in, laoreet 167 | vitae, risus. Donec sit amet nisl. Aliquam semper ipsum 168 | sit amet velit. 169 | 170 | 2. Suspendisse id sem consectetuer libero luctus adipiscing. 171 | `, 172 | Expect: ` 173 | 1. This is a list item with two paragraphs. Lorem ipsum dolor 174 | sit amet, consectetuer adipiscing elit. Aliquam hendrerit 175 | mi posuere lectus. 176 | 177 | Vestibulum enim wisi, viverra nec, fringilla in, laoreet 178 | vitae, risus. Donec sit amet nisl. Aliquam semper ipsum 179 | sit amet velit. 180 | 181 | 2. Suspendisse id sem consectetuer libero luctus adipiscing. 182 | `, 183 | }, 184 | { 185 | Name: "dont trim code blocks", 186 | Text: ` 187 | This is a normal paragraph: 188 | 189 | This is a code block. 190 | `, 191 | Expect: ` 192 | This is a normal paragraph: 193 | 194 | This is a code block. 
195 | `, 196 | }, 197 | } 198 | 199 | for _, test := range tests { 200 | t.Run(test.Name, func(t *testing.T) { 201 | output := TrimpLeadingSpaces(test.Text) 202 | 203 | if output != test.Expect { 204 | t.Errorf("expected '%s' but got '%s'", test.Expect, output) 205 | } 206 | }) 207 | } 208 | 209 | } 210 | 211 | func TestTrimTrailingSpaces(t *testing.T) { 212 | var tests = []struct { 213 | Name string 214 | Text string 215 | Expect string 216 | }{ 217 | { 218 | Name: "trim after normal text", 219 | Text: ` 220 | 1\. xxx 221 | 222 | 2\. xxxx 223 | `, 224 | Expect: ` 225 | 1\. xxx 226 | 227 | 2\. xxxx 228 | `, 229 | }, 230 | { 231 | Name: "dont trim inside normal text", 232 | Text: "When `x = 3`, that means `x + 2 = 5`", 233 | Expect: "When `x = 3`, that means `x + 2 = 5`", 234 | }, 235 | } 236 | 237 | for _, test := range tests { 238 | t.Run(test.Name, func(t *testing.T) { 239 | output := TrimTrailingSpaces(test.Text) 240 | 241 | if output != test.Expect { 242 | t.Errorf("expected '%s' but got '%s'", test.Expect, output) 243 | } 244 | }) 245 | } 246 | } 247 | 248 | func TestEscapeMultiLine(t *testing.T) { 249 | var tests = []struct { 250 | Name string 251 | Text string 252 | Expect string 253 | }{ 254 | { 255 | Name: "escape new lines", 256 | Text: `line1 257 | line2 258 | 259 | line3 260 | 261 | 262 | 263 | 264 | 265 | 266 | 267 | 268 | 269 | line4`, 270 | Expect: `line1\ 271 | line2\ 272 | \ 273 | line3\ 274 | \ 275 | line4`, 276 | }, 277 | } 278 | 279 | for _, test := range tests { 280 | t.Run(test.Name, func(t *testing.T) { 281 | output := EscapeMultiLine(test.Text) 282 | 283 | if output != test.Expect { 284 | t.Errorf("expected '%s' but got '%s'", test.Expect, output) 285 | } 286 | }) 287 | } 288 | } 289 | 290 | func TestCalculateCodeFence(t *testing.T) { 291 | var tests = []struct { 292 | Name string 293 | FenceChar rune 294 | 295 | Text string 296 | Expect string 297 | }{ 298 | { 299 | Name: "no occurrences with backtick", 300 | FenceChar: '`', 301 | Text: 
`normal ~~~ code block`, 302 | Expect: "```", 303 | }, 304 | { 305 | Name: "no occurrences with tilde", 306 | FenceChar: '~', 307 | Text: "normal ``` code block", 308 | Expect: "~~~", 309 | }, 310 | { 311 | Name: "one exact occurrence", 312 | FenceChar: '`', 313 | Text: "```", 314 | Expect: "````", 315 | }, 316 | { 317 | Name: "one occurrences with backtick", 318 | FenceChar: '`', 319 | Text: "normal ``` code block", 320 | Expect: "````", 321 | }, 322 | { 323 | Name: "one bigger occurrences with backtick", 324 | FenceChar: '`', 325 | Text: "normal ````` code block", 326 | Expect: "``````", 327 | }, 328 | { 329 | Name: "multiple occurrences with backtick", 330 | FenceChar: '`', 331 | Text: "normal ``` code `````` block", 332 | Expect: "```````", 333 | }, 334 | { 335 | Name: "multiple occurrences with tilde", 336 | FenceChar: '~', 337 | Text: "normal ~~~ code ~~~~~~~~~~~~ block", 338 | Expect: "~~~~~~~~~~~~~", 339 | }, 340 | { 341 | Name: "multiple occurrences on different lines with tilde", 342 | FenceChar: '~', 343 | Text: ` 344 | normal 345 | ~~~ 346 | code ~~~~~~~~~~~~ block 347 | `, 348 | Expect: "~~~~~~~~~~~~~", 349 | }, 350 | } 351 | 352 | for _, test := range tests { 353 | t.Run(test.Name, func(t *testing.T) { 354 | output := CalculateCodeFence(test.FenceChar, test.Text) 355 | 356 | if output != test.Expect { 357 | t.Errorf("expected '%s' (x%d) but got '%s' (x%d)", test.Expect, strings.Count(test.Expect, string(test.FenceChar)), output, strings.Count(output, string(test.FenceChar))) 358 | } 359 | }) 360 | } 361 | } 362 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/snippets/code_design_heading_in_link/input.html: -------------------------------------------------------------------------------- 1 | 2 | 60 | -------------------------------------------------------------------------------- /testdata/TestRealWorld/blog.golang.org/output.inlined.golden: 
-------------------------------------------------------------------------------- 1 | Godoc: documenting Go code - The Go Blog 2 | 3 | [The Go Programming Language](http://golang.org/) 4 | 5 | [Go](http://golang.org/) 6 | 7 | ▽ 8 | 9 | [Documents](http://golang.org/doc/) [Packages](http://golang.org/pkg/) [The Project](http://golang.org/project/) [Help](http://golang.org/help/) [Blog](http://blog.golang.org/)submit search 10 | 11 | #### Next article 12 | 13 | [Introducing Gofix](http://blog.golang.org/introducing-gofix) 14 | 15 | #### Previous article 16 | 17 | [Gobs of data](http://blog.golang.org/gobs-of-data) 18 | 19 | #### Links 20 | 21 | - [golang.org](http://golang.org/) 22 | - [Install Go](http://golang.org/doc/install.html) 23 | - [A Tour of Go](http://tour.golang.org/) 24 | - [Go Documentation](http://golang.org/doc/) 25 | - [Go Mailing List](http://groups.google.com/group/golang-nuts) 26 | - [Go on Google+](http://plus.google.com/101406623878176903605) 27 | - [Go+ Community](http://plus.google.com/communities/114112804251407510571) 28 | - [Go on Twitter](http://twitter.com/golang) 29 | 30 | [Blog index](http://blog.golang.org/index) 31 | 32 | # [The Go Blog](http://blog.golang.org/) 33 | 34 | ### [Godoc: documenting Go code](http://blog.golang.org/godoc-documenting-go-code) 35 | 36 | 31 March 2011 37 | 38 | The Go project takes documentation seriously. Documentation is a huge part of making software accessible and maintainable. 39 | Of course it must be well-written and accurate, but it also must be easy to write and to maintain. Ideally, it 40 | should be coupled to the code itself so the documentation evolves along with the code. The easier it is for programmers 41 | to produce good documentation, the better for everyone. 42 | 43 | 44 | To that end, we have developed the 45 | [godoc](https://golang.org/cmd/godoc/) documentation tool. 
This article describes godoc's approach to documentation, and explains how 46 | you can use our conventions and tools to write good documentation for your own projects. 47 | 48 | 49 | Godoc parses Go source code - including comments - and produces documentation as HTML or plain text. The end result is documentation 50 | tightly coupled with the code it documents. For example, through godoc's web interface you can navigate from 51 | a function's 52 | [documentation](https://golang.org/pkg/strings/#HasPrefix) to its 53 | [implementation](https://golang.org/src/pkg/strings/strings.go#L493) with one click. 54 | 55 | 56 | Godoc is conceptually related to Python's 57 | [Docstring](http://www.python.org/dev/peps/pep-0257/) and Java's 58 | [Javadoc](http://www.oracle.com/technetwork/java/javase/documentation/index-jsp-135444.html), but its design is simpler. The comments read by godoc are not language constructs (as with Docstring) 59 | nor must they have their own machine-readable syntax (as with Javadoc). Godoc comments are just good comments, 60 | the sort you would want to read even if godoc didn't exist. 61 | 62 | 63 | The convention is simple: to document a type, variable, constant, function, or even a package, write a regular comment directly 64 | preceding its declaration, with no intervening blank line. Godoc will then present that comment as text alongside 65 | the item it documents. For example, this is the documentation for the 66 | `fmt` package's 67 | [`Fprint`](https://golang.org/pkg/fmt/#Fprint) function: 68 | 69 | 70 | ``` 71 | // Fprint formats using the default formats for its operands and writes to w. 72 | // Spaces are added between operands when neither is a string. 73 | // It returns the number of bytes written and any write error encountered. 74 | func Fprint(w io.Writer, a ...interface{}) (n int, err error) { 75 | ``` 76 | 77 | Notice this comment is a complete sentence that begins with the name of the element it describes. 
This important convention 78 | allows us to generate documentation in a variety of formats, from plain text to HTML to UNIX man pages, and makes 79 | it read better when tools truncate it for brevity, such as when they extract the first line or sentence. 80 | 81 | 82 | Comments on package declarations should provide general package documentation. These comments can be short, like the 83 | [`sort`](https://golang.org/pkg/sort/) package's brief description: 84 | 85 | 86 | ``` 87 | // Package sort provides primitives for sorting slices and user-defined 88 | // collections. 89 | package sort 90 | ``` 91 | 92 | They can also be detailed like the 93 | [gob package](https://golang.org/pkg/encoding/gob/)'s overview. That package uses another convention for packages that need large amounts of 94 | introductory documentation: the package comment is placed in its own file, 95 | [doc.go](https://golang.org/src/pkg/encoding/gob/doc.go), which contains only those comments and a package clause. 96 | 97 | 98 | When writing package comments of any size, keep in mind that their first sentence will appear in godoc's 99 | [package list](https://golang.org/pkg/). 100 | 101 | 102 | Comments that are not adjacent to a top-level declaration are omitted from godoc's output, with one notable exception. 103 | Top-level comments that begin with the word 104 | `"BUG(who)”` are recognized as known bugs, and included in the "Bugs” section of the package documentation. The "who” 105 | part should be the user name of someone who could provide more information. For example, this is a known issue 106 | from the 107 | [bytes package](https://golang.org/pkg/bytes/#pkg-note-BUG): 108 | 109 | 110 | ``` 111 | // BUG(r): The rule Title uses for word boundaries does not handle Unicode punctuation properly. 112 | ``` 113 | 114 | Sometimes a struct field, function, type, or even a whole package becomes redundant or unnecessary, but must be kept for 115 | compatibility with existing programs. 
To signal that an identifier should not be used, add a paragraph to its 116 | doc comment that begins with "Deprecated:" followed by some information about the deprecation. There 117 | are a few examples 118 | [in the standard library](https://golang.org/search?q=Deprecated:). 119 | 120 | 121 | There are a few formatting rules that Godoc uses when converting comments to HTML: 122 | 123 | 124 | - Subsequent lines of text are considered part of the same paragraph; you must leave a blank line to separate paragraphs. 125 | 126 | - Pre-formatted text must be indented relative to the surrounding comment text (see gob's 127 | [doc.go](https://golang.org/src/pkg/encoding/gob/doc.go) for an example). 128 | 129 | - URLs will be converted to HTML links; no special markup is necessary. 130 | 131 | Note that none of these rules requires you to do anything out of the ordinary. 132 | 133 | 134 | In fact, the best thing about godoc's minimal approach is how easy it is to use. As a result, a lot of Go code, including 135 | all of the standard library, already follows the conventions. 136 | 137 | 138 | Your own code can present good documentation just by having comments as described above. Any Go packages installed inside 139 | `$GOROOT/src/pkg` and any 140 | `GOPATH` work spaces will already be accessible via godoc's command-line and HTTP interfaces, and you can specify 141 | additional paths for indexing via the 142 | `-path` flag or just by running 143 | `"godoc ."` in the source directory. See the 144 | [godoc documentation](https://golang.org/cmd/godoc/) for more details. 
145 | 146 | 147 | By Andrew Gerrand 148 | 149 | ## Related articles 150 | 151 | - [HTTP/2 Server Push](http://blog.golang.org/h2push) 152 | - [Introducing HTTP Tracing](http://blog.golang.org/http-tracing) 153 | - [Testable Examples in Go](http://blog.golang.org/examples) 154 | - [Generating code](http://blog.golang.org/generate) 155 | - [Introducing the Go Race Detector](http://blog.golang.org/race-detector) 156 | - [Go maps in action](http://blog.golang.org/go-maps-in-action) 157 | - [go fmt your code](http://blog.golang.org/go-fmt-your-code) 158 | - [Organizing Go code](http://blog.golang.org/organizing-go-code) 159 | - [Debugging Go programs with the GNU Debugger](http://blog.golang.org/debugging-go-programs-with-gnu-debugger) 160 | - [The Go image/draw package](http://blog.golang.org/go-imagedraw-package) 161 | - [The Go image package](http://blog.golang.org/go-image-package) 162 | - [The Laws of Reflection](http://blog.golang.org/laws-of-reflection) 163 | - [Error handling and Go](http://blog.golang.org/error-handling-and-go) 164 | - ["First Class Functions in Go"](http://blog.golang.org/first-class-functions-in-go-and-new-go) 165 | - [Profiling Go Programs](http://blog.golang.org/profiling-go-programs) 166 | - [A GIF decoder: an exercise in Go interfaces](http://blog.golang.org/gif-decoder-exercise-in-go-interfaces) 167 | - [Introducing Gofix](http://blog.golang.org/introducing-gofix) 168 | - [Gobs of data](http://blog.golang.org/gobs-of-data) 169 | - [C? Go? 
Cgo!](http://blog.golang.org/c-go-cgo) 170 | - [JSON and Go](http://blog.golang.org/json-and-go) 171 | - [Go Slices: usage and internals](http://blog.golang.org/go-slices-usage-and-internals) 172 | - [Go Concurrency Patterns: Timing out, moving on](http://blog.golang.org/go-concurrency-patterns-timing-out-and) 173 | - [Defer, Panic, and Recover](http://blog.golang.org/defer-panic-and-recover) 174 | - [Share Memory By Communicating](http://blog.golang.org/share-memory-by-communicating) 175 | - [JSON-RPC: a tale of interfaces](http://blog.golang.org/json-rpc-tale-of-interfaces) 176 | 177 | Except as 178 | [noted](https://developers.google.com/site-policies#restrictions), the content of this page is licensed under the Creative Commons Attribution 3.0 License, 179 | 180 | 181 | and code is licensed under a 182 | [BSD license](http://golang.org/LICENSE). 183 | 184 | 185 | [Terms of Service](http://golang.org/doc/tos.html) \| 186 | [Privacy Policy](http://www.google.com/intl/en/policies/privacy/) \| 187 | [View the source code](https://go.googlesource.com/blog/) -------------------------------------------------------------------------------- /testdata/TestRealWorld/blog.golang.org/output.emphasis_asterisks.golden: -------------------------------------------------------------------------------- 1 | Godoc: documenting Go code - The Go Blog 2 | 3 | [The Go Programming Language](http://golang.org/) 4 | 5 | [Go](http://golang.org/) 6 | 7 | ▽ 8 | 9 | [Documents](http://golang.org/doc/) [Packages](http://golang.org/pkg/) [The Project](http://golang.org/project/) [Help](http://golang.org/help/) [Blog](http://blog.golang.org/)submit search 10 | 11 | #### Next article 12 | 13 | [Introducing Gofix](http://blog.golang.org/introducing-gofix) 14 | 15 | #### Previous article 16 | 17 | [Gobs of data](http://blog.golang.org/gobs-of-data) 18 | 19 | #### Links 20 | 21 | - [golang.org](http://golang.org/) 22 | - [Install Go](http://golang.org/doc/install.html) 23 | - [A Tour of 
Go](http://tour.golang.org/) 24 | - [Go Documentation](http://golang.org/doc/) 25 | - [Go Mailing List](http://groups.google.com/group/golang-nuts) 26 | - [Go on Google+](http://plus.google.com/101406623878176903605) 27 | - [Go+ Community](http://plus.google.com/communities/114112804251407510571) 28 | - [Go on Twitter](http://twitter.com/golang) 29 | 30 | [Blog index](http://blog.golang.org/index) 31 | 32 | # [The Go Blog](http://blog.golang.org/) 33 | 34 | ### [Godoc: documenting Go code](http://blog.golang.org/godoc-documenting-go-code) 35 | 36 | 31 March 2011 37 | 38 | The Go project takes documentation seriously. Documentation is a huge part of making software accessible and maintainable. 39 | Of course it must be well-written and accurate, but it also must be easy to write and to maintain. Ideally, it 40 | should be coupled to the code itself so the documentation evolves along with the code. The easier it is for programmers 41 | to produce good documentation, the better for everyone. 42 | 43 | 44 | To that end, we have developed the 45 | [godoc](https://golang.org/cmd/godoc/) documentation tool. This article describes godoc's approach to documentation, and explains how 46 | you can use our conventions and tools to write good documentation for your own projects. 47 | 48 | 49 | Godoc parses Go source code - including comments - and produces documentation as HTML or plain text. The end result is documentation 50 | tightly coupled with the code it documents. For example, through godoc's web interface you can navigate from 51 | a function's 52 | [documentation](https://golang.org/pkg/strings/#HasPrefix) to its 53 | [implementation](https://golang.org/src/pkg/strings/strings.go#L493) with one click. 54 | 55 | 56 | Godoc is conceptually related to Python's 57 | [Docstring](http://www.python.org/dev/peps/pep-0257/) and Java's 58 | [Javadoc](http://www.oracle.com/technetwork/java/javase/documentation/index-jsp-135444.html), but its design is simpler. 
The comments read by godoc are not language constructs (as with Docstring) 59 | nor must they have their own machine-readable syntax (as with Javadoc). Godoc comments are just good comments, 60 | the sort you would want to read even if godoc didn't exist. 61 | 62 | 63 | The convention is simple: to document a type, variable, constant, function, or even a package, write a regular comment directly 64 | preceding its declaration, with no intervening blank line. Godoc will then present that comment as text alongside 65 | the item it documents. For example, this is the documentation for the 66 | `fmt` package's 67 | [`Fprint`](https://golang.org/pkg/fmt/#Fprint) function: 68 | 69 | 70 | ``` 71 | // Fprint formats using the default formats for its operands and writes to w. 72 | // Spaces are added between operands when neither is a string. 73 | // It returns the number of bytes written and any write error encountered. 74 | func Fprint(w io.Writer, a ...interface{}) (n int, err error) { 75 | ``` 76 | 77 | Notice this comment is a complete sentence that begins with the name of the element it describes. This important convention 78 | allows us to generate documentation in a variety of formats, from plain text to HTML to UNIX man pages, and makes 79 | it read better when tools truncate it for brevity, such as when they extract the first line or sentence. 80 | 81 | 82 | Comments on package declarations should provide general package documentation. These comments can be short, like the 83 | [`sort`](https://golang.org/pkg/sort/) package's brief description: 84 | 85 | 86 | ``` 87 | // Package sort provides primitives for sorting slices and user-defined 88 | // collections. 89 | package sort 90 | ``` 91 | 92 | They can also be detailed like the 93 | [gob package](https://golang.org/pkg/encoding/gob/)'s overview. 
That package uses another convention for packages that need large amounts of 94 | introductory documentation: the package comment is placed in its own file, 95 | [doc.go](https://golang.org/src/pkg/encoding/gob/doc.go), which contains only those comments and a package clause. 96 | 97 | 98 | When writing package comments of any size, keep in mind that their first sentence will appear in godoc's 99 | [package list](https://golang.org/pkg/). 100 | 101 | 102 | Comments that are not adjacent to a top-level declaration are omitted from godoc's output, with one notable exception. 103 | Top-level comments that begin with the word 104 | `"BUG(who)”` are recognized as known bugs, and included in the "Bugs” section of the package documentation. The "who” 105 | part should be the user name of someone who could provide more information. For example, this is a known issue 106 | from the 107 | [bytes package](https://golang.org/pkg/bytes/#pkg-note-BUG): 108 | 109 | 110 | ``` 111 | // BUG(r): The rule Title uses for word boundaries does not handle Unicode punctuation properly. 112 | ``` 113 | 114 | Sometimes a struct field, function, type, or even a whole package becomes redundant or unnecessary, but must be kept for 115 | compatibility with existing programs. To signal that an identifier should not be used, add a paragraph to its 116 | doc comment that begins with "Deprecated:" followed by some information about the deprecation. There 117 | are a few examples 118 | [in the standard library](https://golang.org/search?q=Deprecated:). 119 | 120 | 121 | There are a few formatting rules that Godoc uses when converting comments to HTML: 122 | 123 | 124 | - Subsequent lines of text are considered part of the same paragraph; you must leave a blank line to separate paragraphs. 125 | 126 | - Pre-formatted text must be indented relative to the surrounding comment text (see gob's 127 | [doc.go](https://golang.org/src/pkg/encoding/gob/doc.go) for an example). 
128 | 129 | - URLs will be converted to HTML links; no special markup is necessary. 130 | 131 | Note that none of these rules requires you to do anything out of the ordinary. 132 | 133 | 134 | In fact, the best thing about godoc's minimal approach is how easy it is to use. As a result, a lot of Go code, including 135 | all of the standard library, already follows the conventions. 136 | 137 | 138 | Your own code can present good documentation just by having comments as described above. Any Go packages installed inside 139 | `$GOROOT/src/pkg` and any 140 | `GOPATH` work spaces will already be accessible via godoc's command-line and HTTP interfaces, and you can specify 141 | additional paths for indexing via the 142 | `-path` flag or just by running 143 | `"godoc ."` in the source directory. See the 144 | [godoc documentation](https://golang.org/cmd/godoc/) for more details. 145 | 146 | 147 | By Andrew Gerrand 148 | 149 | ## Related articles 150 | 151 | - [HTTP/2 Server Push](http://blog.golang.org/h2push) 152 | - [Introducing HTTP Tracing](http://blog.golang.org/http-tracing) 153 | - [Testable Examples in Go](http://blog.golang.org/examples) 154 | - [Generating code](http://blog.golang.org/generate) 155 | - [Introducing the Go Race Detector](http://blog.golang.org/race-detector) 156 | - [Go maps in action](http://blog.golang.org/go-maps-in-action) 157 | - [go fmt your code](http://blog.golang.org/go-fmt-your-code) 158 | - [Organizing Go code](http://blog.golang.org/organizing-go-code) 159 | - [Debugging Go programs with the GNU Debugger](http://blog.golang.org/debugging-go-programs-with-gnu-debugger) 160 | - [The Go image/draw package](http://blog.golang.org/go-imagedraw-package) 161 | - [The Go image package](http://blog.golang.org/go-image-package) 162 | - [The Laws of Reflection](http://blog.golang.org/laws-of-reflection) 163 | - [Error handling and Go](http://blog.golang.org/error-handling-and-go) 164 | - ["First Class Functions in 
Go"](http://blog.golang.org/first-class-functions-in-go-and-new-go) 165 | - [Profiling Go Programs](http://blog.golang.org/profiling-go-programs) 166 | - [A GIF decoder: an exercise in Go interfaces](http://blog.golang.org/gif-decoder-exercise-in-go-interfaces) 167 | - [Introducing Gofix](http://blog.golang.org/introducing-gofix) 168 | - [Gobs of data](http://blog.golang.org/gobs-of-data) 169 | - [C? Go? Cgo!](http://blog.golang.org/c-go-cgo) 170 | - [JSON and Go](http://blog.golang.org/json-and-go) 171 | - [Go Slices: usage and internals](http://blog.golang.org/go-slices-usage-and-internals) 172 | - [Go Concurrency Patterns: Timing out, moving on](http://blog.golang.org/go-concurrency-patterns-timing-out-and) 173 | - [Defer, Panic, and Recover](http://blog.golang.org/defer-panic-and-recover) 174 | - [Share Memory By Communicating](http://blog.golang.org/share-memory-by-communicating) 175 | - [JSON-RPC: a tale of interfaces](http://blog.golang.org/json-rpc-tale-of-interfaces) 176 | 177 | Except as 178 | [noted](https://developers.google.com/site-policies#restrictions), the content of this page is licensed under the Creative Commons Attribution 3.0 License, 179 | 180 | 181 | and code is licensed under a 182 | [BSD license](http://golang.org/LICENSE). 
183 | 184 | 185 | [Terms of Service](http://golang.org/doc/tos.html) \| 186 | [Privacy Policy](http://www.google.com/intl/en/policies/privacy/) \| 187 | [View the source code](https://go.googlesource.com/blog/) -------------------------------------------------------------------------------- /testdata/TestRealWorld/blog.golang.org/output.emphasis_underscores.golden: -------------------------------------------------------------------------------- 1 | Godoc: documenting Go code - The Go Blog 2 | 3 | [The Go Programming Language](http://golang.org/) 4 | 5 | [Go](http://golang.org/) 6 | 7 | ▽ 8 | 9 | [Documents](http://golang.org/doc/) [Packages](http://golang.org/pkg/) [The Project](http://golang.org/project/) [Help](http://golang.org/help/) [Blog](http://blog.golang.org/)submit search 10 | 11 | #### Next article 12 | 13 | [Introducing Gofix](http://blog.golang.org/introducing-gofix) 14 | 15 | #### Previous article 16 | 17 | [Gobs of data](http://blog.golang.org/gobs-of-data) 18 | 19 | #### Links 20 | 21 | - [golang.org](http://golang.org/) 22 | - [Install Go](http://golang.org/doc/install.html) 23 | - [A Tour of Go](http://tour.golang.org/) 24 | - [Go Documentation](http://golang.org/doc/) 25 | - [Go Mailing List](http://groups.google.com/group/golang-nuts) 26 | - [Go on Google+](http://plus.google.com/101406623878176903605) 27 | - [Go+ Community](http://plus.google.com/communities/114112804251407510571) 28 | - [Go on Twitter](http://twitter.com/golang) 29 | 30 | [Blog index](http://blog.golang.org/index) 31 | 32 | # [The Go Blog](http://blog.golang.org/) 33 | 34 | ### [Godoc: documenting Go code](http://blog.golang.org/godoc-documenting-go-code) 35 | 36 | 31 March 2011 37 | 38 | The Go project takes documentation seriously. Documentation is a huge part of making software accessible and maintainable. 39 | Of course it must be well-written and accurate, but it also must be easy to write and to maintain. 
Ideally, it 40 | should be coupled to the code itself so the documentation evolves along with the code. The easier it is for programmers 41 | to produce good documentation, the better for everyone. 42 | 43 | 44 | To that end, we have developed the 45 | [godoc](https://golang.org/cmd/godoc/) documentation tool. This article describes godoc's approach to documentation, and explains how 46 | you can use our conventions and tools to write good documentation for your own projects. 47 | 48 | 49 | Godoc parses Go source code - including comments - and produces documentation as HTML or plain text. The end result is documentation 50 | tightly coupled with the code it documents. For example, through godoc's web interface you can navigate from 51 | a function's 52 | [documentation](https://golang.org/pkg/strings/#HasPrefix) to its 53 | [implementation](https://golang.org/src/pkg/strings/strings.go#L493) with one click. 54 | 55 | 56 | Godoc is conceptually related to Python's 57 | [Docstring](http://www.python.org/dev/peps/pep-0257/) and Java's 58 | [Javadoc](http://www.oracle.com/technetwork/java/javase/documentation/index-jsp-135444.html), but its design is simpler. The comments read by godoc are not language constructs (as with Docstring) 59 | nor must they have their own machine-readable syntax (as with Javadoc). Godoc comments are just good comments, 60 | the sort you would want to read even if godoc didn't exist. 61 | 62 | 63 | The convention is simple: to document a type, variable, constant, function, or even a package, write a regular comment directly 64 | preceding its declaration, with no intervening blank line. Godoc will then present that comment as text alongside 65 | the item it documents. For example, this is the documentation for the 66 | `fmt` package's 67 | [`Fprint`](https://golang.org/pkg/fmt/#Fprint) function: 68 | 69 | 70 | ``` 71 | // Fprint formats using the default formats for its operands and writes to w. 
72 | // Spaces are added between operands when neither is a string. 73 | // It returns the number of bytes written and any write error encountered. 74 | func Fprint(w io.Writer, a ...interface{}) (n int, err error) { 75 | ``` 76 | 77 | Notice this comment is a complete sentence that begins with the name of the element it describes. This important convention 78 | allows us to generate documentation in a variety of formats, from plain text to HTML to UNIX man pages, and makes 79 | it read better when tools truncate it for brevity, such as when they extract the first line or sentence. 80 | 81 | 82 | Comments on package declarations should provide general package documentation. These comments can be short, like the 83 | [`sort`](https://golang.org/pkg/sort/) package's brief description: 84 | 85 | 86 | ``` 87 | // Package sort provides primitives for sorting slices and user-defined 88 | // collections. 89 | package sort 90 | ``` 91 | 92 | They can also be detailed like the 93 | [gob package](https://golang.org/pkg/encoding/gob/)'s overview. That package uses another convention for packages that need large amounts of 94 | introductory documentation: the package comment is placed in its own file, 95 | [doc.go](https://golang.org/src/pkg/encoding/gob/doc.go), which contains only those comments and a package clause. 96 | 97 | 98 | When writing package comments of any size, keep in mind that their first sentence will appear in godoc's 99 | [package list](https://golang.org/pkg/). 100 | 101 | 102 | Comments that are not adjacent to a top-level declaration are omitted from godoc's output, with one notable exception. 103 | Top-level comments that begin with the word 104 | `"BUG(who)”` are recognized as known bugs, and included in the "Bugs” section of the package documentation. The "who” 105 | part should be the user name of someone who could provide more information. 
For example, this is a known issue 106 | from the 107 | [bytes package](https://golang.org/pkg/bytes/#pkg-note-BUG): 108 | 109 | 110 | ``` 111 | // BUG(r): The rule Title uses for word boundaries does not handle Unicode punctuation properly. 112 | ``` 113 | 114 | Sometimes a struct field, function, type, or even a whole package becomes redundant or unnecessary, but must be kept for 115 | compatibility with existing programs. To signal that an identifier should not be used, add a paragraph to its 116 | doc comment that begins with "Deprecated:" followed by some information about the deprecation. There 117 | are a few examples 118 | [in the standard library](https://golang.org/search?q=Deprecated:). 119 | 120 | 121 | There are a few formatting rules that Godoc uses when converting comments to HTML: 122 | 123 | 124 | - Subsequent lines of text are considered part of the same paragraph; you must leave a blank line to separate paragraphs. 125 | 126 | - Pre-formatted text must be indented relative to the surrounding comment text (see gob's 127 | [doc.go](https://golang.org/src/pkg/encoding/gob/doc.go) for an example). 128 | 129 | - URLs will be converted to HTML links; no special markup is necessary. 130 | 131 | Note that none of these rules requires you to do anything out of the ordinary. 132 | 133 | 134 | In fact, the best thing about godoc's minimal approach is how easy it is to use. As a result, a lot of Go code, including 135 | all of the standard library, already follows the conventions. 136 | 137 | 138 | Your own code can present good documentation just by having comments as described above. Any Go packages installed inside 139 | `$GOROOT/src/pkg` and any 140 | `GOPATH` work spaces will already be accessible via godoc's command-line and HTTP interfaces, and you can specify 141 | additional paths for indexing via the 142 | `-path` flag or just by running 143 | `"godoc ."` in the source directory. 
See the 144 | [godoc documentation](https://golang.org/cmd/godoc/) for more details. 145 | 146 | 147 | By Andrew Gerrand 148 | 149 | ## Related articles 150 | 151 | - [HTTP/2 Server Push](http://blog.golang.org/h2push) 152 | - [Introducing HTTP Tracing](http://blog.golang.org/http-tracing) 153 | - [Testable Examples in Go](http://blog.golang.org/examples) 154 | - [Generating code](http://blog.golang.org/generate) 155 | - [Introducing the Go Race Detector](http://blog.golang.org/race-detector) 156 | - [Go maps in action](http://blog.golang.org/go-maps-in-action) 157 | - [go fmt your code](http://blog.golang.org/go-fmt-your-code) 158 | - [Organizing Go code](http://blog.golang.org/organizing-go-code) 159 | - [Debugging Go programs with the GNU Debugger](http://blog.golang.org/debugging-go-programs-with-gnu-debugger) 160 | - [The Go image/draw package](http://blog.golang.org/go-imagedraw-package) 161 | - [The Go image package](http://blog.golang.org/go-image-package) 162 | - [The Laws of Reflection](http://blog.golang.org/laws-of-reflection) 163 | - [Error handling and Go](http://blog.golang.org/error-handling-and-go) 164 | - ["First Class Functions in Go"](http://blog.golang.org/first-class-functions-in-go-and-new-go) 165 | - [Profiling Go Programs](http://blog.golang.org/profiling-go-programs) 166 | - [A GIF decoder: an exercise in Go interfaces](http://blog.golang.org/gif-decoder-exercise-in-go-interfaces) 167 | - [Introducing Gofix](http://blog.golang.org/introducing-gofix) 168 | - [Gobs of data](http://blog.golang.org/gobs-of-data) 169 | - [C? Go? 
Cgo!](http://blog.golang.org/c-go-cgo) 170 | - [JSON and Go](http://blog.golang.org/json-and-go) 171 | - [Go Slices: usage and internals](http://blog.golang.org/go-slices-usage-and-internals) 172 | - [Go Concurrency Patterns: Timing out, moving on](http://blog.golang.org/go-concurrency-patterns-timing-out-and) 173 | - [Defer, Panic, and Recover](http://blog.golang.org/defer-panic-and-recover) 174 | - [Share Memory By Communicating](http://blog.golang.org/share-memory-by-communicating) 175 | - [JSON-RPC: a tale of interfaces](http://blog.golang.org/json-rpc-tale-of-interfaces) 176 | 177 | Except as 178 | [noted](https://developers.google.com/site-policies#restrictions), the content of this page is licensed under the Creative Commons Attribution 3.0 License, 179 | 180 | 181 | and code is licensed under a 182 | [BSD license](http://golang.org/LICENSE). 183 | 184 | 185 | [Terms of Service](http://golang.org/doc/tos.html) \| 186 | [Privacy Policy](http://www.google.com/intl/en/policies/privacy/) \| 187 | [View the source code](https://go.googlesource.com/blog/) --------------------------------------------------------------------------------