├── .gitignore
├── CHANGES.md
├── readme.txt
├── License.text
├── PHP Markdown Readme.text
└── php-markdown-modified.php
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
--------------------------------------------------------------------------------
/CHANGES.md:
--------------------------------------------------------------------------------
1 | ### Markdown-Modified
2 |
3 | #### 1.0.1
4 |
5 | * renamed functions and `MARKDOWN_VERSION` const so there are no errors if another version of `markdown.php` is loaded by another plugin
6 |
7 | #### 1.0
8 |
9 | * initial commit
--------------------------------------------------------------------------------
/readme.txt:
--------------------------------------------------------------------------------
1 | Modified version of the original PHP Markdown plugin, made to work alongside Markdown on Save variants.
2 |
3 | All Markdown markup will be rendered in all posts, regardless of the setting in the Markdown on Save variant.
4 |
5 | A better solution for those who have always used Markdown markup would be some method of switching the setting on existing posts from **not use** Markdown to **use** Markdown.
6 |
--------------------------------------------------------------------------------
/License.text:
--------------------------------------------------------------------------------
1 | PHP Markdown
2 | Copyright (c) 2004-2009 Michel Fortin
3 |
4 | All rights reserved.
5 |
6 | Based on Markdown
7 | Copyright (c) 2003-2006 John Gruber
8 |
9 | All rights reserved.
10 |
11 | Redistribution and use in source and binary forms, with or without
12 | modification, are permitted provided that the following conditions are
13 | met:
14 |
15 | * Redistributions of source code must retain the above copyright notice,
16 | this list of conditions and the following disclaimer.
17 |
18 | * Redistributions in binary form must reproduce the above copyright
19 | notice, this list of conditions and the following disclaimer in the
20 | documentation and/or other materials provided with the distribution.
21 |
22 | * Neither the name "Markdown" nor the names of its contributors may
23 | be used to endorse or promote products derived from this software
24 | without specific prior written permission.
25 |
26 | This software is provided by the copyright holders and contributors "as
27 | is" and any express or implied warranties, including, but not limited
28 | to, the implied warranties of merchantability and fitness for a
29 | particular purpose are disclaimed. In no event shall the copyright owner
30 | or contributors be liable for any direct, indirect, incidental, special,
31 | exemplary, or consequential damages (including, but not limited to,
32 | procurement of substitute goods or services; loss of use, data, or
33 | profits; or business interruption) however caused and on any theory of
34 | liability, whether in contract, strict liability, or tort (including
35 | negligence or otherwise) arising in any way out of the use of this
36 | software, even if advised of the possibility of such damage.
37 |
--------------------------------------------------------------------------------
/PHP Markdown Readme.text:
--------------------------------------------------------------------------------
1 | PHP Markdown
2 | ============
3 |
4 | Version 1.0.1m - Sat 21 Jun 2008
5 |
6 | by Michel Fortin
7 |
8 |
9 | based on work by John Gruber
10 |
11 |
12 |
13 | Introduction
14 | ------------
15 |
16 | Markdown is a text-to-HTML conversion tool for web writers. Markdown
17 | allows you to write using an easy-to-read, easy-to-write plain text
18 | format, then convert it to structurally valid XHTML (or HTML).
19 |
20 | "Markdown" is two things: a plain text markup syntax, and a software
21 | tool, written in Perl, that converts the plain text markup to HTML.
22 | PHP Markdown is a port to PHP of the original Markdown program by
23 | John Gruber.
24 |
25 | PHP Markdown can work as a plug-in for WordPress and bBlog, as a
26 | modifier for the Smarty templating engine, or as a replacement for
27 | Textile formatting in any software that supports Textile.
28 |
29 | Full documentation of Markdown's syntax is available on John's
30 | Markdown page:
31 |
32 |
33 | Installation and Requirement
34 | ----------------------------
35 |
36 | PHP Markdown requires PHP version 4.0.5 or later.
37 |
38 |
39 | ### WordPress ###
40 |
41 | PHP Markdown works with [WordPress][wp], version 1.2 or later.
42 |
43 | [wp]: http://wordpress.org/
44 |
45 | 1. To use PHP Markdown with WordPress, place the "markdown.php" file
46 | in the "plugins" folder. This folder is located inside
47 | "wp-content" at the root of your site:
48 |
49 | (site home)/wp-content/plugins/
50 |
51 | 2. Activate the plugin with the administrative interface of
52 | WordPress. In the "Plugins" section you will now find Markdown.
53 | To activate the plugin, click on the "Activate" button on the
54 | same line as Markdown. Your entries will now be formatted by
55 | PHP Markdown.
56 |
57 | 3. To post Markdown content, you'll first have to disable the
58 | "visual" editor in the User section of WordPress.
59 |
60 | You can configure PHP Markdown to not apply to the comments on your
61 | WordPress weblog. See the "Configuration" section below.
62 |
63 | It is not possible at this time to apply a different set of
64 | filters to different entries. All your entries will be formatted by
65 | PHP Markdown. This is a limitation of WordPress. If your old entries
66 | are written in HTML (as opposed to another formatting syntax, like
67 | Textile), they'll probably stay fine after installing Markdown.
68 |
69 |
70 | ### bBlog ###
71 |
72 | PHP Markdown also works with [bBlog][bb].
73 |
74 | [bb]: http://www.bblog.com/
75 |
76 | To use PHP Markdown with bBlog, rename "markdown.php" to
77 | "modifier.markdown.php" and place the file in the "bBlog_plugins"
78 | folder. This folder is located inside the "bblog" directory of
79 | your site, like this:
80 |
81 | (site home)/bblog/bBlog_plugins/modifier.markdown.php
82 |
83 | Select "Markdown" as the "Entry Modifier" when you post a new
84 | entry. This setting will only apply to the entry you are editing.
85 |
86 |
87 | ### Replacing Textile in TextPattern ###
88 |
89 | [TextPattern][tp] uses [Textile][tx] to format your text. You can
90 | replace Textile with Markdown in TextPattern without having to change
91 | any code by using the *Textile Compatibility Mode*. This may work
92 | with other software that expects Textile too.
93 |
94 | [tx]: http://www.textism.com/tools/textile/
95 | [tp]: http://www.textpattern.com/
96 |
97 | 1. Rename the "markdown.php" file to "classTextile.php". This will
98 | make PHP Markdown behave as if it was the actual Textile parser.
99 |
100 | 2. Replace the "classTextile.php" file TextPattern installed in your
101 | web directory. It can be found in the "lib" directory:
102 |
103 | (site home)/textpattern/lib/
104 |
105 | Contrary to Textile, Markdown does not convert quotes to curly ones
106 | and does not convert multiple hyphens (`--` and `---`) into en- and
107 | em-dashes. If you use PHP Markdown in Textile Compatibility Mode, you
108 | can solve this problem by installing the "smartypants.php" file from
109 | [PHP SmartyPants][psp] beside the "classTextile.php" file. The Textile
110 | Compatibility Mode function will use SmartyPants automatically without
111 | further modification.
112 |
113 | [psp]: http://michelf.com/projects/php-smartypants/
114 |
115 |
116 | ### Updating Markdown in Other Programs ###
117 |
118 | Many web applications now ship with PHP Markdown, or have plugins to
119 | perform the conversion to HTML. You can update PHP Markdown in many of
120 | these programs by swapping the old "markdown.php" file for the new one.
121 |
122 | Here is a short non-exhaustive list of some programs and where they
123 | hide the "markdown.php" file.
124 |
125 | | Program | Path to Markdown
126 | | ------- | ----------------
127 | | [Pivot][] | `(site home)/pivot/includes/markdown/markdown.php`
128 |
129 | If you're unsure if you can do this with your application, ask the
130 | developer, or wait for the developer to update his application or
131 | plugin with the new version of PHP Markdown.
132 |
133 | [Pivot]: http://pivotlog.net/
134 |
135 |
136 | ### In Your Own Programs ###
137 |
138 | You can use PHP Markdown easily in your current PHP program. Simply
139 | include the file and then call the Markdown function on the text you
140 | want to convert:
141 |
142 | include_once "markdown.php";
143 | $my_html = Markdown($my_text);
144 |
145 | If you wish to use PHP Markdown with another text filter function
146 | built to parse HTML, you should filter the text *after* the Markdown
147 | function call. This is an example with [PHP SmartyPants][psp]:
148 |
149 | $my_html = SmartyPants(Markdown($my_text));
150 |
151 |
152 | ### With Smarty ###
153 |
154 | If your program uses the [Smarty][sm] template engine, PHP Markdown
155 | can now be used as a modifier for your templates. Rename "markdown.php"
156 | to "modifier.markdown.php" and put it in your smarty plugins folder.
157 |
158 | [sm]: http://smarty.php.net/
159 |
160 | If you are using MovableType 3.1 or later, the Smarty plugin folder is
161 | located at `(MT CGI root)/php/extlib/smarty/plugins`. This will allow
162 | Markdown to work on dynamic pages.
163 |
164 |
165 | Configuration
166 | -------------
167 |
168 | By default, PHP Markdown produces XHTML output for tags with empty
169 | elements. E.g.:
170 |
171 | <br />
172 |
173 | Markdown can be configured to produce HTML-style tags; e.g.:
174 |
175 | <br>
176 |
177 | To do this, you must edit the "MARKDOWN_EMPTY_ELEMENT_SUFFIX"
178 | definition below the "Global default settings" header at the start of
179 | the "markdown.php" file.
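
As an alternative to editing the file, the version history below notes that these constants can be defined before "markdown.php" is included. The following is a minimal sketch of that idea, not the plugin's own code: the `defined() || define()` guard is a swapped-in equivalent of the `@define` calls in the plugin file, and `empty_element` is a hypothetical helper used only to make the effect visible.

```php
<?php
// Hypothetical caller code: set the override *before* the library file is loaded.
define('MARKDOWN_EMPTY_ELEMENT_SUFFIX', '>');

// Stand-in for the library defaults. The guard leaves an earlier definition
// alone, so no "constant already defined" warning is raised.
defined('MARKDOWN_EMPTY_ELEMENT_SUFFIX') || define('MARKDOWN_EMPTY_ELEMENT_SUFFIX', ' />');
defined('MARKDOWN_TAB_WIDTH') || define('MARKDOWN_TAB_WIDTH', 4);

// Hypothetical helper showing how the suffix ends up in an empty element.
function empty_element(string $tag): string {
    return '<' . $tag . MARKDOWN_EMPTY_ELEMENT_SUFFIX;
}

echo empty_element('br'), "\n";
```

With the override in place the sketch emits an HTML-style tag; removing the first `define` falls back to the XHTML-style default.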
180 |
181 |
182 | ### WordPress-Specific Settings ###
183 |
184 | By default, the Markdown plugin applies to both posts and comments on
185 | your WordPress weblog. To deactivate one or the other, edit the
186 | `MARKDOWN_WP_POSTS` or `MARKDOWN_WP_COMMENTS` definitions under the
187 | "WordPress settings" header at the start of the "markdown.php" file.
188 |
189 |
190 | Bugs
191 | ----
192 |
193 | To file bug reports please send email to:
194 |
195 |
196 | Please include with your report: (1) the example input; (2) the output you
197 | expected; (3) the output PHP Markdown actually produced.
198 |
199 |
200 | Version History
201 | ---------------
202 |
203 | 1.0.1n (10 Oct 2009):
204 |
205 | * Enabled reference-style shortcut links. Now you can write reference-style
206 | links with fewer brackets:
207 |
208 | This is [my website].
209 |
210 | [my website]: http://example.com/
211 |
212 | This was added in the 1.0.2 betas, but commented out in the 1.0.1 branch,
213 | waiting for the feature to be officialized. [But half of the other Markdown
214 | implementations are supporting this syntax][half], so it makes sense for
215 | compatibility's sake to allow it in PHP Markdown too.
216 |
217 | [half]: http://babelmark.bobtfish.net/?markdown=This+is+%5Bmy+website%5D.%0D%0A%09%09%0D%0A%5Bmy+website%5D%3A+http%3A%2F%2Fexample.com%2F%0D%0A&src=1&dest=2
218 |
219 | * Now accepting many valid email addresses in autolinks that were
220 | previously rejected, such as:
221 |
222 |
223 |
224 | <"abc@def"@example.com>
225 | <"Fred Bloggs"@example.com>
226 |
227 |
228 | * Now accepting spaces in URLs for inline and reference-style links. Such
229 | URLs need to be surrounded by angle brackets. For instance:
230 |
231 | [link text]( "optional title")
232 |
233 | [link text][ref]
234 | [ref]: "optional title"
235 |
236 | There is still a quirk which may prevent this from working correctly with
237 | relative URLs in inline-style links however.
238 |
239 | * Fix for adjacent lists of different kinds where the second list could
240 | end up as a sublist of the first when not separated by an empty line.
241 |
242 | * Fixed a bug where inline-style links wouldn't be recognized when the link
243 | definition contains a line break between the url and the title.
244 |
245 | * Fixed a bug where tags whose name contains an underscore weren't parsed
246 | correctly.
247 |
248 | * Fixed some corner cases mixing underscore-emphasis and asterisk-emphasis.
249 |
250 |
251 | 1.0.1m (21 Jun 2008):
252 |
253 | * Lists can now have empty items.
254 |
255 | * Rewrote the emphasis and strong emphasis parser to fix some issues
256 | with oddly placed and overlong markers.
257 |
258 |
259 | 1.0.1l (11 May 2008):
260 |
261 | * Now removing the UTF-8 BOM at the start of a document, if present.
262 |
263 | * Now accepting capitalized URI schemes (such as HTTP:) in automatic
264 | links, such as ``.
265 |
266 | * Fixed a problem where `` was seen as a horizontal
267 | rule instead of an automatic link.
268 |
269 | * Fixed an issue where some characters in Markdown-generated HTML
270 | attributes weren't properly escaped with entities.
271 |
272 | * Fix for code blocks as first element of a list item. Previously,
273 | this didn't create any code block for item 2:
274 |
275 | * Item 1 (regular paragraph)
276 |
277 | * Item 2 (code block)
278 |
279 | * A code block starting on the second line of a document wasn't seen
280 | as a code block. This has been fixed.
281 |
282 | * Added programmatically-settable parser properties `predef_urls` and
283 | `predef_titles` for predefined URLs and titles for reference-style
284 | links. To use this, your PHP code must call the parser this way:
285 |
286 | $parser = new Markdown_Parser;
287 | $parser->predef_urls = array('linkref' => 'http://example.com');
288 | $html = $parser->transform($text);
289 |
290 | You can then use the URL as a normal link reference:
291 |
292 | [my link][linkref]
293 | [my link][linkRef]
294 |
295 | Reference names in the parser properties *must* be lowercase.
296 | Reference names in the Markdown source may have any case.
297 |
298 | * Added `setup` and `teardown` methods which can be used by subclassers
299 | as hook points to arrange the state of some parser variables before and
300 | after parsing.
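
The case rule for `predef_urls` described above can be illustrated with a small stand-alone sketch. This is not the parser's implementation: `resolve_reference_links` is a hypothetical helper, and it only handles the `[text][ref]` form. It shows why keys in the predefined map must be lowercase while references in the Markdown source may use any case.

```php
<?php
// Hypothetical helper, not part of PHP Markdown: resolves [text][ref] links
// against a predefined URL map whose keys are lowercase.
function resolve_reference_links(string $text, array $predefUrls): string {
    return preg_replace_callback(
        '/\[([^\]]+)\]\[([^\]]+)\]/',
        function (array $m) use ($predefUrls): string {
            $key = strtolower($m[2]);        // names in the source may use any case
            if (!isset($predefUrls[$key])) {
                return $m[0];                // unknown reference: leave it untouched
            }
            $url = htmlspecialchars($predefUrls[$key], ENT_QUOTES);
            return '<a href="' . $url . '">' . $m[1] . '</a>';
        },
        $text
    );
}

$urls = array('linkref' => 'http://example.com');
echo resolve_reference_links('[my link][linkRef]', $urls), "\n";
```

A map key such as `'linkRef'` would never match in this sketch, because the lookup key is always lowercased first.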
301 |
302 |
303 | 1.0.1k (26 Sep 2007):
304 |
305 | * Fixed a problem introduced in 1.0.1i where three or more identical
306 | uppercase letters, as well as a few other symbols, would trigger
307 | a horizontal line.
308 |
309 |
310 | 1.0.1j (4 Sep 2007):
311 |
312 | * Fixed a problem introduced in 1.0.1i where the closing `code` and
313 | `pre` tags at the end of a code block were appearing in the wrong
314 | order.
315 |
316 | * Overriding configuration settings by defining constants from an
317 | external file before markdown.php is included is now possible without
318 | producing a PHP warning.
319 |
320 |
321 | 1.0.1i (31 Aug 2007):
322 |
323 | * Fixed a problem where an escaped backslash before a code span
324 | would prevent the code span from being created. This should now
325 | work as expected:
326 |
327 | Literal backslash: \\`code span`
328 |
329 | * Overall speed improvements, especially with long documents.
330 |
331 |
332 | 1.0.1h (3 Aug 2007):
333 |
334 | * Added two properties (`no_markup` and `no_entities`) to the parser
335 | allowing HTML tags and entities to be disabled.
336 |
337 | * Fix for a problem introduced in 1.0.1g where posting comments in
338 | WordPress would trigger PHP warnings and cause some markup to be
339 | incorrectly filtered by the kses filter in WordPress.
340 |
341 |
342 | 1.0.1g (3 Jul 2007):
343 |
344 | * Fix for PHP 5 compiled without the mbstring module. Previous fix to
345 | calculate the length of UTF-8 strings in `detab` when `mb_strlen` is
346 | not available was only working with PHP 4.
347 |
348 | * Fixed a problem with WordPress 2.x where full-content posts in RSS feeds
349 | were not processed correctly by Markdown.
350 |
351 | * Now supports URLs containing literal parentheses for inline links
352 | and images, such as:
353 |
354 | [WIMP](http://en.wikipedia.org/wiki/WIMP_(computing))
355 |
356 | Such parentheses may be arbitrarily nested, but must be
357 | balanced. Unbalanced parentheses are allowed, however, when they
358 | are escaped or when the URL is enclosed in angle brackets `<>`.
359 |
360 | * Fixed a performance problem where the regular expression for strong
361 | emphasis introduced in version 1.0.1d could sometimes take long to process,
362 | give slightly wrong results, and in some circumstances could remove
363 | the content of a whole paragraph entirely.
364 |
365 | * Some change in version 1.0.1d made possible the incorrect nesting of
366 | anchors within each other. This is now fixed.
367 |
368 | * Fixed a rare issue where certain MD5 hashes in the content could
369 | be changed to their corresponding text. For instance, this:
370 |
371 | The MD5 value for "+" is "26b17225b626fb9238849fd60eabdf60".
372 |
373 | was incorrectly changed to this in previous versions of PHP Markdown:
374 |
375 | The MD5 value for "+" is "+".
376 |
377 | * Now converts escaped characters to their numeric character
378 | reference equivalents.
379 |
380 | This fixes an integration issue with SmartyPants and backslash escapes.
381 | Since Markdown and SmartyPants have some escapable characters in common,
382 | it was sometimes necessary to escape them twice. Previously, two
383 | backslashes were sometimes required to prevent Markdown from "eating" the
384 | backslash before SmartyPants sees it:
385 |
386 | Here are two hyphens: \\--
387 |
388 | Now, only one backslash will do:
389 |
390 | Here are two hyphens: \--
391 |
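
The balanced-parentheses rule for inline link URLs mentioned in this release can be sketched with a recursive PCRE pattern. This is an illustration under stated assumptions, not the pattern PHP Markdown uses: `find_inline_links` is a hypothetical helper, titles and angle-bracketed URLs are ignored, and PCRE subpattern recursion is a swapped-in technique chosen for brevity.

```php
<?php
// Hypothetical helper: find inline links whose URL may contain balanced parentheses.
function find_inline_links(string $text): array {
    $pattern = '/
        \[ ([^\]]*) \]                 # group 1: link text
        \(                             # opening parenthesis of the target
        (                              # group 2: the URL
          (?: [^()\s]++                #   a run of ordinary characters
            | ( \( (?: [^()\s]++ | (?3) )* \) )   # group 3: balanced pair, recursive
          )*
        )
        \)                             # closing parenthesis of the target
    /x';
    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);
    $links = array();
    foreach ($matches as $m) {
        $links[] = array('text' => $m[1], 'url' => $m[2]);
    }
    return $links;
}

$found = find_inline_links('[WIMP](http://en.wikipedia.org/wiki/WIMP_(computing))');
echo $found[0]['url'], "\n";
```

In this sketch an unbalanced target such as `[a](b(c)` produces no match at all, which mirrors the requirement that unescaped parentheses be balanced.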
392 |
393 | 1.0.1f (7 Feb 2007):
394 |
395 | * Fixed an issue with WordPress where manually-entered excerpts, but
396 | not the auto-generated ones, would contain nested paragraphs.
397 |
398 | * Fixed an issue introduced in 1.0.1d where headers and blockquotes
399 | preceded too closely by a paragraph (not separated by a blank line)
400 | were incorrectly put inside the paragraph.
401 |
402 | * Fixed an issue introduced in 1.0.1d in the tokenizeHTML method where
403 | two consecutive code spans would be merged into one when together they
404 | form a valid tag in a multiline paragraph.
405 |
406 | * Fixed a long-standing issue where blank lines in code blocks would
407 | be doubled when the code block is in a list item.
408 |
409 | This was due to the list processing functions relying on artificially
410 | doubled blank lines to correctly determine when list items should
411 | contain block-level content. The list item processing model was thus
412 | changed to avoid the need for double blank lines.
413 |
414 | * Fixed an issue with `<% asp-style %>` instructions used as inline
415 | content where the opening `<` was encoded as `&lt;`.
416 |
417 | * Fixed a parse error occurring when PHP is configured to accept
418 | ASP-style delimiters as boundaries for PHP scripts.
419 |
420 | * Fixed a bug introduced in 1.0.1d where underscores in automatic links
421 | got swapped with emphasis tags.
422 |
423 |
424 | 1.0.1e (28 Dec 2006)
425 |
426 | * Added support for internationalized domain names for email addresses in
427 | automatic links. Improved the speed at which email addresses are converted
428 | to entities. Thanks to Milian Wolff for his optimisations.
429 |
430 | * Made deterministic the conversion to entities of email addresses in
431 | automatic links. This means that a given email address will always be
432 | encoded the same way.
433 |
434 | * PHP Markdown will now use its own function to calculate the length of a
435 | UTF-8 string in `detab` when `mb_strlen` is not available instead of
436 | giving a fatal error.
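
The `detab` behaviour described in this release and the previous one (tab stops counted in characters, with a fallback when `mb_strlen` is missing) might look roughly like the sketch below. Both function names are hypothetical and the code is an approximation written for illustration, not a copy of the parser's method.

```php
<?php
// Hypothetical helper: length of a UTF-8 string in characters.
function utf8_length(string $text): int {
    if (function_exists('mb_strlen')) {
        return mb_strlen($text, 'UTF-8');
    }
    // Fallback: count every byte that is not a UTF-8 continuation byte (10xxxxxx).
    return (int) preg_match_all('/[\x00-\x7F\xC0-\xFF]/', $text);
}

// Hypothetical helper: expand tabs to spaces, aligning on tab stops.
function detab(string $text, int $tabWidth = 4): string {
    $lines = explode("\n", $text);
    foreach ($lines as &$line) {
        $blocks = explode("\t", $line);
        $line = array_shift($blocks);        // text before the first tab
        foreach ($blocks as $block) {
            // Pad up to the next tab stop, measured in characters, not bytes.
            $pad = $tabWidth - utf8_length($line) % $tabWidth;
            $line .= str_repeat(' ', $pad) . $block;
        }
    }
    unset($line);
    return implode("\n", $lines);
}

echo detab("\xC3\xA9\tx"), "\n";   // "é" is two bytes but one character
```

Counting bytes instead of characters would pad the last example with two spaces rather than three, which is the misalignment the changelog entry refers to.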
437 |
438 |
439 | 1.0.1d (1 Dec 2006)
440 |
441 | * Fixed a bug where inline images always had an empty title attribute. The
442 | title attribute is now present only when explicitly defined.
443 |
444 | * Link reference definitions can now have an empty title. Previously, if the
445 | title was defined but left empty, the link definition was ignored. This can
446 | be useful if you want an empty title attribute in images to hide the
447 | tooltip in Internet Explorer.
448 |
449 | * Made `detab` aware of UTF-8 characters. UTF-8 multi-byte sequences are now
450 | correctly mapped to one character instead of the number of bytes.
451 |
452 | * Fixed a small bug with WordPress where WordPress' default filter `wpautop`
453 | was not properly deactivated on comment text, resulting in hard line breaks
454 | where Markdown does not prescribe them.
455 |
456 | * Added a `TextileRestricted` method to the Textile compatibility mode. There
457 | is no restriction however, as Markdown does not have a restricted mode at
458 | this point. This should make PHP Markdown work again in the latest
459 | versions of TextPattern.
460 |
461 | * Converted PHP Markdown to an object-oriented design.
462 |
463 | * Changed span and block gamut methods so that they loop over a
464 | customizable list of methods. This makes subclassing the parser a more
465 | interesting option for creating syntax extensions.
466 |
467 | * Also added a "document" gamut loop which can be used to hook document-level
468 | methods (like for stripping link definitions).
469 |
470 | * Changed all methods which were inserting HTML code so that they now return
471 | a hashed representation of the code. New methods `hashSpan` and `hashBlock`
472 | are used to hash respectively span- and block-level generated content. This
473 | has a couple of significant effects:
474 |
475 | 1. It prevents invalid nesting of Markdown-generated elements which
476 | could occur with constructs like `*something [link*][1]`.
477 | 2. It prevents problems occurring with deeply nested lists on which
478 | paragraphs were ill-formed.
479 | 3. It removes the need to call `hashHTMLBlocks` twice during the
480 | block gamut.
481 |
482 | Hashes are turned back to HTML prior to output.
483 |
484 | * Made the block-level HTML parser smarter using a specially-crafted regular
485 | expression capable of handling nested tags.
486 |
487 | * Solved backtick issues in tag attributes by rewriting the HTML tokenizer to
488 | be aware of code spans. All these lines should work correctly now:
489 |
490 | bar
491 | bar
492 | ``
493 |
494 | * Changed the parsing of HTML comments to match simply from `<!--` to `-->`
495 | instead of using the more complicated SGML-style rule with paired `--`.
496 | This is how most browsers parse comments and how XML defines them too.
497 |
498 | * `` has been added to the list of block-level elements and is now
499 | treated as an HTML block instead of being wrapped within paragraph tags.
500 |
501 | * Now only trim trailing newlines from code blocks, instead of trimming
502 | all trailing whitespace characters.
503 |
504 | * Fixed bug where this:
505 |
506 | [text](http://m.com "title" )
507 |
508 | wasn't working as expected, because the parser wasn't allowing for spaces
509 | before the closing paren.
510 |
511 | * Filthy hack to support markdown='1' in div tags.
512 |
513 | * _DoAutoLinks() now supports the 'dict://' URL scheme.
514 |
515 | * PHP- and ASP-style processor instructions are now protected as
516 | raw HTML blocks.
517 |
518 | <?php ... ?>
519 | <% ... %>
520 |
521 | * Fix for escaped backticks still triggering code spans:
522 |
523 | There are two raw backticks here: \` and here: \`, not a code span
524 |
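The hashing approach introduced in 1.0.1d (generated HTML is swapped for an opaque token, then restored before output) can be sketched as follows. `PlaceholderStore` is a hypothetical class written for this illustration; the counter-based token format is an assumption, chosen so that tokens cannot collide with text in the document, and nested tokens are not handled.

```php
<?php
// Hypothetical minimal class illustrating the placeholder technique.
class PlaceholderStore {
    private $fragments = array();

    // Swap a piece of generated HTML for a token that later passes will not alter.
    public function hash(string $html): string {
        $key = "\x1A" . count($this->fragments) . "\x1A";
        $this->fragments[$key] = $html;
        return $key;
    }

    // Restore every token to its stored HTML just before output.
    public function unhash(string $text): string {
        return strtr($text, $this->fragments);
    }
}

$store = new PlaceholderStore();
$text  = 'see ' . $store->hash('<a href="#x">*link*</a>') . ' and *this*';

// A later pass (here a toy emphasis rule) cannot touch the protected HTML.
$text = preg_replace('/\*(.+?)\*/', '<em>$1</em>', $text);

echo $store->unhash($text), "\n";
```

Because the link markup is hidden behind a token while the emphasis rule runs, the asterisks inside it are left alone, which is the kind of invalid nesting the first point in the list above refers to.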
525 |
526 | 1.0.1c (9 Dec 2005)
527 |
528 | * Fixed a problem occurring with PHP 5.1.1 due to a small
529 | change to string variable replacement behaviour in
530 | this version.
531 |
532 |
533 | 1.0.1b (6 Jun 2005)
534 |
535 | * Fixed a bug where an inline image followed by a reference link would
536 | give a completely wrong result.
537 |
538 | * Fix for escaped backticks still triggering code spans:
539 |
540 | There are two raw backticks here: \` and here: \`, not a code span
541 |
542 | * Fix for an ordered list following an unordered list, and the
543 | reverse. There is now a loop in _DoList that does the two
544 | separately.
545 |
546 | * Fix for nested sub-lists in list-paragraph mode. Previously we got
547 | a spurious extra level of `` tags for something like this:
548 |
549 | * this
550 |
551 | * sub
552 |
553 | that
554 |
555 | * Fixed some incorrect behaviour with emphasis. This will now work
556 | as it should:
557 |
558 | *test **thing***
559 | **test *thing***
560 | ***thing* test**
561 | ***thing** test*
562 |
563 | Name: __________
564 | Address: _______
565 |
566 | * Correct a small bug in `_TokenizeHTML` where a Doctype declaration
567 | was not seen as HTML.
568 |
569 | * Major rewrite of the WordPress integration code that should
570 | correct many problems by preventing default WordPress filters from
571 | tampering with Markdown-formatted text. More details here:
572 |
573 |
574 |
575 | 1.0.1a (15 Apr 2005)
576 |
577 | * Fixed an issue where PHP warnings were triggered when converting
578 | text with list items running on PHP 4.0.6. This was coming from
579 | the `rtrim` function which did not support the second argument
580 | prior to version 4.1. Replaced by a regular expression.
581 |
582 | * Markdown now correctly filters post excerpts and comment
583 | excerpts in WordPress.
584 |
585 | * Automatic links and some code samples were "corrected" by
586 | the balanceTags filter in WordPress meant to ensure HTML
587 | is well formed. This new version of PHP Markdown postpones this
588 | filter so that it runs after Markdown.
589 |
590 | * Blockquote syntax and some code samples were stripped by
591 | a new WordPress 1.5 filter meant to remove unwanted HTML
592 | in comments. This new version of PHP Markdown postpones this
593 | filter so that it runs after Markdown.
594 |
595 |
596 | 1.0.1 (16 Dec 2004):
597 |
598 | * Changed the syntax rules for code blocks and spans. Previously,
599 | backslash escapes for special Markdown characters were processed
600 | everywhere other than within inline HTML tags. Now, the contents of
601 | code blocks and spans are no longer processed for backslash escapes.
602 | This means that code blocks and spans are now treated literally,
603 | with no special rules to worry about regarding backslashes.
604 |
605 | **IMPORTANT**: This breaks the syntax from all previous versions of
606 | Markdown. Code blocks and spans involving backslash characters will
607 | now generate different output than before.
608 |
609 | Implementation-wise, this change was made by moving the call to
610 | `_EscapeSpecialChars()` from the top-level `Markdown()` function to
611 | within `_RunSpanGamut()`.
612 |
613 | * Significant performance improvements in `_DoHeader`, `_Detab`
614 | and `_TokenizeHTML`.
615 |
616 | * Added `>`, `+`, and `-` to the list of backslash-escapable
617 | characters. These should have been done when these characters
618 | were added as unordered list item markers.
619 |
620 | * Inline links using `<` and `>` URL delimiters weren't working:
621 |
622 | like [this]()
623 |
624 | Fixed by moving `_DoAutoLinks()` after `_DoAnchors()` in
625 | `_RunSpanGamut()`.
626 |
627 | * Fixed bug where auto-links were being processed within code spans:
628 |
629 | like this: ``
630 |
631 | Fixed by moving `_DoAutoLinks()` from `_RunBlockGamut()` to
632 | `_RunSpanGamut()`.
633 |
634 | * Sort-of fixed a bug where lines in the middle of hard-wrapped
635 | paragraphs, which lines look like the start of a list item,
636 | would accidentally trigger the creation of a list. E.g. a
637 | paragraph that looked like this:
638 |
639 | I recommend upgrading to version
640 | 8. Oops, now this line is treated
641 | as a sub-list.
642 |
643 | This is fixed for top-level lists, but it can still happen for
644 | sub-lists. E.g., the following list item will not be parsed
645 | properly:
646 |
647 | * I recommend upgrading to version
648 | 8. Oops, now this line is treated
649 | as a sub-list.
650 |
651 | Given Markdown's list-creation rules, I'm not sure this can
652 | be fixed.
653 |
654 | * Fix for horizontal rules preceded by 2 or 3 spaces or followed by
655 | trailing spaces and tabs.
656 |
657 | * Standalone HTML comments are now handled; previously, they'd get
658 | wrapped in a spurious `` tag.
659 |
660 | * `_HashHTMLBlocks()` now tolerates trailing spaces and tabs following
661 | HTML comments and `` tags.
662 |
663 | * Changed special case pattern for hashing `` tags in
664 | `_HashHTMLBlocks()` so that they must occur within three spaces
665 | of left margin. (With 4 spaces or a tab, they should be
666 | code blocks, but weren't before this fix.)
667 |
668 | * Auto-linked email address can now optionally contain
669 | a 'mailto:' protocol. I.e. these are equivalent:
670 |
671 |
672 |
673 |
674 | * Fixed annoying bug where nested lists would wind up with
675 | spurious (and invalid) `` tags.
676 |
677 | * Changed `_StripLinkDefinitions()` so that link definitions must
678 | occur within three spaces of the left margin. Thus if you indent
679 | a link definition by four spaces or a tab, it will now be a code
680 | block.
681 |
682 | * You can now write empty links:
683 |
684 | [like this]()
685 |
686 | and they'll be turned into anchor tags with empty href attributes.
687 | This should have worked before, but didn't.
688 |
689 | * `***this***` and `___this___` are now turned into
690 |
691 | this
692 |
693 | Instead of
694 |
695 | this
696 |
697 | which isn't valid.
698 |
699 | * Fixed problem for links defined with urls that include parens, e.g.:
700 |
701 | [1]: http://sources.wikipedia.org/wiki/Middle_East_Policy_(Chomsky)
702 |
703 | "Chomsky" was being erroneously treated as the URL's title.
704 |
705 | * Double quotes in the title of an inline link used to give strange
706 | results (incorrectly made entities). Fixed.
707 |
708 | * Tabs are now correctly changed into spaces. Previously, only
709 | the first tab was converted. In code blocks, the second one was too,
710 | but was not always correctly aligned.
711 |
712 | * Fixed a bug where a tab character inserted after a quote on the same
713 | line could add a slash before the quotes.
714 |
715 | This is "before" [tab] and "after" a tab.
716 |
717 | Previously gave this result:
718 |
719 | This is \"before\" [tab] and "after" a tab.
720 |
721 | * Removed a call to `htmlentities`. This fixes a bug where multibyte
722 | characters present in the title of a link reference could lead to
723 | invalid utf-8 characters.
724 |
725 | * Changed a regular expression in `_TokenizeHTML` that could lead to
726 | a segmentation fault with PHP 4.3.8 on Linux.
727 |
728 | * Fixed some notices that could show up if PHP error reporting
729 | E_NOTICE flag was set.
730 |
731 |
732 | Copyright and License
733 | ---------------------
734 |
735 | PHP Markdown
736 | Copyright (c) 2004-2009 Michel Fortin
737 |
738 | All rights reserved.
739 |
740 | Based on Markdown
741 | Copyright (c) 2003-2006 John Gruber
742 |
743 | All rights reserved.
744 |
745 | Redistribution and use in source and binary forms, with or without
746 | modification, are permitted provided that the following conditions are
747 | met:
748 |
749 | * Redistributions of source code must retain the above copyright notice,
750 | this list of conditions and the following disclaimer.
751 |
752 | * Redistributions in binary form must reproduce the above copyright
753 | notice, this list of conditions and the following disclaimer in the
754 | documentation and/or other materials provided with the distribution.
755 |
756 | * Neither the name "Markdown" nor the names of its contributors may
757 | be used to endorse or promote products derived from this software
758 | without specific prior written permission.
759 |
760 | This software is provided by the copyright holders and contributors "as
761 | is" and any express or implied warranties, including, but not limited
762 | to, the implied warranties of merchantability and fitness for a
763 | particular purpose are disclaimed. In no event shall the copyright owner
764 | or contributors be liable for any direct, indirect, incidental, special,
765 | exemplary, or consequential damages (including, but not limited to,
766 | procurement of substitute goods or services; loss of use, data, or
767 | profits; or business interruption) however caused and on any theory of
768 | liability, whether in contract, strict liability, or tort (including
769 | negligence or otherwise) arising in any way out of the use of this
770 | software, even if advised of the possibility of such damage.
771 |
--------------------------------------------------------------------------------
/php-markdown-modified.php:
--------------------------------------------------------------------------------
1 | <?php
8 | #
9 | # Original Markdown
10 | # Copyright (c) 2004-2006 John Gruber
11 | #
12 | #
13 |
14 |
15 | define( 'AJF_MARKDOWN_VERSION', "1.0.1o" ); # Sun 8 Jan 2012
16 |
17 |
18 | #
19 | # Global default settings:
20 | #
21 |
22 | # Change to ">" for HTML output
23 | @define( 'MARKDOWN_EMPTY_ELEMENT_SUFFIX', " />");
24 |
25 | # Define the width of a tab for code blocks.
26 | @define( 'MARKDOWN_TAB_WIDTH', 4 );
27 |
28 |
29 | #
30 | # WordPress settings:
31 | #
32 |
33 | # Change to false to remove Markdown from posts and/or comments.
34 | @define( 'MARKDOWN_WP_POSTS', true );
35 | @define( 'MARKDOWN_WP_COMMENTS', true );
36 |
37 |
38 |
39 | ### Standard Function Interface ###
40 |
41 | @define( 'MARKDOWN_PARSER_CLASS', 'AJF_Markdown_Parser' );
42 |
43 | function AJF_Markdown($text) {
44 | #
45 | # Initialize the parser and return the result of its transform method.
46 | #
47 | # Setup static parser variable.
48 | static $parser;
49 | if (!isset($parser)) {
50 | $parser_class = MARKDOWN_PARSER_CLASS;
51 | $parser = new $parser_class;
52 | }
53 |
54 | # Transform text using parser.
55 | return $parser->transform($text);
56 | }
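
# --- Illustrative sketch, not part of the original plugin ---
# AJF_Markdown() instantiates whatever class MARKDOWN_PARSER_CLASS names and
# keeps the instance in a static variable, so the parser is built only once
# per request. The helper below mirrors that pattern so it can run on its own.
# AJF_Example_Parser and ajf_example_lazy_transform() are hypothetical names;
# the only assumption is that a parser class exposes transform($text).
class AJF_Example_Parser {
	# Stand-in transform: wraps trimmed text in a paragraph.
	function transform($text) { return '<p>' . trim($text) . '</p>'; }
}
function ajf_example_lazy_transform($text, $parser_class = 'AJF_Example_Parser') {
	static $parsers = array();	# one cached instance per class name
	if (!isset($parsers[$parser_class])) {
		$parsers[$parser_class] = new $parser_class;
	}
	return $parsers[$parser_class]->transform($text);
}
# Possible usage: echo ajf_example_lazy_transform(" hello ");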
57 |
58 |
59 | ### WordPress Plugin Interface ###
60 |
61 | /*
62 | Plugin Name: Markdown - Modified
63 | Plugin URI: https://github.com/afragen/php-markdown-modified
64 | GitHub Plugin URI: https://github.com/afragen/php-markdown-modified
65 | Description: Modified to work along side Markdown on Save variants. All posts containing Markdown are rendered regardless of Markdown on Save variant setting. Using PHP Markdown 1.0.1o
66 | Version: 1.0.1
67 | Author: Andy Fragen
68 | */
69 |
70 |
71 | if (isset($wp_version)) {
72 | # More details about how it works here:
73 | #
74 |
75 | # Post content and excerpts
76 | # - Remove WordPress paragraph generator.
77 | # - Run Markdown on excerpt, then remove all tags.
78 | # - Add paragraph tag around the excerpt, but remove it for the excerpt rss.
79 | if (MARKDOWN_WP_POSTS) {
80 | remove_filter('the_content', 'wpautop');
81 | remove_filter('the_content_rss', 'wpautop');
82 | remove_filter('the_excerpt', 'wpautop');
83 | add_filter('the_content', 'AJF_Markdown', 6);
84 | add_filter('the_content_rss', 'AJF_Markdown', 6);
85 | add_filter('get_the_excerpt', 'AJF_Markdown', 6);
86 | add_filter('get_the_excerpt', 'trim', 7);
87 | add_filter('the_excerpt', 'ajf_mdwp_add_p');
88 | add_filter('the_excerpt_rss', 'ajf_mdwp_strip_p');
89 |
90 | remove_filter('content_save_pre', 'balanceTags', 50);
91 | remove_filter('excerpt_save_pre', 'balanceTags', 50);
92 | add_filter('the_content', 'balanceTags', 50);
93 | add_filter('get_the_excerpt', 'balanceTags', 9);
94 | }
95 |
96 | # Comments
97 | # - Remove WordPress paragraph generator.
98 | # - Remove WordPress auto-link generator.
99 | # - Scramble important tags before passing them to the kses filter.
100 | # - Run Markdown on excerpt then remove paragraph tags.
101 | if (MARKDOWN_WP_COMMENTS) {
102 | remove_filter('comment_text', 'wpautop', 30);
103 | remove_filter('comment_text', 'make_clickable');
104 | add_filter('pre_comment_content', 'AJF_Markdown', 6);
105 | add_filter('pre_comment_content', 'ajf_mdwp_hide_tags', 8);
106 | add_filter('pre_comment_content', 'ajf_mdwp_show_tags', 12);
107 | add_filter('get_comment_text', 'AJF_Markdown', 6);
108 | add_filter('get_comment_excerpt', 'AJF_Markdown', 6);
109 | add_filter('get_comment_excerpt', 'ajf_mdwp_strip_p', 7);
110 |
111 | global $mdwp_hidden_tags, $mdwp_placeholders;
112 | $mdwp_hidden_tags = explode(' ',
113 | '<p> </p> <pre> </pre> <ol> </ol> <ul> </ul> <li> </li>');
114 | $mdwp_placeholders = explode(' ', str_rot13(
115 | 'pEj07ZbbBZ U1kqgh4w4p pre2zmeN6K QTi31t9pre ol0MP1jzJR '.
116 | 'ML5IjmbRol ulANi1NsGY J7zRLJqPul liA8ctl16T K9nhooUHli'));
117 | }
118 |
119 | function ajf_mdwp_add_p($text) {
120 | if (!preg_match('{^$|^<(p|ul|ol|dl|pre|blockquote)>}i', $text)) {
121 | $text = '<p>'.$text.'</p>';
122 | $text = preg_replace('{\n{2,}}', "</p>\n\n<p>", $text);
123 | }
124 | return $text;
125 | }
126 |
127 | function ajf_mdwp_strip_p($t) { return preg_replace('{</?p>}i', '', $t); }
128 |
129 | function ajf_mdwp_hide_tags($text) {
130 | global $mdwp_hidden_tags, $mdwp_placeholders;
131 | return str_replace($mdwp_hidden_tags, $mdwp_placeholders, $text);
132 | }
133 | function ajf_mdwp_show_tags($text) {
134 | global $mdwp_hidden_tags, $mdwp_placeholders;
135 | return str_replace($mdwp_placeholders, $mdwp_hidden_tags, $text);
136 | }
137 | }
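
# --- Illustrative sketch, not part of the original plugin ---
# The comment filters above hide a handful of tags behind rot13-scrambled
# placeholders at priority 8 and restore them at priority 12. The assumption
# here is that a tag-stripping filter (WordPress's kses, at its default
# priority) runs in between, so Markdown's own output survives it.
# ajf_example_protect_tags() is a hypothetical helper that replays the round
# trip on a shortened tag list, with strip_tags() standing in for the real
# filter.
function ajf_example_protect_tags($html) {
	$tags = explode(' ', '<p> </p> <pre> </pre>');
	$placeholders = explode(' ', str_rot13(
		'pEj07ZbbBZ U1kqgh4w4p pre2zmeN6K QTi31t9pre'));
	$hidden = str_replace($tags, $placeholders, $html);	# hide protected tags
	$filtered = strip_tags($hidden);	# stand-in for the tag filter
	return str_replace($placeholders, $tags, $filtered);	# restore them
}
# Possible usage: ajf_example_protect_tags('<p>Hi <b>x</b></p>') keeps the
# paragraph tags and drops the <b> pair.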
138 |
139 |
140 | ### bBlog Plugin Info ###
141 |
142 | function AJF_identify_modifier_markdown() {
143 | return array(
144 | 'name' => 'markdown',
145 | 'type' => 'modifier',
146 | 'nicename' => 'Markdown',
147 | 'description' => 'A text-to-HTML conversion tool for web writers',
148 | 'authors' => 'Michel Fortin and John Gruber',
149 | 'licence' => 'BSD-like',
150 | 'version' => AJF_MARKDOWN_VERSION,
151 | 'help' => 'Markdown syntax allows you to write using an easy-to-read, easy-to-write plain text format. Based on the original Perl version by John Gruber. More...'
152 | );
153 | }
154 |
155 |
156 | ### Smarty Modifier Interface ###
157 |
158 | function AJF_smarty_modifier_markdown($text) {
159 | return AJF_Markdown($text);
160 | }
161 |
162 |
163 | ### Textile Compatibility Mode ###
164 |
165 | # Rename this file to "classTextile.php" and it can replace Textile everywhere.
166 |
167 | if (strcasecmp(substr(__FILE__, -16), "classTextile.php") == 0) {
168 | # Try to include PHP SmartyPants. Should be in the same directory.
169 | @include_once 'smartypants.php';
170 | # Fake Textile class. It calls Markdown instead.
171 | class Textile {
172 | function TextileThis($text, $lite='', $encode='') {
173 | if ($lite == '' && $encode == '') $text = AJF_Markdown($text);
174 | if (function_exists('SmartyPants')) $text = SmartyPants($text);
175 | return $text;
176 | }
177 | # Fake restricted version: restrictions are not supported for now.
178 | function TextileRestricted($text, $lite='', $noimage='') {
179 | return $this->TextileThis($text, $lite);
180 | }
181 | # Workaround to ensure compatibility with TextPattern 4.0.3.
182 | function blockLite($text) { return $text; }
183 | }
184 | }
185 |
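
# --- Illustrative sketch, not part of the original plugin ---
# The parser class that follows matches balanced [brackets] without regex
# recursion: its constructor unrolls the pattern to a fixed depth with
# str_repeat(). The two helpers below rebuild that pattern outside the class
# to show what it accepts. Both function names are hypothetical.
function ajf_example_nested_brackets_re($depth) {
	# Each level is: (?> text-without-brackets | [ <next level> ] )*
	return str_repeat('(?>[^\[\]]+|\[', $depth) .
		str_repeat('\])*', $depth);
}
function ajf_example_link_text($text, $depth = 6) {
	# Capture the text of the first bracket pair whose nesting fits $depth.
	$re = '{\[(' . ajf_example_nested_brackets_re($depth) . ')\]}';
	return preg_match($re, $text, $m) ? $m[1] : null;
}
# Possible usage: ajf_example_link_text('see [a [b] c] here') captures the
# outer bracket pair, including the nested "[b]".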
186 |
187 |
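
# --- Illustrative sketch, not part of the original plugin ---
# The parser class below swaps finished HTML fragments for opaque tokens
# (boundary character, \x1A, counter, boundary character) so later passes
# cannot re-process them. transform() strips \x1A from the input first, so
# user text cannot forge a token. The helpers here are hypothetical
# stand-alone versions; in particular, the restore step is an assumed
# implementation, since the parser's own unhash method is not shown in this
# part of the file.
function ajf_example_hash_part(&$store, $html, $boundary = 'X') {
	static $i = 0;
	$key = "$boundary\x1A" . ++$i . $boundary;
	$store[$key] = $html;
	return $key;	# token that stands in for $html
}
function ajf_example_unhash($store, $text) {
	# Swap every known token back for the fragment it replaced.
	return preg_replace_callback('/(.)\x1A[0-9]+\1/',
		function ($m) use ($store) {
			return isset($store[$m[0]]) ? $store[$m[0]] : $m[0];
		}, $text);
}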
188 | #
189 | # Markdown Parser Class
190 | #
191 |
192 | class AJF_Markdown_Parser {
193 |
194 | # Regex to match balanced [brackets].
195 | # Needed to insert a maximum bracket depth while converting to PHP.
196 | var $nested_brackets_depth = 6;
197 | var $nested_brackets_re;
198 |
199 | var $nested_url_parenthesis_depth = 4;
200 | var $nested_url_parenthesis_re;
201 |
202 | # Table of hash values for escaped characters:
203 | var $escape_chars = '\`*_{}[]()>#+-.!';
204 | var $escape_chars_re;
205 |
206 | # Change to ">" for HTML output.
207 | var $empty_element_suffix = MARKDOWN_EMPTY_ELEMENT_SUFFIX;
208 | var $tab_width = MARKDOWN_TAB_WIDTH;
209 |
210 | # Change to `true` to disallow markup or entities.
211 | var $no_markup = false;
212 | var $no_entities = false;
213 |
214 | # Predefined urls and titles for reference links and images.
215 | var $predef_urls = array();
216 | var $predef_titles = array();
217 |
218 |
219 | function AJF_Markdown_Parser() {
220 | #
221 | # Constructor function. Initialize appropriate member variables.
222 | #
223 | $this->_initDetab();
224 | $this->prepareItalicsAndBold();
225 |
226 | $this->nested_brackets_re =
227 | str_repeat('(?>[^\[\]]+|\[', $this->nested_brackets_depth).
228 | str_repeat('\])*', $this->nested_brackets_depth);
229 |
230 | $this->nested_url_parenthesis_re =
231 | str_repeat('(?>[^()\s]+|\(', $this->nested_url_parenthesis_depth).
232 | str_repeat('(?>\)))*', $this->nested_url_parenthesis_depth);
233 |
234 | $this->escape_chars_re = '['.preg_quote($this->escape_chars).']';
235 |
236 | # Sort document, block, and span gamut in ascending priority order.
237 | asort($this->document_gamut);
238 | asort($this->block_gamut);
239 | asort($this->span_gamut);
240 | }
241 |
242 |
243 | # Internal hashes used during transformation.
244 | var $urls = array();
245 | var $titles = array();
246 | var $html_hashes = array();
247 |
248 | # Status flag to avoid invalid nesting.
249 | var $in_anchor = false;
250 |
251 |
252 | function setup() {
253 | #
254 | # Called before the transformation process starts to setup parser
255 | # states.
256 | #
257 | # Clear global hashes.
258 | $this->urls = $this->predef_urls;
259 | $this->titles = $this->predef_titles;
260 | $this->html_hashes = array();
261 |
262 | $this->in_anchor = false;
263 | }
264 |
265 | function teardown() {
266 | #
267 | # Called after the transformation process to clear any variable
268 | # which may be taking up memory unnecessarily.
269 | #
270 | $this->urls = array();
271 | $this->titles = array();
272 | $this->html_hashes = array();
273 | }
274 |
275 |
276 | function transform($text) {
277 | #
278 | # Main function. Performs some preprocessing on the input text
279 | # and pass it through the document gamut.
280 | #
281 | $this->setup();
282 |
283 | # Remove UTF-8 BOM and marker character in input, if present.
284 | $text = preg_replace('{^\xEF\xBB\xBF|\x1A}', '', $text);
285 |
286 | # Standardize line endings:
287 | # DOS to Unix and Mac to Unix
288 | $text = preg_replace('{\r\n?}', "\n", $text);
289 |
290 | # Make sure $text ends with a couple of newlines:
291 | $text .= "\n\n";
292 |
293 | # Convert all tabs to spaces.
294 | $text = $this->detab($text);
295 |
296 | # Turn block-level HTML blocks into hash entries
297 | $text = $this->hashHTMLBlocks($text);
298 |
299 | # Strip any lines consisting only of spaces and tabs.
300 | # This makes subsequent regexen easier to write, because we can
301 | # match consecutive blank lines with /\n+/ instead of something
302 | # contorted like /[ ]*\n+/ .
303 | $text = preg_replace('/^[ ]+$/m', '', $text);
304 |
305 | # Run document gamut methods.
306 | foreach ($this->document_gamut as $method => $priority) {
307 | $text = $this->$method($text);
308 | }
309 |
310 | $this->teardown();
311 |
312 | return $text . "\n";
313 | }
314 |
315 | var $document_gamut = array(
316 | # Strip link definitions, store in hashes.
317 | "stripLinkDefinitions" => 20,
318 |
319 | "runBasicBlockGamut" => 30,
320 | );
321 |
322 |
323 | function stripLinkDefinitions($text) {
324 | #
325 | # Strips link definitions from text, stores the URLs and titles in
326 | # hash references.
327 | #
328 | $less_than_tab = $this->tab_width - 1;
329 |
330 | # Link defs are in the form: ^[id]: url "optional title"
331 | $text = preg_replace_callback('{
332 | ^[ ]{0,'.$less_than_tab.'}\[(.+)\][ ]?: # id = $1
333 | [ ]*
334 | \n? # maybe *one* newline
335 | [ ]*
336 | (?:
337 | <(.+?)> # url = $2
338 | |
339 | (\S+?) # url = $3
340 | )
341 | [ ]*
342 | \n? # maybe one newline
343 | [ ]*
344 | (?:
345 | (?<=\s) # lookbehind for whitespace
346 | ["(]
347 | (.*?) # title = $4
348 | [")]
349 | [ ]*
350 | )? # title is optional
351 | (?:\n+|\Z)
352 | }xm',
353 | array(&$this, '_stripLinkDefinitions_callback'),
354 | $text);
355 | return $text;
356 | }
357 | function _stripLinkDefinitions_callback($matches) {
358 | $link_id = strtolower($matches[1]);
359 | $url = $matches[2] == '' ? $matches[3] : $matches[2];
360 | $this->urls[$link_id] = $url;
361 | $this->titles[$link_id] =& $matches[4];
362 | return ''; # String that will replace the block
363 | }
364 |
365 |
366 | function hashHTMLBlocks($text) {
367 | if ($this->no_markup) return $text;
368 |
369 | $less_than_tab = $this->tab_width - 1;
370 |
371 | # Hashify HTML blocks:
372 | # We only want to do this for block-level HTML tags, such as headers,
373 | # lists, and tables. That's because we still want to wrap <p>s around
374 | # "paragraphs" that are wrapped in non-block-level tags, such as anchors,
375 | # phrase emphasis, and spans. The list of tags we're looking for is
376 | # hard-coded:
377 | #
378 | # * List "a" is made of tags which can be both inline or block-level.
379 | # These will be treated block-level when the start tag is alone on
380 | # its line, otherwise they're not matched here and will be taken as
381 | # inline later.
382 | # * List "b" is made of tags which are always block-level;
383 | #
384 | $block_tags_a_re = 'ins|del';
385 | $block_tags_b_re = 'p|div|h[1-6]|blockquote|pre|table|dl|ol|ul|address|'.
386 | 'script|noscript|form|fieldset|iframe|math';
387 |
388 | # Regular expression for the content of a block tag.
389 | $nested_tags_level = 4;
390 | $attr = '
391 | (?> # optional tag attributes
392 | \s # starts with whitespace
393 | (?>
394 | [^>"/]+ # text outside quotes
395 | |
396 | /+(?!>) # slash not followed by ">"
397 | |
398 | "[^"]*" # text inside double quotes (tolerate ">")
399 | |
400 | \'[^\']*\' # text inside single quotes (tolerate ">")
401 | )*
402 | )?
403 | ';
404 | $content =
405 | str_repeat('
406 | (?>
407 | [^<]+ # content without tag
408 | |
409 | <\2 # nested opening tag
410 | '.$attr.' # attributes
411 | (?>
412 | />
413 | |
414 | >', $nested_tags_level). # end of opening tag
415 | '.*?'. # last level nested tag content
416 | str_repeat('
417 | </\2\s*> # closing nested tag
418 | )
419 | |
420 | <(?!/\2\s*> # other tags with a different name
421 | )
422 | )*',
423 | $nested_tags_level);
424 | $content2 = str_replace('\2', '\3', $content);
425 |
426 | # First, look for nested blocks, e.g.:
427 | # <div>
428 | # <div>
429 | # tags for inner block must be indented.
430 | # </div>
431 | # </div>
432 | #
433 | # The outermost tags must start at the left margin for this to match, and
434 | # the inner nested divs must be indented.
435 | # We need to do this before the next, more liberal match, because the next
436 | # match will start at the first `<div>` and stop at the first `</div>`.
437 | $text = preg_replace_callback('{(?>
438 | (?>
439 | (?<=\n\n) # Starting after a blank line
440 | | # or
441 | \A\n? # the beginning of the doc
442 | )
443 | ( # save in $1
444 |
445 | # Match from `\n<tag>` to `</tag>\n`, handling nested tags
446 | # in between.
447 |
448 | [ ]{0,'.$less_than_tab.'}
449 | <('.$block_tags_b_re.')# start tag = $2
450 | '.$attr.'> # attributes followed by > and \n
451 | '.$content.' # content, support nesting
452 | </\2> # the matching end tag
453 | [ ]* # trailing spaces/tabs
454 | (?=\n+|\Z) # followed by a newline or end of document
455 |
456 | | # Special version for tags of group a.
457 |
458 | [ ]{0,'.$less_than_tab.'}
459 | <('.$block_tags_a_re.')# start tag = $3
460 | '.$attr.'>[ ]*\n # attributes followed by >
461 | '.$content2.' # content, support nesting
462 | </\3> # the matching end tag
463 | [ ]* # trailing spaces/tabs
464 | (?=\n+|\Z) # followed by a newline or end of document
465 |
466 | | # Special case just for <hr />. It was easier to make a special
467 | # case than to make the other regex more complicated.
468 |
469 | [ ]{0,'.$less_than_tab.'}
470 | <(hr) # start tag = $2
471 | '.$attr.' # attributes
472 | /?> # the matching end tag
473 | [ ]*
474 | (?=\n{2,}|\Z) # followed by a blank line or end of document
475 |
476 | | # Special case for standalone HTML comments:
477 |
478 | [ ]{0,'.$less_than_tab.'}
479 | (?s:
480 | <!-- .*? -->
481 | )
482 | [ ]*
483 | (?=\n{2,}|\Z) # followed by a blank line or end of document
484 |
485 | | # PHP and ASP-style processor instructions (<? and <%)
486 |
487 | [ ]{0,'.$less_than_tab.'}
488 | (?s:
489 | <([?%]) # $2
490 | .*?
491 | \2>
492 | )
493 | [ ]*
494 | (?=\n{2,}|\Z) # followed by a blank line or end of document
495 |
496 | )
497 | )}Sxmi',
498 | array(&$this, '_hashHTMLBlocks_callback'),
499 | $text);
500 |
501 | return $text;
502 | }
503 | function _hashHTMLBlocks_callback($matches) {
504 | $text = $matches[1];
505 | $key = $this->hashBlock($text);
506 | return "\n\n$key\n\n";
507 | }
508 |
509 |
510 | function hashPart($text, $boundary = 'X') {
511 | #
512 | # Called whenever a tag must be hashed when a function inserts an atomic
513 | # element in the text stream. Passing $text through this function gives
514 | # a unique text-token which will be reverted back when calling unhash.
515 | #
516 | # The $boundary argument specifies what character should be used to surround
517 | # the token. By convention, "B" is used for block elements that need not
518 | # be wrapped into paragraph tags at the end, ":" is used for elements
519 | # that are word separators and "X" is used in the general case.
520 | #
521 | # Swap back any tag hash found in $text so we do not have to `unhash`
522 | # multiple times at the end.
523 | $text = $this->unhash($text);
524 |
525 | # Then hash the block.
526 | static $i = 0;
527 | $key = "$boundary\x1A" . ++$i . $boundary;
528 | $this->html_hashes[$key] = $text;
529 | return $key; # String that will replace the tag.
530 | }
531 |
532 |
533 | function hashBlock($text) {
534 | #
535 | # Shortcut function for hashPart with block-level boundaries.
536 | #
537 | return $this->hashPart($text, 'B');
538 | }
539 |
540 |
541 | var $block_gamut = array(
542 | #
543 | # These are all the transformations that form block-level
544 | # tags like paragraphs, headers, and list items.
545 | #
546 | "doHeaders" => 10,
547 | "doHorizontalRules" => 20,
548 |
549 | "doLists" => 40,
550 | "doCodeBlocks" => 50,
551 | "doBlockQuotes" => 60,
552 | );
553 |
554 | function runBlockGamut($text) {
555 | #
556 | # Run block gamut transformations.
557 | #
558 | # We need to escape raw HTML in Markdown source before doing anything
559 | # else. This needs to be done for each block, and not only at the
560 | # beginning in the Markdown function since hashed blocks can be part of
561 | # list items and could have been indented. Indented blocks would have
562 | # been seen as a code block in a previous pass of hashHTMLBlocks.
563 | $text = $this->hashHTMLBlocks($text);
564 |
565 | return $this->runBasicBlockGamut($text);
566 | }
567 |
568 | function runBasicBlockGamut($text) {
569 | #
570 | # Run block gamut transformations, without hashing HTML blocks. This is
571 | # useful when HTML blocks are known to be already hashed, like in the first
572 | # whole-document pass.
573 | #
574 | foreach ($this->block_gamut as $method => $priority) {
575 | $text = $this->$method($text);
576 | }
577 |
578 | # Finally form paragraph and restore hashed blocks.
579 | $text = $this->formParagraphs($text);
580 |
581 | return $text;
582 | }
583 |
584 |
585 | function doHorizontalRules($text) {
586 | # Do Horizontal Rules:
587 | return preg_replace(
588 | '{
589 | ^[ ]{0,3} # Leading space
590 | ([-*_]) # $1: First marker
591 | (?> # Repeated marker group
592 | [ ]{0,2} # Zero, one, or two spaces.
593 | \1 # Marker character
594 | ){2,} # Group repeated at least twice
595 | [ ]* # Tailing spaces
596 | $ # End of line.
597 | }mx',
598 | "\n".$this->hashBlock("<hr$this->empty_element_suffix")."\n",
599 | $text);
600 | }
601 |
602 |
603 | var $span_gamut = array(
604 | #
605 | # These are all the transformations that occur *within* block-level
606 | # tags like paragraphs, headers, and list items.
607 | #
608 | # Process character escapes, code spans, and inline HTML
609 | # in one shot.
610 | "parseSpan" => -30,
611 |
612 | # Process anchor and image tags. Images must come first,
613 | # because ![foo][f] looks like an anchor.
614 | "doImages" => 10,
615 | "doAnchors" => 20,
616 |
617 | # Make links out of things like `<http://example.com/>`
618 | # Must come after doAnchors, because you can use < and >
619 | # delimiters in inline links like [this](<url>).
620 | "doAutoLinks" => 30,
621 | "encodeAmpsAndAngles" => 40,
622 |
623 | "doItalicsAndBold" => 50,
624 | "doHardBreaks" => 60,
625 | );
626 |
627 | function runSpanGamut($text) {
628 | #
629 | # Run span gamut transformations.
630 | #
631 | foreach ($this->span_gamut as $method => $priority) {
632 | $text = $this->$method($text);
633 | }
634 |
635 | return $text;
636 | }
637 |
638 |
639 | function doHardBreaks($text) {
640 | # Do hard breaks:
641 | return preg_replace_callback('/ {2,}\n/',
642 | array(&$this, '_doHardBreaks_callback'), $text);
643 | }
644 | function _doHardBreaks_callback($matches) {
645 | return $this->hashPart("<br$this->empty_element_suffix\n");
646 | }
647 |
648 |
649 | function doAnchors($text) {
650 | #
651 | # Turn Markdown link shortcuts into XHTML <a> tags.
652 | #
653 | if ($this->in_anchor) return $text;
654 | $this->in_anchor = true;
655 |
656 | #
657 | # First, handle reference-style links: [link text] [id]
658 | #
659 | $text = preg_replace_callback('{
660 | ( # wrap whole match in $1
661 | \[
662 | ('.$this->nested_brackets_re.') # link text = $2
663 | \]
664 |
665 | [ ]? # one optional space
666 | (?:\n[ ]*)? # one optional newline followed by spaces
667 |
668 | \[
669 | (.*?) # id = $3
670 | \]
671 | )
672 | }xs',
673 | array(&$this, '_doAnchors_reference_callback'), $text);
674 |
675 | #
676 | # Next, inline-style links: [link text](url "optional title")
677 | #
678 | $text = preg_replace_callback('{
679 | ( # wrap whole match in $1
680 | \[
681 | ('.$this->nested_brackets_re.') # link text = $2
682 | \]
683 | \( # literal paren
684 | [ \n]*
685 | (?:
686 | <(.+?)> # href = $3
687 | |
688 | ('.$this->nested_url_parenthesis_re.') # href = $4
689 | )
690 | [ \n]*
691 | ( # $5
692 | ([\'"]) # quote char = $6
693 | (.*?) # Title = $7
694 | \6 # matching quote
695 | [ \n]* # ignore any spaces/tabs between closing quote and )
696 | )? # title is optional
697 | \)
698 | )
699 | }xs',
700 | array(&$this, '_doAnchors_inline_callback'), $text);
701 |
702 | #
703 | # Last, handle reference-style shortcuts: [link text]
704 | # These must come last in case you've also got [link text][1]
705 | # or [link text](/foo)
706 | #
707 | $text = preg_replace_callback('{
708 | ( # wrap whole match in $1
709 | \[
710 | ([^\[\]]+) # link text = $2; can\'t contain [ or ]
711 | \]
712 | )
713 | }xs',
714 | array(&$this, '_doAnchors_reference_callback'), $text);
715 |
716 | $this->in_anchor = false;
717 | return $text;
718 | }
719 | function _doAnchors_reference_callback($matches) {
720 | $whole_match = $matches[1];
721 | $link_text = $matches[2];
722 | $link_id =& $matches[3];
723 |
724 | if ($link_id == "") {
725 | # for shortcut links like [this][] or [this].
726 | $link_id = $link_text;
727 | }
728 |
729 | # lower-case and turn embedded newlines into spaces
730 | $link_id = strtolower($link_id);
731 | $link_id = preg_replace('{[ ]?\n}', ' ', $link_id);
732 |
733 | if (isset($this->urls[$link_id])) {
734 | $url = $this->urls[$link_id];
735 | $url = $this->encodeAttribute($url);
736 |
737 | $result = "<a href=\"$url\"";
738 | if ( isset( $this->titles[$link_id] ) ) {
739 | $title = $this->titles[$link_id];
740 | $title = $this->encodeAttribute($title);
741 | $result .= " title=\"$title\"";
742 | }
743 |
744 | $link_text = $this->runSpanGamut($link_text);
745 | $result .= ">$link_text</a>";
746 | $result = $this->hashPart($result);
747 | }
748 | else {
749 | $result = $whole_match;
750 | }
751 | return $result;
752 | }
753 | function _doAnchors_inline_callback($matches) {
754 | $whole_match = $matches[1];
755 | $link_text = $this->runSpanGamut($matches[2]);
756 | $url = $matches[3] == '' ? $matches[4] : $matches[3];
757 | $title =& $matches[7];
758 |
759 | $url = $this->encodeAttribute($url);
760 |
761 | $result = "<a href=\"$url\"";
762 | if (isset($title)) {
763 | $title = $this->encodeAttribute($title);
764 | $result .= " title=\"$title\"";
765 | }
766 |
767 | $link_text = $this->runSpanGamut($link_text);
768 | $result .= ">$link_text</a>";
769 |
770 | return $this->hashPart($result);
771 | }
772 |
773 |
774 | function doImages($text) {
775 | #
776 | # Turn Markdown image shortcuts into <img> tags.
777 | #
778 | #
779 | # First, handle reference-style labeled images: ![alt text][id]
780 | #
781 | $text = preg_replace_callback('{
782 | ( # wrap whole match in $1
783 | !\[
784 | ('.$this->nested_brackets_re.') # alt text = $2
785 | \]
786 |
787 | [ ]? # one optional space
788 | (?:\n[ ]*)? # one optional newline followed by spaces
789 |
790 | \[
791 | (.*?) # id = $3
792 | \]
793 |
794 | )
795 | }xs',
796 | array(&$this, '_doImages_reference_callback'), $text);
797 |
798 | #
799 | # Next, handle inline images: ![alt text](url "optional title")
800 | # Don't forget: encode * and _
801 | #
802 | $text = preg_replace_callback('{
803 | ( # wrap whole match in $1
804 | !\[
805 | ('.$this->nested_brackets_re.') # alt text = $2
806 | \]
807 | \s? # One optional whitespace character
808 | \( # literal paren
809 | [ \n]*
810 | (?:
811 | <(\S*)> # src url = $3
812 | |
813 | ('.$this->nested_url_parenthesis_re.') # src url = $4
814 | )
815 | [ \n]*
816 | ( # $5
817 | ([\'"]) # quote char = $6
818 | (.*?) # title = $7
819 | \6 # matching quote
820 | [ \n]*
821 | )? # title is optional
822 | \)
823 | )
824 | }xs',
825 | array(&$this, '_doImages_inline_callback'), $text);
826 |
827 | return $text;
828 | }
829 | function _doImages_reference_callback($matches) {
830 | $whole_match = $matches[1];
831 | $alt_text = $matches[2];
832 | $link_id = strtolower($matches[3]);
833 |
834 | if ($link_id == "") {
835 | $link_id = strtolower($alt_text); # for shortcut links like ![this][].
836 | }
837 |
838 | $alt_text = $this->encodeAttribute($alt_text);
839 | if (isset($this->urls[$link_id])) {
840 | $url = $this->encodeAttribute($this->urls[$link_id]);
841 | $result = "<img src=\"$url\" alt=\"$alt_text\"";
842 | if (isset($this->titles[$link_id])) {
843 | $title = $this->titles[$link_id];
844 | $title = $this->encodeAttribute($title);
845 | $result .= " title=\"$title\"";
846 | }
847 | $result .= $this->empty_element_suffix;
848 | $result = $this->hashPart($result);
849 | }
850 | else {
851 | # If there's no such link ID, leave intact:
852 | $result = $whole_match;
853 | }
854 |
855 | return $result;
856 | }
857 | function _doImages_inline_callback($matches) {
858 | $whole_match = $matches[1];
859 | $alt_text = $matches[2];
860 | $url = $matches[3] == '' ? $matches[4] : $matches[3];
861 | $title =& $matches[7];
862 |
863 | $alt_text = $this->encodeAttribute($alt_text);
864 | $url = $this->encodeAttribute($url);
865 | $result = "<img src=\"$url\" alt=\"$alt_text\"";
866 | if (isset($title)) {
867 | $title = $this->encodeAttribute($title);
868 | $result .= " title=\"$title\""; # $title already quoted
869 | }
870 | $result .= $this->empty_element_suffix;
871 |
872 | return $this->hashPart($result);
873 | }
874 |
875 |
876 | function doHeaders($text) {
877 | # Setext-style headers:
878 | # Header 1
879 | # ========
880 | #
881 | # Header 2
882 | # --------
883 | #
884 | $text = preg_replace_callback('{ ^(.+?)[ ]*\n(=+|-+)[ ]*\n+ }mx',
885 | array(&$this, '_doHeaders_callback_setext'), $text);
886 |
887 | # atx-style headers:
888 | # # Header 1
889 | # ## Header 2
890 | # ## Header 2 with closing hashes ##
891 | # ...
892 | # ###### Header 6
893 | #
894 | $text = preg_replace_callback('{
895 | ^(\#{1,6}) # $1 = string of #\'s
896 | [ ]*
897 | (.+?) # $2 = Header text
898 | [ ]*
899 | \#* # optional closing #\'s (not counted)
900 | \n+
901 | }xm',
902 | array(&$this, '_doHeaders_callback_atx'), $text);
903 |
904 | return $text;
905 | }
906 | function _doHeaders_callback_setext($matches) {
907 | # Terrible hack to check we haven't found an empty list item.
908 | if ($matches[2] == '-' && preg_match('{^-(?: |$)}', $matches[1]))
909 | return $matches[0];
910 |
911 | $level = $matches[2][0] == '=' ? 1 : 2;
912 | $block = "<h$level>".$this->runSpanGamut($matches[1])."</h$level>";
913 | return "\n" . $this->hashBlock($block) . "\n\n";
914 | }
915 | function _doHeaders_callback_atx($matches) {
916 | $level = strlen($matches[1]);
917 | $block = "<h$level>".$this->runSpanGamut($matches[2])."</h$level>";
918 | return "\n" . $this->hashBlock($block) . "\n\n";
919 | }
920 |
921 |
922 | function doLists($text) {
923 | #
924 | # Form HTML ordered (numbered) and unordered (bulleted) lists.
925 | #
926 | $less_than_tab = $this->tab_width - 1;
927 |
928 | # Re-usable patterns to match list item bullets and number markers:
929 | $marker_ul_re = '[*+-]';
930 | $marker_ol_re = '\d+[\.]';
931 | $marker_any_re = "(?:$marker_ul_re|$marker_ol_re)";
932 |
933 | $markers_relist = array(
934 | $marker_ul_re => $marker_ol_re,
935 | $marker_ol_re => $marker_ul_re,
936 | );
937 |
938 | foreach ($markers_relist as $marker_re => $other_marker_re) {
939 | # Re-usable pattern to match any entire ul or ol list:
940 | $whole_list_re = '
941 | ( # $1 = whole list
942 | ( # $2
943 | ([ ]{0,'.$less_than_tab.'}) # $3 = number of spaces
944 | ('.$marker_re.') # $4 = first list item marker
945 | [ ]+
946 | )
947 | (?s:.+?)
948 | ( # $5
949 | \z
950 | |
951 | \n{2,}
952 | (?=\S)
953 | (?! # Negative lookahead for another list item marker
954 | [ ]*
955 | '.$marker_re.'[ ]+
956 | )
957 | |
958 | (?= # Lookahead for another kind of list
959 | \n
960 | \3 # Must have the same indentation
961 | '.$other_marker_re.'[ ]+
962 | )
963 | )
964 | )
965 | '; // mx
966 |
967 | # We use a different prefix before nested lists than top-level lists.
968 | # See extended comment in _ProcessListItems().
969 |
970 | if ($this->list_level) {
971 | $text = preg_replace_callback('{
972 | ^
973 | '.$whole_list_re.'
974 | }mx',
975 | array(&$this, '_doLists_callback'), $text);
976 | }
977 | else {
978 | $text = preg_replace_callback('{
979 | (?:(?<=\n)\n|\A\n?) # Must eat the newline
980 | '.$whole_list_re.'
981 | }mx',
982 | array(&$this, '_doLists_callback'), $text);
983 | }
984 | }
985 |
986 | return $text;
987 | }
988 | function _doLists_callback($matches) {
989 | # Re-usable patterns to match list item bullets and number markers:
990 | $marker_ul_re = '[*+-]';
991 | $marker_ol_re = '\d+[\.]';
992 | $marker_any_re = "(?:$marker_ul_re|$marker_ol_re)";
993 |
994 | $list = $matches[1];
995 | $list_type = preg_match("/$marker_ul_re/", $matches[4]) ? "ul" : "ol";
996 |
997 | $marker_any_re = ( $list_type == "ul" ? $marker_ul_re : $marker_ol_re );
998 |
999 | $list .= "\n";
1000 | $result = $this->processListItems($list, $marker_any_re);
1001 |
1002 | $result = $this->hashBlock("<$list_type>\n" . $result . "</$list_type>");
1003 | return "\n". $result ."\n\n";
1004 | }
1005 |
1006 | var $list_level = 0;
1007 |
1008 | function processListItems($list_str, $marker_any_re) {
1009 | #
1010 | # Process the contents of a single ordered or unordered list, splitting it
1011 | # into individual list items.
1012 | #
1013 | # The $this->list_level global keeps track of when we're inside a list.
1014 | # Each time we enter a list, we increment it; when we leave a list,
1015 | # we decrement. If it's zero, we're not in a list anymore.
1016 | #
1017 | # We do this because when we're not inside a list, we want to treat
1018 | # something like this:
1019 | #
1020 | # I recommend upgrading to version
1021 | # 8. Oops, now this line is treated
1022 | # as a sub-list.
1023 | #
1024 | # As a single paragraph, despite the fact that the second line starts
1025 | # with a digit-period-space sequence.
1026 | #
1027 | # Whereas when we're inside a list (or sub-list), that line will be
1028 | # treated as the start of a sub-list. What a kludge, huh? This is
1029 | # an aspect of Markdown's syntax that's hard to parse perfectly
1030 | # without resorting to mind-reading. Perhaps the solution is to
1031 | # change the syntax rules such that sub-lists must start with a
1032 | # starting cardinal number; e.g. "1." or "a.".
1033 |
1034 | $this->list_level++;
1035 |
1036 | # trim trailing blank lines:
1037 | $list_str = preg_replace("/\n{2,}\\z/", "\n", $list_str);
1038 |
1039 | $list_str = preg_replace_callback('{
1040 | (\n)? # leading line = $1
1041 | (^[ ]*) # leading whitespace = $2
1042 | ('.$marker_any_re.' # list marker and space = $3
1043 | (?:[ ]+|(?=\n)) # space only required if item is not empty
1044 | )
1045 | ((?s:.*?)) # list item text = $4
1046 | (?:(\n+(?=\n))|\n) # tailing blank line = $5
1047 | (?= \n* (\z | \2 ('.$marker_any_re.') (?:[ ]+|(?=\n))))
1048 | }xm',
1049 | array(&$this, '_processListItems_callback'), $list_str);
1050 |
1051 | $this->list_level--;
1052 | return $list_str;
1053 | }
1054 | function _processListItems_callback($matches) {
1055 | $item = $matches[4];
1056 | $leading_line =& $matches[1];
1057 | $leading_space =& $matches[2];
1058 | $marker_space = $matches[3];
1059 | $tailing_blank_line =& $matches[5];
1060 |
1061 | if ($leading_line || $tailing_blank_line ||
1062 | preg_match('/\n{2,}/', $item))
1063 | {
1064 | # Replace marker with the appropriate whitespace indentation
1065 | $item = $leading_space . str_repeat(' ', strlen($marker_space)) . $item;
1066 | $item = $this->runBlockGamut($this->outdent($item)."\n");
1067 | }
1068 | else {
1069 | # Recursion for sub-lists:
1070 | $item = $this->doLists($this->outdent($item));
1071 | $item = preg_replace('/\n+$/', '', $item);
1072 | $item = $this->runSpanGamut($item);
1073 | }
1074 |
1075 | return "<li>" . $item . "</li>\n";
1076 | }
1077 |
1078 |
1079 | function doCodeBlocks($text) {
1080 | #
1081 | # Process Markdown `<pre><code>` blocks.
1082 | #
1083 | $text = preg_replace_callback('{
1084 | (?:\n\n|\A\n?)
1085 | ( # $1 = the code block -- one or more lines, starting with a space/tab
1086 | (?>
1087 | [ ]{'.$this->tab_width.'} # Lines must start with a tab or a tab-width of spaces
1088 | .*\n+
1089 | )+
1090 | )
1091 | ((?=^[ ]{0,'.$this->tab_width.'}\S)|\Z) # Lookahead for non-space at line-start, or end of doc
1092 | }xm',
1093 | array(&$this, '_doCodeBlocks_callback'), $text);
1094 |
1095 | return $text;
1096 | }
1097 | function _doCodeBlocks_callback($matches) {
1098 | $codeblock = $matches[1];
1099 |
1100 | $codeblock = $this->outdent($codeblock);
1101 | $codeblock = htmlspecialchars($codeblock, ENT_NOQUOTES);
1102 |
1103 | # trim leading newlines and trailing newlines
1104 | $codeblock = preg_replace('/\A\n+|\n+\z/', '', $codeblock);
1105 |
1106 | $codeblock = "<pre><code>$codeblock\n</code></pre>";
1107 | return "\n\n".$this->hashBlock($codeblock)."\n\n";
1108 | }
1109 |
1110 |
1111 | function makeCodeSpan($code) {
1112 | #
1113 | # Create a code span markup for $code. Called from handleSpanToken.
1114 | #
1115 | $code = htmlspecialchars(trim($code), ENT_NOQUOTES);
1116 | return $this->hashPart("<code>$code</code>");
1117 | }
1118 |
1119 |
1120 | var $em_relist = array(
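1121 | '' => '(?:(?<!\*)\*(?!\*)|(?<!_)_(?!_))(?=\S|$)(?![\.,:;]\s)',
1122 | '*' => '(?<=\S|^)(?<!\*)\*(?!\*)',
1123 | '_' => '(?<=\S|^)(?<!_)_(?!_)',
1124 | );
1125 | var $strong_relist = array(
1126 | '' => '(?:(?<!\*)\*\*(?!\*)|(?<!_)__(?!_))(?=\S|$)(?![\.,:;]\s)',
1127 | '**' => '(?<=\S|^)(?<!\*)\*\*(?!\*)',
1128 | '__' => '(?<=\S|^)(?<!_)__(?!_)',
1129 | );
1130 | var $em_strong_relist = array(
1131 | '' => '(?:(?<!\*)\*\*\*(?!\*)|(?<!_)___(?!_))(?=\S|$)(?![\.,:;]\s)',
1132 | '***' => '(?<=\S|^)(?<!\*)\*\*\*(?!\*)',
1133 | '___' => '(?<=\S|^)(?<!_)___(?!_)',
1134 | );
1135 | var $em_strong_prepared_relist;
1136 |
1137 | function prepareItalicsAndBold() {
1138 | #
1139 | # Prepare regular expressions for searching emphasis tokens in any
1140 | # context.
1141 | #
1142 | foreach ($this->em_relist as $em => $em_re) {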
1143 | foreach ($this->strong_relist as $strong => $strong_re) {
1144 | # Construct list of allowed token expressions.
1145 | $token_relist = array();
1146 | if (isset($this->em_strong_relist["$em$strong"])) {
1147 | $token_relist[] = $this->em_strong_relist["$em$strong"];
1148 | }
1149 | $token_relist[] = $em_re;
1150 | $token_relist[] = $strong_re;
1151 |
1152 | # Construct master expression from list.
1153 | $token_re = '{('. implode('|', $token_relist) .')}';
1154 | $this->em_strong_prepared_relist["$em$strong"] = $token_re;
1155 | }
1156 | }
1157 | }
1158 |
1159 | function doItalicsAndBold($text) {
1160 | $token_stack = array('');
1161 | $text_stack = array('');
1162 | $em = '';
1163 | $strong = '';
1164 | $tree_char_em = false;
1165 |
1166 | while (1) {
1167 | #
1168 | # Get prepared regular expression for searching emphasis tokens
1169 | # in current context.
1170 | #
1171 | $token_re = $this->em_strong_prepared_relist["$em$strong"];
1172 |
1173 | #
1174 | # Each loop iteration searches for the next emphasis token.
1175 | # Each token is then handled below according to its length.
1176 | #
1177 | $parts = preg_split($token_re, $text, 2, PREG_SPLIT_DELIM_CAPTURE);
1178 | $text_stack[0] .= $parts[0];
1179 | $token =& $parts[1];
1180 | $text =& $parts[2];
1181 |
1182 | if (empty($token)) {
1183 | # Reached end of text span: empty stack without emitting
1184 | # any more emphasis.
1185 | while ($token_stack[0]) {
1186 | $text_stack[1] .= array_shift($token_stack);
1187 | $text_stack[0] .= array_shift($text_stack);
1188 | }
1189 | break;
1190 | }
1191 |
1192 | $token_len = strlen($token);
1193 | if ($tree_char_em) {
1194 | # Reached closing marker while inside a three-char emphasis.
1195 | if ($token_len == 3) {
1196 | # Three-char closing marker, close em and strong.
1197 | array_shift($token_stack);
1198 | $span = array_shift($text_stack);
1199 | $span = $this->runSpanGamut($span);
1200 | $span = "<strong><em>$span</em></strong>";
1201 | $text_stack[0] .= $this->hashPart($span);
1202 | $em = '';
1203 | $strong = '';
1204 | } else {
1205 | # Other closing marker: close one em or strong and
1206 | # change current token state to match the other
1207 | $token_stack[0] = str_repeat($token[0], 3-$token_len);
1208 | $tag = $token_len == 2 ? "strong" : "em";
1209 | $span = $text_stack[0];
1210 | $span = $this->runSpanGamut($span);
1211 | $span = "<$tag>$span</$tag>";
1212 | $text_stack[0] = $this->hashPart($span);
1213 | $$tag = ''; # $$tag stands for $em or $strong
1214 | }
1215 | $tree_char_em = false;
1216 | } else if ($token_len == 3) {
1217 | if ($em) {
1218 | # Reached closing marker for both em and strong.
1219 | # Closing strong marker:
1220 | for ($i = 0; $i < 2; ++$i) {
1221 | $shifted_token = array_shift($token_stack);
1222 | $tag = strlen($shifted_token) == 2 ? "strong" : "em";
1223 | $span = array_shift($text_stack);
1224 | $span = $this->runSpanGamut($span);
1225 | $span = "<$tag>$span</$tag>";
1226 | $text_stack[0] .= $this->hashPart($span);
1227 | $$tag = ''; # $$tag stands for $em or $strong
1228 | }
1229 | } else {
1230 | # Reached opening three-char emphasis marker. Push on token
1231 | # stack; will be handled by the special condition above.
1232 | $em = $token[0];
1233 | $strong = "$em$em";
1234 | array_unshift($token_stack, $token);
1235 | array_unshift($text_stack, '');
1236 | $tree_char_em = true;
1237 | }
1238 | } else if ($token_len == 2) {
1239 | if ($strong) {
1240 | # Unwind any dangling emphasis marker:
1241 | if (strlen($token_stack[0]) == 1) {
1242 | $text_stack[1] .= array_shift($token_stack);
1243 | $text_stack[0] .= array_shift($text_stack);
1244 | }
1245 | # Closing strong marker:
1246 | array_shift($token_stack);
1247 | $span = array_shift($text_stack);
1248 | $span = $this->runSpanGamut($span);
1249 | $span = "<strong>$span</strong>";
1250 | $text_stack[0] .= $this->hashPart($span);
1251 | $strong = '';
1252 | } else {
1253 | array_unshift($token_stack, $token);
1254 | array_unshift($text_stack, '');
1255 | $strong = $token;
1256 | }
1257 | } else {
1258 | # Here $token_len == 1
1259 | if ($em) {
1260 | if (strlen($token_stack[0]) == 1) {
1261 | # Closing emphasis marker:
1262 | array_shift($token_stack);
1263 | $span = array_shift($text_stack);
1264 | $span = $this->runSpanGamut($span);
1265 | $span = "<em>$span</em>";
1266 | $text_stack[0] .= $this->hashPart($span);
1267 | $em = '';
1268 | } else {
1269 | $text_stack[0] .= $token;
1270 | }
1271 | } else {
1272 | array_unshift($token_stack, $token);
1273 | array_unshift($text_stack, '');
1274 | $em = $token;
1275 | }
1276 | }
1277 | }
1278 | return $text_stack[0];
1279 | }
1280 |
1281 |
1282 | function doBlockQuotes($text) {
1283 | $text = preg_replace_callback('/
1284 | ( # Wrap whole match in $1
1285 | (?>
1286 | ^[ ]*>[ ]? # ">" at the start of a line
1287 | .+\n # rest of the first line
1288 | (.+\n)* # subsequent consecutive lines
1289 | \n* # blanks
1290 | )+
1291 | )
1292 | /xm',
1293 | array(&$this, '_doBlockQuotes_callback'), $text);
1294 |
1295 | return $text;
1296 | }
1297 | function _doBlockQuotes_callback($matches) {
1298 | $bq = $matches[1];
1299 | # trim one level of quoting - trim whitespace-only lines
1300 | $bq = preg_replace('/^[ ]*>[ ]?|^[ ]+$/m', '', $bq);
1301 | $bq = $this->runBlockGamut($bq); # recurse
1302 |
1303 | $bq = preg_replace('/^/m', " ", $bq);
1304 | # These leading spaces cause problems with <pre> content,
1305 | # so we need to fix that:
1306 | $bq = preg_replace_callback('{(\s*<pre>.+?</pre>)}sx',
1307 | array(&$this, '_doBlockQuotes_callback2'), $bq);
1308 |
1309 | return "\n". $this->hashBlock("<blockquote>\n$bq\n</blockquote>")."\n\n";
1310 | }
1311 | function _doBlockQuotes_callback2($matches) {
1312 | $pre = $matches[1];
1313 | $pre = preg_replace('/^ /m', '', $pre);
1314 | return $pre;
1315 | }
1316 |
1317 |
1318 | function formParagraphs($text) {
1319 | #
1320 | # Params:
1321 | # $text - string to process with html <p> tags
1322 | #
1323 | # Strip leading and trailing lines:
1324 | $text = preg_replace('/\A\n+|\n+\z/', '', $text);
1325 |
1326 | $grafs = preg_split('/\n{2,}/', $text, -1, PREG_SPLIT_NO_EMPTY);
1327 |
1328 | #
1329 | # Wrap <p> tags and unhashify HTML blocks
1330 | #
1331 | foreach ($grafs as $key => $value) {
1332 | if (!preg_match('/^B\x1A[0-9]+B$/', $value)) {
1333 | # Is a paragraph.
1334 | $value = $this->runSpanGamut($value);
1335 | $value = preg_replace('/^([ ]*)/', "<p>", $value);
1336 | $value .= "</p>";
1337 | $grafs[$key] = $this->unhash($value);
1338 | }
1339 | else {
1340 | # Is a block.
1341 | # Modify elements of @grafs in-place...
1342 | $graf = $value;
1343 | $block = $this->html_hashes[$graf];
1344 | $graf = $block;
1345 | // if (preg_match('{
1346 | // \A
1347 | // ( # $1 = <div> tag
1348 | // <div \s+
1349 | // [^>]*
1350 | // \b
1351 | // markdown\s*=\s* ([\'"]) # $2 = attr quote char
1352 | // 1
1353 | // \2
1354 | // [^>]*
1355 | // >
1356 | // )
1357 | // ( # $3 = contents
1358 | // .*
1359 | // )
1360 | // (</div>) # $4 = closing tag
1361 | // \z
1362 | // }xs', $block, $matches))
1363 | // {
1364 | // list(, $div_open, , $div_content, $div_close) = $matches;
1365 | //
1366 | // # We can't call Markdown(), because that resets the hash;
1367 | // # that initialization code should be pulled into its own sub, though.
1368 | // $div_content = $this->hashHTMLBlocks($div_content);
1369 | //
1370 | // # Run document gamut methods on the content.
1371 | // foreach ($this->document_gamut as $method => $priority) {
1372 | // $div_content = $this->$method($div_content);
1373 | // }
1374 | //
1375 | // $div_open = preg_replace(
1376 | // '{\smarkdown\s*=\s*([\'"]).+?\1}', '', $div_open);
1377 | //
1378 | // $graf = $div_open . "\n" . $div_content . "\n" . $div_close;
1379 | // }
1380 | $grafs[$key] = $graf;
1381 | }
1382 | }
1383 |
1384 | return implode("\n\n", $grafs);
1385 | }
1386 |
1387 |
1388 | function encodeAttribute($text) {
1389 | #
1390 | # Encode text for a double-quoted HTML attribute. This function
1391 | # is *not* suitable for attributes enclosed in single quotes.
1392 | #
1393 | $text = $this->encodeAmpsAndAngles($text);
1394 | $text = str_replace('"', '&quot;', $text);
1395 | return $text;
1396 | }
1397 |
1398 |
1399 | function encodeAmpsAndAngles($text) {
1400 | #
1401 | # Smart processing for ampersands and angle brackets that need to
1402 | # be encoded. Valid character entities are left alone unless the
1403 | # no-entities mode is set.
1404 | #
1405 | if ($this->no_entities) {
1406 | $text = str_replace('&', '&amp;', $text);
1407 | } else {
1408 | # Ampersand-encoding based entirely on Nat Irons's Amputator
1409 | # MT plugin:
1410 | $text = preg_replace('/&(?!#?[xX]?(?:[0-9a-fA-F]+|\w+);)/',
1411 | '&amp;', $text);
1412 | }
1413 | # Encode remaining <'s
1414 | $text = str_replace('<', '&lt;', $text);
1415 |
1416 | return $text;
1417 | }
1418 |
1419 |
1420 | function doAutoLinks($text) {
1421 | $text = preg_replace_callback('{<((https?|ftp|dict):[^\'">\s]+)>}i',
1422 | array(&$this, '_doAutoLinks_url_callback'), $text);
1423 |
1424 | # Email addresses:
1425 | $text = preg_replace_callback('{
1426 | <
1427 | (?:mailto:)?
1428 | (
1429 | (?:
1430 | [-!#$%&\'*+/=?^_`.{|}~\w\x80-\xFF]+
1431 | |
1432 | ".*?"
1433 | )
1434 | \@
1435 | (?:
1436 | [-a-z0-9\x80-\xFF]+(\.[-a-z0-9\x80-\xFF]+)*\.[a-z]+
1437 | |
1438 | \[[\d.a-fA-F:]+\] # IPv4 & IPv6
1439 | )
1440 | )
1441 | >
1442 | }xi',
1443 | array(&$this, '_doAutoLinks_email_callback'), $text);
1444 |
1445 | return $text;
1446 | }
1447 | function _doAutoLinks_url_callback($matches) {
1448 | $url = $this->encodeAttribute($matches[1]);
1449 | $link = "<a href=\"$url\">$url</a>";
1450 | return $this->hashPart($link);
1451 | }
1452 | function _doAutoLinks_email_callback($matches) {
1453 | $address = $matches[1];
1454 | $link = $this->encodeEmailAddress($address);
1455 | return $this->hashPart($link);
1456 | }
1457 |
1458 |
1459 | function encodeEmailAddress($addr) {
1460 | #
1461 | # Input: an email address, e.g. "foo@example.com"
1462 | #
1463 | # Output: the email address as a mailto link, with each character
1464 | # of the address encoded as either a decimal or hex entity, in
1465 | # the hopes of foiling most address harvesting spam bots. E.g.:
1466 | #
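1467 | # <p><a href="&#109;&#x61;&#105;&#x6c;&#116;&#x6f;&#58;&#x66;o&#111;
1468 | # &#x40;&#101;&#x78;&#97;&#x6d;&#112;&#x6c;&#101;&#46;&#x63;&#111;
1469 | # &#x6d;">&#x66;o&#111;&#x40;&#101;&#x78;&#97;&#x6d;&#112;&#x6c;
1470 | # &#101;&#46;&#x63;&#111;&#x6d;</a></p>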
1471 | #
1472 | # Based on a filter by Matthew Wickline, posted to BBEdit-Talk.
1473 | # With some optimizations by Milian Wolff.
1474 | #
1475 | $addr = "mailto:" . $addr;
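1476 | $chars = preg_split('/(?<!^)(?!$)/', $addr);
1477 | $seed = (int)abs(crc32($addr) / strlen($addr)); # Deterministic seed.
1478 |
1479 | foreach ($chars as $key => $char) {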
1480 | $ord = ord($char);
1481 | # Ignore non-ascii chars.
1482 | if ($ord < 128) {
1483 | $r = ($seed * (1 + $key)) % 100; # Pseudo-random function.
1484 | # roughly 10% raw, 45% hex, 45% dec
1485 | # '@' *must* be encoded. I insist.
1486 | if ($r > 90 && $char != '@') /* do nothing */;
1487 | else if ($r < 45) $chars[$key] = '&#x'.dechex($ord).';';
1488 | else $chars[$key] = '&#'.$ord.';';
1489 | }
1490 | }
1491 |
1492 | $addr = implode('', $chars);
1493 | $text = implode('', array_slice($chars, 7)); # text without `mailto:`
1494 | $addr = "<a href=\"$addr\">$text</a>";
1495 |
1496 | return $addr;
1497 | }
1498 |
1499 |
1500 | function parseSpan($str) {
1501 | #
1502 | # Take the string $str and parse it into tokens, hashing embedded HTML
1503 | # and escaped characters, and handling code spans.
1504 | #
1505 | $output = '';
1506 |
1507 | $span_re = '{
1508 | (
1509 | \\\\'.$this->escape_chars_re.'
1510 | |
1511 | (?<![`\\\\])
1512 | `+ # code span marker
1513 | '.( $this->no_markup ? '' : '
1514 | |
1515 | <!-- .*? --> # comment
1516 | |
1517 | <\?.*?\?> | <%.*?%> # processing instruction
1518 | |
1519 | <[/!$]?[-a-zA-Z0-9:_]+ # regular tags
1520 | (?>
1521 | \s
1522 | (?>[^"\'>]+|"[^"]*"|\'[^\']*\')*
1523 | )?
1524 | >
1525 | ').'
1526 | )
1527 | }xs';
1528 |
1529 | while (1) {
1530 | #
1531 | # Each loop iteration searches for either the next tag, the next
1532 | # opening code span marker, or the next escaped character.
1533 | # Each token is then passed to handleSpanToken.
1534 | #
1535 | $parts = preg_split($span_re, $str, 2, PREG_SPLIT_DELIM_CAPTURE);
1536 |
1537 | # Create token from text preceding tag.
1538 | if ($parts[0] != "") {
1539 | $output .= $parts[0];
1540 | }
1541 |
1542 | # Check if we reach the end.
1543 | if (isset($parts[1])) {
1544 | $output .= $this->handleSpanToken($parts[1], $parts[2]);
1545 | $str = $parts[2];
1546 | }
1547 | else {
1548 | break;
1549 | }
1550 | }
1551 |
1552 | return $output;
1553 | }
1554 |
1555 |
1556 | function handleSpanToken($token, &$str) {
1557 | #
1558 | # Handle $token provided by parseSpan by determining its nature and
1559 | # returning the corresponding value that should replace it.
1560 | #
1561 | switch ($token[0]) {
1562 | case "\\":
1563 | return $this->hashPart("&#". ord($token[1]). ";");
1564 | case "`":
1565 | # Search for end marker in remaining text.
1566 | if (preg_match('/^(.*?[^`])'.preg_quote($token).'(?!`)(.*)$/sm',
1567 | $str, $matches))
1568 | {
1569 | $str = $matches[2];
1570 | $codespan = $this->makeCodeSpan($matches[1]);
1571 | return $this->hashPart($codespan);
1572 | }
1573 | return $token; // return as text since no ending marker found.
1574 | default:
1575 | return $this->hashPart($token);
1576 | }
1577 | }
1578 |
1579 |
1580 | function outdent($text) {
1581 | #
1582 | # Remove one level of line-leading tabs or spaces
1583 | #
1584 | return preg_replace('/^(\t|[ ]{1,'.$this->tab_width.'})/m', '', $text);
1585 | }
1586 |
1587 |
1588 | # String length function for detab. `_initDetab` will create a function to
1589 | # handle UTF-8 if the default function does not exist.
1590 | var $utf8_strlen = 'mb_strlen';
1591 |
1592 | function detab($text) {
1593 | #
1594 | # Replace tabs with the appropriate amount of space.
1595 | #
1596 | # For each line we separate the line into blocks delimited by
1597 | # tab characters. Then we reconstruct every line by adding the
1598 | # appropriate number of spaces between each block.
1599 |
1600 | $text = preg_replace_callback('/^.*\t.*$/m',
1601 | array(&$this, '_detab_callback'), $text);
1602 |
1603 | return $text;
1604 | }
1605 | function _detab_callback($matches) {
1606 | $line = $matches[0];
1607 | $strlen = $this->utf8_strlen; # strlen function for UTF-8.
1608 |
1609 | # Split in blocks.
1610 | $blocks = explode("\t", $line);
1611 | # Add each block to the line.
1612 | $line = $blocks[0];
1613 | unset($blocks[0]); # Do not add first block twice.
1614 | foreach ($blocks as $block) {
1615 | # Calculate amount of space, insert spaces, insert block.
1616 | $amount = $this->tab_width -
1617 | $strlen($line, 'UTF-8') % $this->tab_width;
1618 | $line .= str_repeat(" ", $amount) . $block;
1619 | }
1620 | return $line;
1621 | }
1622 | function _initDetab() {
1623 | #
1624 | # Check for the availability of the function in the `utf8_strlen` property
1625 | # (initially `mb_strlen`). If the function is not available, create a
1626 | # function that will loosely count the number of UTF-8 characters with a
1627 | # regular expression.
1628 | #
1629 | if (function_exists($this->utf8_strlen)) return;
1630 | $this->utf8_strlen = create_function('$text', 'return preg_match_all(
1631 | "/[\\\\x00-\\\\xBF]|[\\\\xC0-\\\\xFF][\\\\x80-\\\\xBF]*/",
1632 | $text, $m);');
1633 | }
1634 |
1635 |
1636 | function unhash($text) {
1637 | #
1638 | # Swap back in all the tags hashed by _HashHTMLBlocks.
1639 | #
1640 | return preg_replace_callback('/(.)\x1A[0-9]+\1/',
1641 | array(&$this, '_unhash_callback'), $text);
1642 | }
1643 | function _unhash_callback($matches) {
1644 | return $this->html_hashes[$matches[0]];
1645 | }
1646 |
1647 | }
1648 |
1649 | /*
1650 |
1651 | PHP Markdown
1652 | ============
1653 |
1654 | Description
1655 | -----------
1656 |
1657 | This is a PHP translation of the original Markdown formatter written in
1658 | Perl by John Gruber.
1659 |
1660 | Markdown is a text-to-HTML filter; it translates an easy-to-read /
1661 | easy-to-write structured text format into HTML. Markdown's text format
1662 | is most similar to that of plain text email, and supports features such
1663 | as headers, *emphasis*, code blocks, blockquotes, and links.
1664 |
1665 | Markdown's syntax is designed not as a generic markup language, but
1666 | specifically to serve as a front-end to (X)HTML. You can use span-level
1667 | HTML tags anywhere in a Markdown document, and you can use block level
1668 | HTML tags (like <div> and <table> as well).
1669 |
1670 | For more information about Markdown's syntax, see:
1671 |
1672 | <http://daringfireball.net/projects/markdown/>
1673 |
1674 |
1675 | Bugs
1676 | ----
1677 |
1678 | To file bug reports please send email to:
1679 |
1680 |
1681 |
1682 | Please include with your report: (1) the example input; (2) the output you
1683 | expected; (3) the output Markdown actually produced.
1684 |
1685 |
1686 | Version History
1687 | ---------------
1688 |
1689 | See the readme file for detailed release notes for this version.
1690 |
1691 |
1692 | Copyright and License
1693 | ---------------------
1694 |
1695 | PHP Markdown
1696 | Copyright (c) 2004-2009 Michel Fortin
1697 |
1698 | All rights reserved.
1699 |
1700 | Based on Markdown
1701 | Copyright (c) 2003-2006 John Gruber
1702 |
1703 | All rights reserved.
1704 |
1705 | Redistribution and use in source and binary forms, with or without
1706 | modification, are permitted provided that the following conditions are
1707 | met:
1708 |
1709 | * Redistributions of source code must retain the above copyright notice,
1710 | this list of conditions and the following disclaimer.
1711 |
1712 | * Redistributions in binary form must reproduce the above copyright
1713 | notice, this list of conditions and the following disclaimer in the
1714 | documentation and/or other materials provided with the distribution.
1715 |
1716 | * Neither the name "Markdown" nor the names of its contributors may
1717 | be used to endorse or promote products derived from this software
1718 | without specific prior written permission.
1719 |
1720 | This software is provided by the copyright holders and contributors "as
1721 | is" and any express or implied warranties, including, but not limited
1722 | to, the implied warranties of merchantability and fitness for a
1723 | particular purpose are disclaimed. In no event shall the copyright owner
1724 | or contributors be liable for any direct, indirect, incidental, special,
1725 | exemplary, or consequential damages (including, but not limited to,
1726 | procurement of substitute goods or services; loss of use, data, or
1727 | profits; or business interruption) however caused and on any theory of
1728 | liability, whether in contract, strict liability, or tort (including
1729 | negligence or otherwise) arising in any way out of the use of this
1730 | software, even if advised of the possibility of such damage.
1731 |
1732 | */
1733 | ?>
--------------------------------------------------------------------------------