├── .gitignore ├── Makefile ├── README.rst ├── assorted-text.txt ├── cc-attribution-sharealike-88x31.png ├── markup-history-extended-notes.rst ├── markup-history-long-16x9.pdf ├── markup-history-long-4x3.pdf ├── markup-history-long-wide.rst ├── markup-history-long.rst ├── markup-history-short-4x3.pdf ├── markup-history-short.rst ├── notes-per-slide-long.pdf ├── notes-per-slide-long.rst ├── notes-per-slide-short.pdf ├── notes-per-slide-short.rst └── unused-slides.txt /.gitignore: -------------------------------------------------------------------------------- 1 | # Editor artefacts 2 | *.swo 3 | *.swp 4 | *~ 5 | 6 | # Generated files 7 | *.html 8 | 9 | # Hovercraft artefacts 10 | slides/ 11 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | # This version of the Makefile assumes that pandoc and (enough of) TeX are 2 | # available. 3 | 4 | .PHONY: default 5 | default: html pdf 6 | 7 | # We don't try to provide an HTML version of the slides in this version 8 | # - use the PDF produced by 'slides' instead. 9 | # For various reasons, pandoc won't render markup-history-extended-notes.rst 10 | # as PDF, so we don't bother. 11 | .PHONY: html 12 | html: 13 | rst2html.py README.rst README.html 14 | # github's rendering of reStructuredText uses Linguist to syntax 15 | # highlight code. Unfortunately, it knows the name "roff" for roff 16 | # style code, whilst pygments, used by rst2html.py, does not recognise 17 | # that name. So we shall make do by converting on the fly... 18 | sed -e 's/:: roff/:: groff/' markup-history-extended-notes.rst | rst2html.py > markup-history-extended-notes.html 19 | 20 | # The available aspect ratios of slides (for beamer only) are 1610 for 16:10, 21 | # 169 for 16:9, 149 for 14:9, 141 for 1.41:1, 54 for 5:4, 43 for 4:3 which is 22 | # the default, and 32 for 3:2. 
It's probably enough to go for the following 23 | # pair of resolutions. 24 | # We also make the notes-per-slide as PDF, because we can and it might be useful. 25 | .PHONY: pdf 26 | pdf: slides notes 27 | 28 | .PHONY: slides 29 | slides: markup-history-long-4x3.pdf markup-history-long-16x9.pdf markup-history-short-4x3.pdf 30 | 31 | .PHONY: notes 32 | notes: notes-per-slide-long.pdf notes-per-slide-short.pdf 33 | 34 | markup-history-long-4x3.pdf: markup-history-long.rst 35 | pandoc $< -t beamer -o $@ -V aspectratio:43 36 | 37 | markup-history-long-16x9.pdf: markup-history-long-wide.rst 38 | pandoc $< -t beamer -o $@ -V aspectratio:169 39 | 40 | markup-history-short-4x3.pdf: markup-history-short.rst 41 | pandoc $< -t beamer -o $@ -V aspectratio:43 42 | 43 | notes-per-slide-long.pdf: notes-per-slide-long.rst 44 | pandoc $< -o $@ -V papersize:a4 45 | 46 | notes-per-slide-short.pdf: notes-per-slide-short.rst 47 | pandoc $< -o $@ -V papersize:a4 48 | 49 | .PHONY: clean 50 | clean: 51 | rm -f *.html 52 | 53 | .PHONY: help 54 | help: 55 | @echo 'make same as: make html pdf' 56 | @echo 'make pdf create markup-history-*-[4x3|16x9].pdf and notes-per-slide-*.pdf' 57 | @echo 'make html create HTML files using rst2html' 58 | @echo 'make clean delete HTML files' 59 | -------------------------------------------------------------------------------- /README.rst: -------------------------------------------------------------------------------- 1 | A history of markup languages 2 | ============================= 3 | 4 | This is a slideshow originally produced for `PyCon UK 2017`_. 5 | 6 | History 7 | ~~~~~~~ 8 | * The version presented at `Write the Docs Prague 2018`_ can be found at tag 9 | WriteTheDocs_Prague_2018_ 10 | 11 | The "short" version of the slides is as actually presented at 12 | `Write the Docs Prague 2018`_. It is meant to be about 30 minutes long. 13 | You can see it `as recorded at WtD Prague 2018`_. This version *was* 30 14 | minutes long. 
15 | 16 | The "long" version of the slides here is as presented at `Write the Docs 17 | Cambridge`_ in `February 2018`_. It is meant to be about 40-45 minutes 18 | long. 19 | 20 | I also gave this longer version of the talk at work in May 2018. 21 | 22 | * The version presented at `PyCon UK 2017`_ can be found at tag pycon-uk-2017_. 23 | You can see it `as recorded at PyCon UK 2017`_. This version was 30 minutes long. 24 | 25 | * The earlier version given to CamPUG_ in `October 2017`_ can be found at tag 26 | campug-oct-2017_. It was about 45 minutes long. 27 | 28 | The files 29 | ~~~~~~~~~ 30 | All sources are in reStructuredText_, and thus intended to be readable as 31 | plain text. 32 | 33 | * The sources for the slides are in ``_ and 34 | ``_ (customised for 4x3 and 16x9 respectively, 35 | although they're actually the same bar some formatting). 36 | * Notes per slide (for the presenter) are separated out into ``_. 37 | * Extended notes (with links) are in ``_. 38 | 39 | (Note that github will present the ``.rst`` files in rendered form as HTML, 40 | albeit using their own styling (which makes notes a bit odd). If you want 41 | to see the original reStructuredText source, you have to click on the "Raw" 42 | link at the top of the file's page.) 43 | 44 | Since this version of the talk uses PDF slides, which I produce via pandoc_ 45 | and TeX_, I'm including the resultant PDF files in the repository. These 46 | will not necessarily be as up-to-date as the source files, so check their 47 | timestamps. 48 | 49 | There are two versions of the slides here - the shorter (about 25 minutes) 50 | version, and the longer (about 45 minutes) version. 51 | 52 | * Longer: 53 | 54 | * The 4x3 aspect ratio slides are ``_. 55 | * The 16x9 aspect ratio slides are ``_. 56 | * There is a PDF version of the notes per slide in ``_. 57 | 58 | * Shorter: 59 | 60 | * The 4x3 aspect ratio slides are ``_. 61 | * There is a PDF version of the notes per slide in ``_. 
62 | 63 | Making the PDF and HTML files 64 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 65 | For convenience, you can use the Makefile to create the PDF slides, create the 66 | HTML version of the extended notes, and so on. For instance:: 67 | 68 | $ make pdf 69 | 70 | will make the PDF files. 71 | 72 | For what the Makefile can do, use:: 73 | 74 | $ make help 75 | 76 | Requirements to build the documents: 77 | 78 | * pandoc_ and TeX_ (on mac, BasicTeX should be enough). Not needed if you're 79 | just going to do ``make html``. 80 | * docutils_ (for reStructuredText_). 81 | 82 | and an appropriate ``make`` program if you want to use the Makefile. 83 | 84 | .. _`Write the Docs Prague 2018`: https://www.writethedocs.org/conf/prague/2018/ 85 | .. _`PyCon UK 2017`: http://2017.pyconuk.org/ 86 | .. _CamPUG: https://www.meetup.com/CamPUG/ 87 | .. _`write the docs cambridge`: https://www.meetup.com/Write-The-Docs-Cambridge/events/246750191/ 88 | .. _`February 2018`: https://www.meetup.com/Write-The-Docs-Cambridge/events/246750191/ 89 | .. _`October 2017`: https://www.meetup.com/CamPUG/events/tpcsxlywnbfb/ 90 | .. _`as recorded at PyCon UK 2017`: https://www.youtube.com/watch?v=qQMXPXzrE_s 91 | .. _`as recorded at WtD Prague 2018`: https://www.youtube.com/watch?v=P-7hwjocEpM&list=PLZAeFn6dfHplRZcYDQjST22bAVeeWML4d&t=0s&index=22 92 | .. _campug-oct-2017: https://github.com/tibs/markup-history/tree/campug-oct-2017 93 | .. _pycon-uk-2017: https://github.com/tibs/markup-history/tree/pycon-uk-2017 94 | .. _WriteTheDocs_Prague_2018: https://github.com/tibs/markup-history/tree/WriteTheDocs_Prague_2018 95 | .. _pandoc: https://pandoc.org/ 96 | .. _docutils: http://docutils.sourceforge.net/ 97 | .. _reStructuredText: http://docutils.sourceforge.net/rst.html 98 | .. _TeX: https://www.ctan.org/starter 99 | 100 | -------- 101 | 102 | |cc-attr-sharealike| 103 | 104 | This slideshow and its related files are released under a `Creative Commons 105 | Attribution-ShareAlike 4.0 International License`_. 
106 | 107 | .. |cc-attr-sharealike| image:: cc-attribution-sharealike-88x31.png 108 | :alt: CC-Attribution-ShareAlike image 109 | 110 | .. _`Creative Commons Attribution-ShareAlike 4.0 International License`: http://creativecommons.org/licenses/by-sa/4.0/ 111 | 112 | .. vim: set filetype=rst tabstop=8 softtabstop=2 shiftwidth=2 expandtab: 113 | -------------------------------------------------------------------------------- /assorted-text.txt: -------------------------------------------------------------------------------- 1 | http://history-computer.com/Internet/Birth/Goldfarb.html 2 | How I'm going to split the world 3 | ================================ 4 | 5 | There are different sorts of markup, with different purposes, but I am going 6 | to concentrate on three main types, which I shall categorise as: 7 | 8 | 1. Presentational markup 9 | 10 | This uses markup to direct the presentation of the text, for instance as a 11 | man page, on a screen, or on a typeset page. 12 | 13 | Traditional examples are the ``*roff`` family (roff, nroff, troff, groff 14 | and the various sorts of RUNOFF) and the |TeX| family (|TeX|, |LaTeX|, etc.) 15 | 16 | 2. Semantic markup 17 | 18 | This uses markup to add meaning to the text, typically by annotating 19 | textual elements to say what they are (e.g., part number, book title, 20 | address). 21 | 22 | Traditional examples are the markups described by SGML (or, lighter weight, 23 | XML), including Docbook. 24 | 25 | 3. Readable plaintext 26 | 27 | This attempts to get some of the benefits of one of the other types 28 | (typically concentrating on presentation) whilst preserving the readability 29 | of the original text. Interestingly, these have been around about as long 30 | as the other forms, but are less talked about. 31 | 32 | The ur-language for this form is probably setext, but obvious examples 33 | include StructuredText, reStructuredText and markdown. 
34 | 35 | The art of design of these markups is judging what capabilities are wanted 36 | - the more things the markup can represent, the more complex the 37 | restrictions on what may be typed as free text, and what has accidental 38 | meaning. 39 | 40 | .. note:: The classic aim is to be "close to what one would write in an 41 | email" - a stricture that meant more when emails could only contain 42 | ASCII text. 43 | 44 | .. note:: I think this is why some people object to reStructuredText but 45 | are happy with markdown. reStructuredText aims to provide a lot of 46 | capability (such that one rarely needs to stray into something like 47 | |TeX| or Docbook), whereas markdown puts that balance a lot closer to 48 | plain text. 49 | 50 | .. note:: I think that means it is also sensible to separate out the three 51 | timelines. There will obviously have been cross-fertilisation, but it's 52 | probably much simpler not to mix them up, because that can lead to an 53 | artificial expectation of causation across the timelines, which I am 54 | suspicious of. 55 | 56 | (I'm not convinced the roffs and the SGMLs were well related in the early 57 | days, for instance. And that's irrespective of whether the different 58 | developers knew of each other's work.) 59 | 60 | Presentational markup 61 | ===================== 62 | The early days 63 | -------------- 64 | This started with output to teletype (underlining, bolding, and otherwise 65 | overstruck characters) and to line printers. Eventually, output to 66 | monotype/linotype/etc. got added in. 67 | 68 | For instance: nroff/troff, DSR (Digital Standard Runoff) 69 | 70 | The need was to type basic alphanumerics and symbols (i.e., ASCII or EBCDIC) 71 | but output to something with the ability to represent more. For teletypes, 72 | this might just mean the use of the backspace character to allow overwriting 73 | text - but doing that in the original text file would not necessarily be 74 | portable or readable. 
75 | 76 | Needs: 77 | 78 | * Use portable character sets (not necessarily only ASCII and EBCDIC!) 79 | * Don't want to type in the "magic codes" to do underlining, etc., especially 80 | as they're not necessarily going to be the same codes for different output 81 | devices. 82 | 83 | Programmable markup 84 | ------------------- 85 | 86 | .. note:: Wikipedia calls this "Procedural markup" 87 | 88 | There is an important subset of presentation markup, which is actually a 89 | programming language that provides markup. The obvious examples are |TeX| and 90 | Postscript (and to a lesser extent, PDF). 91 | 92 | |TeX| is essentially a macro expansion language, and this means that the 93 | (perhaps more familiar) |LaTeX| is written in |TeX| itself. 94 | 95 | Postscript is perhaps not normally thought of as a markup language, 96 | but is essentially a Forth derivative that works on text to produce a 97 | printable output. 98 | 99 | As such, both of these languages can be used to do non-text processing as well, 100 | although perhaps not in a manner that feels natural (to their intent). 101 | 102 | PDF incorporates a subset of Postscript, but is much more page oriented (pages are 103 | independent) and less general in its applicability, so is arguably not quite 104 | in our area of interest. 105 | 106 | http://wiki.c2.com/?ForthPostscriptRelationship discusses whether Postscript 107 | *is a* Forth, or is just similar to Forth (basically, the latter seems more 108 | sensible). 109 | 110 | Semantic markup 111 | =============== 112 | 113 | .. note:: Wikipedia calls this Descriptive markup 114 | 115 | * SGML (and DTDs) 116 | 117 | leading to: 118 | 119 | * HTML 120 | * XML 121 | * XHTML 122 | * Docbook 123 | 124 | and so on. 125 | 126 | (SGML originally derived from GML) 127 | 128 | Readable plaintext 129 | ================== 130 | 131 | .. note:: Wikipedia seems to put these together with such things as wiki 132 | markup as Lightweight markup. 
I'd argue there's a difference between 133 | lightweight markup and the subset therein which is readable, and it's that 134 | latter subset I'm most interested in. 135 | 136 | .. note:: It would be nice to get an actual timeline from setext to structured 137 | text to reStructuredText and any other intermediaries. 138 | 139 | setext -> structured text 140 | 141 | The big ideas of reStructuredText: 142 | 143 | 1. prioritise readability of the source text 144 | 2. not to specify the form of the output (i.e., don't just assume HTML) 145 | 3. be well specified 146 | 147 | Other examples: 148 | 149 | * markdown (I'd argue less readable, because it's meant to be slightly easier 150 | to write, and it originally was designed for output to HTML, and it's 151 | *definitely* not well specified) 152 | 153 | * asciidoctor (how does this differ on those three axes?) 154 | 155 | Talking points for the slideshow 156 | ================================ 157 | 158 | "Why markup languages are older than you think, and some of the major 159 | examples" 160 | 161 | All four have different reasons for existing, but clearly influence each 162 | other. 163 | 164 | *So*, for each pick a major example - perhaps: 165 | 166 | * nroff/troff (different programs, but same input format - does *it* have a 167 | name?) 
168 | * SGML/HTML/XML and maybe a brief nod to Docbook 169 | * |TeX|/|LaTeX| (more people use |LaTeX| than bare |TeX|) 170 | * setext -> structured text -> reStructuredText 171 | 172 | Want dates for each 173 | 174 | Driving forces: 175 | 176 | - I want portable documentation 177 | - I want good (but controllable) typesetting 178 | - I want to mark up the meaning of the elements of my text, for analysis 179 | - I want readable source 180 | 181 | 182 | Older fragments of a Timeline 183 | ============================= 184 | need to put in setext, markdown, nroff/troff/groff, RUNOFF 185 | 186 | * 1964 RUNOFF https://en.wikipedia.org/wiki/TYPSET_and_RUNOFF 187 | 188 | - also, the RUNOFF program (`wikipedia - Runoff`_) 189 | 190 | * 1969 roff 191 | * nroff (newer roff) https://en.wikipedia.org/wiki/Nroff 192 | * troff (typesetter roff) https://en.wikipedia.org/wiki/Troff 193 | 194 | - in various versions, and with increasing capabilities. nroff basically 195 | ignores what it doesn't understand when reading the same input. 196 | 197 | * 1990s groff (GNU roff) 198 | 199 | http://h20565.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c04623260 is 200 | the OpenVMS Digital Standard Runoff Reference Manual from May 1993. 201 | 202 | and 203 | 204 | * 1967 - GenCode, William W. Tunnicliffe ("generic coding") - publishing industry. 205 | * 1969 - GML, Charles Goldfarb 206 | * 1974 (is that date right?) - SGML 207 | * 1978 ?? - TeX 208 | * 1980 - Scribe, Brian Reid, doctoral thesis 209 | * 1984 ?? - LaTeX 210 | * 1986 - SGML ISO Standard ISO 8879 211 | * 1989 ?? - HTML 212 | * setext - introduced 1991 213 | * 1996 - XML 214 | * StructuredText - introduced through Zope 215 | * reStructuredText - 216 | * MathML - 217 | 218 | .. _`wikipedia - Runoff`: https://en.wikipedia.org/wiki/Runoff_(program) 219 | 220 | 221 | 222 | Mumblings 223 | ========= 224 | 225 | These may or may not get used. 
226 | 227 | 228 | Readability versus writeability 229 | ------------------------------- 230 | There is an obvious tradeoff to be made between how readable a format is, and 231 | how simple it is to write. For instance, delimiting headers by leading 232 | characters:: 233 | 234 | # Header 1 235 | ## Header 2 236 | 237 | is much simpler than having to type underlines:: 238 | 239 | Header 1 240 | ======== 241 | 242 | Header 2 243 | -------- 244 | 245 | Also, having a pre-defined set of underlines (e.g., ``===`` always means title, 246 | ``---`` means subtitle, etc.) is easier to learn and easier to use than 247 | allowing any underlining choice (provided it is consistent within a document), 248 | but may be considered to remove the author's choice on which form of 249 | underlining reads best *in this particular document*. 250 | 251 | As in so many things, the Zen of Python still has applicability - it is then 252 | left up to the reader how well it has been followed. 253 | 254 | The advantages of having a competent specification 255 | -------------------------------------------------- 256 | It is normally regarded as a good thing to have multiple implementations of a 257 | specification - not least because it helps to iron out misunderstandings of 258 | what that specification means. Standardisation can help to mediate that 259 | understanding (although not always as much as one might hope), and Python gets 260 | by quite well by just having people communicate a lot and having a reasonable 261 | test suite for the language. 262 | 263 | It's not always an obviously good thing, though. 264 | 265 | There are many forms of markdown, but the original implementation of markdown 266 | is essentially frozen, as is the original documentation, and that "definition" 267 | of markdown is both ambiguous, and does not address various tasks that people 268 | want to do. Nor is the original author willing to help with this situation [3]_. 
269 | This means that different markdown implementations provide their 270 | own decisions on the ambiguous parts, and provide their own extensions. 271 | Unfortunately this means that markdown text is not necessarily portable 272 | between implementations. In practice, however, this probably doesn't matter 273 | much, as the use of markdown is often for documentation that belongs to a 274 | particular site or user environment, and interoperability within that 275 | site/environment is enough. 276 | 277 | In contrast, reStructuredText is quite well specified (David Goodger having a 278 | background in SGML systems, after all). This means that the various 279 | implementations of reStructuredText can be specific about what they do or 280 | don't support, and in general interoperability should be better, or at least 281 | more predictable. 282 | 283 | Incidentally, it probably also makes it possible to produce a general linter for 284 | reStructuredText - i.e., a program to inspect the text for errors before 285 | running it through docutils to produce output - which is harder to do in a 286 | portable manner for markdown, because there are so many markdowns. 287 | 288 | .. [1] both not escaped in this text, of course 289 | .. [2] the answer is, of course, "whichever is suitable" / "whichever you 290 | choose", although I would suggest that for a large public project (gov.uk, 291 | documentation for the RaspberryPi) markdown should be adopted, as it is 292 | simpler, whilst for more challenging uses (or people who prefer more 293 | challenging markup), reStructuredText is good. And as reStructuredText 294 | suggests that "if you need to do things it doesn't support, use something 295 | else", then I think the same can apply to markdown and (perhaps) moving on 296 | to reStructuredText. 297 | .. 
[3] whilst I personally find that hard to understand, it's not as if we're 298 | *paying* anything for the privilege of using markdown, we're using 299 | something given freely as it is/was. 300 | 301 | 302 | Comparisons 303 | ----------- 304 | Comparing markdown, reStructuredText and AsciiDoc (to pick three). 305 | 306 | *Is this section worth anything? I'm not actually convinced.* 307 | 308 | NB: check whether AsciiDoctor also always goes through docbook 309 | 310 | ====================== ============ ==================== ======================== 311 | **Concept** **markdown** **reStructuredText** **AsciiDoc** 312 | ---------------------- ------------ -------------------- ------------------------ 313 | readability a main aim the main aim a main aim 314 | closely specified no yes yes 315 | output to various various docbook and then various 316 | inline HTML yes delimited [#a]_ delimited [#b]_ 317 | nested inline markup ? no yes 318 | non-trivial list items no yes yes 319 | extensible no directives macros 320 | conditional text no no no 321 | executable text no no [#c]_ yes 322 | tables not standard yes yes 323 | ====================== ============ ==================== ======================== 324 | 325 | 326 | .. [#a] reStructuredText allows the writer to add HTML via a directive, 327 | but it will only be used if the output is to HTML. 328 | .. [#b] AsciiDoc produces HTML via Docbook, and Docbook provides a way of 329 | including a file of raw HTML into the HTML output. 330 | .. [#c] this is a very conscious decision by reStructuredText 331 | 332 | On lists 333 | -------- 334 | Or, actually, treating text blocks as composable first class entities. 335 | 336 | Why do so many markup creators believe that lists are just made up 337 | of list items with no internal structure? Do they really never want to 338 | put code into list items, or have more than one paragraph per item? 
Given 339 | the experience of the lengths people will go to in those wiki formats that 340 | are similarly crippled to make it *look* as if one can do these things, 341 | this would appear always to be a mistake. Back in the original wikiwikiweb, 342 | I think that it was quite deliberate - if one looks at the types of page 343 | being written in that wiki, there was no intent to have anything beyond a 344 | sort of "notation" page - it wasn't intended for writing whole documents. 345 | For other wikis, I suspect many have copied that limitation without 346 | thinking about the implications. For actual markup formats, though 347 | (especially those targeting HTML, which is many of them), it's rather 348 | harder to understand the limitation. 349 | 350 | .. vim: set filetype=rst tabstop=8 softtabstop=2 shiftwidth=2 expandtab: 351 | -------------------------------------------------------------------------------- /cc-attribution-sharealike-88x31.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/tibs/markup-history/5da8f41ec7b331dd94c9c498e971ef958b8c14e9/cc-attribution-sharealike-88x31.png -------------------------------------------------------------------------------- /markup-history-extended-notes.rst: -------------------------------------------------------------------------------- 1 | ============================= 2 | A history of markup languages 3 | ============================= 4 | 5 | By Tibs / Tony Ibbs 6 | 7 | .. note:: These are the extended notes to go with the slideshow, together 8 | with links which I hope will be useful, and perhaps act as a start on 9 | finding out more about markup languages. 10 | 11 | .. We can represent TeX and LaTeX as simple text: 12 | 13 | .. |TeX| replace:: TeX 14 | 15 | .. |LaTeX| replace:: LaTeX 16 | 17 | .. contents:: 18 | 19 | Introduction 20 | ============ 21 | 22 | We've always added markup to text. 
23 | 24 | When writing by hand, we underline to indicate emphasis. 25 | 26 | Typewritten manuscripts would similarly use underlining (of various sorts) to 27 | indicate titles, emphasis, and so on. 28 | 29 | Play scripts use various abbreviations and conventions to distinguish dialogue, 30 | effects and stage arrangements. 31 | 32 | This "history" (or "opinionated overview") will hopefully serve as a starting 33 | point for finding out more about markup languages, why they exist and what a 34 | wide range of possibilities they represent. 35 | 36 | I'm taking "markup language" to mean a way of marking up plain text so it can 37 | be turned into something more interesting (for instance, output to a 38 | typesetter), or so it can be analysed more easily. 39 | 40 | Interestingly, almost all of the markup formats I'm going to discuss are still 41 | in use today. 42 | 43 | .. note:: Basically, these notes are a summary of some of the more obvious 44 | bits of the history of document markup. It's the sort of stuff that one can 45 | find with a little bit of memory and some googling. That means that there 46 | is doubtless interesting stuff I've missed, and there is also bound to be 47 | *exciting* work going on **right now** that is not mentioned anywhere here 48 | - for instance, I think Pollen_ looks quite interesting. 49 | 50 | .. _Pollen: http://docs.racket-lang.org/pollen/ 51 | 52 | Types of markup 53 | =============== 54 | 55 | It is common to distinguish two main ways of marking up text: 56 | 57 | * Semantic - what it means 58 | * Presentational - how it should be shown 59 | 60 | Semantic markup is intended to give more information about the meaning of the 61 | text. This may be an end in and of itself, for reasons as disparate as 62 | allowing extraction of information about aeroplane parts or determining the 63 | parts of speech in a corpus. 
64 | 65 | Presentational markup indicates how the text should be presented, for instance 66 | as a man page, or printed using a typesetter. 67 | 68 | (Even at the beginning of our timeline, people had access to typesetters, and 69 | wanted to drive them.) 70 | 71 | These two aren't necessarily entirely distinct, though: one of the important 72 | early realisations was that presentational markup benefits from some degree of 73 | semantics. So, for instance, it is more useful to say "heading", than 74 | "bold font X at size Y". 75 | 76 | We can also separate out two other types of markup: 77 | 78 | * Lightweight markup is designed to be simple to type, and hopefully easier to 79 | read. 80 | 81 | * Programmable markup (Wikipedia calls this "procedural") is actually merging 82 | a programming language with the text. The best known of these is |TeX|. 83 | 84 | Things I'm ignoring 85 | =================== 86 | 87 | I'm ignoring anything that isn't just text, so: 88 | 89 | * Music 90 | * Mathematics 91 | * Pictures/diagrams/graphs 92 | * Bibliographies and indices 93 | * All sorts of other things 94 | 95 | Further, there are many more markup formats than I discuss here (for instance, 96 | and perhaps unsurprisingly, people have been inventing "easier" ways to write 97 | HTML documents since the early days of HTML). 98 | 99 | My own experience 100 | ================= 101 | I believe I first used a markup language when writing up my final year 102 | Computer Science project, in 1981. This would have been a Cambridge written 103 | format that ran on the local mainframe. 104 | 105 | Later on, at work, I came across DEC's RUNOFF, which became Digital Standard 106 | Runoff (DSR). In the 1980s I wrote a partial bibliography of Joan Aiken using 107 | DSR, but unfortunately no longer have the original sources, as I converted it 108 | to HTML. 
109 | 110 | HTML I wrote almost from its inception - back then it was quite common to 111 | write HTML by hand (it was a much simpler thing than it is today). 112 | 113 | I first wrote |TeX| at work as well, and introduced the use of |LaTeX| for our 114 | in-house API documentation. Personally, I preferred |TeX| to |LaTeX|, but 115 | realise I was in the minority. 116 | 117 | When Python converted its documentation from |LaTeX|, I originally thought 118 | this was a bad idea, as clearly (!) anyone could learn |LaTeX| (which was 119 | originally used, before the adoption of reStructuredText). It was explained to 120 | me, though, that the problem wasn't that people couldn't learn |LaTeX|, it was 121 | that they'd look at it and say "I don't *want* to learn that, I can't see why 122 | I should". Which made me change my mind. 123 | 124 | Nowadays, reStructuredText is my "how to write text" format for almost all 125 | my own purposes, and like everyone I can also write markdown when necessary 126 | (although not with any great understanding of its edge cases). 127 | 128 | A timeline 129 | ========== 130 | 131 | * 1964 TYPSET and RUNOFF 132 | * 1967 William Tunnicliffe: "The separation of the information content of 133 | documents from their format". 
Goldfarb credits him with starting the generic 134 | coding movement (i.e., the idea of using descriptive tags like "heading" 135 | rather than "format-17") with this presentation given at a meeting of the 136 | Canadian Government Printing Office in September 1967 137 | * 1969 GML (Goldfarb, Mosher, Lorie) at IBM 138 | * "1970s" roff, script, runoff, document 139 | * 1976 nroff and troff (Ossanna) 140 | * 1978 bib and refer 141 | * 1977/1978 |TeX| and Metafont ("classic" version, written in SAIL, Knuth and others) 142 | * 1978-1980 Scribe (Reid) 143 | * 1982 |TeX| and Metafont in WEB/Pascal 144 | * 1983-1985 |LaTeX| (Lamport) 145 | * 1984 Postscript (`Wikipedia on PostScript`_ has 1982-1984) 146 | * 1986 ISO standard SGML (although the first working draft was in 1980) 147 | * 1987 TEI 148 | * 1991 Tim Berners-Lee wrote the "HTML Tags" document, proposing what was 149 | essentially HTML, built on SGML 150 | * 1989-1991 HTML and HTTP (Berners-Lee) 151 | * 1991 setext, (Feldman) for use in the TidBITS electronic newsletter 152 | * 1991 Docbook 153 | * 1993 PDF (Adobe Systems) 154 | * 1994/1995 WikiWikiWeb (Cunningham) the first wiki 155 | * 1994 Perl 5.000 introduces POD 156 | * 1995 Java appears, and with it javadoc 157 | * 1996 StructuredText (Fulton, Zope Corporation / Digital Creations) 158 | * 1997 XML 159 | * 2000 Digital Creations begins development of StructuredTextNG 160 | * 2000 First draft of reStructuredText spec posted to Doc-Utils SIG (Goodger) 161 | * 2001-2002 reStructuredText and Docutils 162 | * 2001-2005 DITA 163 | * 2002 PEP 287 "reStructuredText Standard Docstring Format" 164 | * 2002 AsciiDoc (Rackham) 165 | * 2004 markdown (Gruber and Swartz) 166 | * 2013 Asciidoctor (Waldron and others) 167 | 168 | Various sources were used in creating the timeline, but a special mention has 169 | to go to `25 Years of |TeX| and METAFONT\: Looking Back and Looking 170 | Forward`_, and of course to Wikipedia. 
171 | 172 | General 173 | ======= 174 | These are some interesting general links. 175 | 176 | * `Wikipedia on Markup Language`_ - In general, this is a good place to start. 177 | See the taxonomy of (three) types therein, and the history section. 178 | * `Wikipedia List of document markup languages`_ - always fun to look through. 179 | Notice how several of the "Well-known document markup languages" are 180 | essentially HTML variants. 181 | * `Charles Goldfarb — the Godfather of Markup Languages`_, Georgi Dalakov, 182 | undated. A quick introduction to one of the important influences on this 183 | field. 184 | * `Don Knuth's homepage`_, the homepage of Donald E. Knuth, Professor Emeritus 185 | at Stanford University. There are so many reasons to browse these pages. 186 | * `An informal look into the history of digital typography`_, David Walden, 2016. 187 | This is a good introduction, starting with letter presses and moving on into 188 | the digital world. Read it for a look at where markup came from, and what it 189 | is driving. 190 | * `From boiling lead and black art\: An essay on the history of mathematical typography`_, 191 | Eddie Smith, 2017, is a lovely article on mathematical typesetting, from the 192 | invention of the printing press to |TeX|. 193 | 194 | .. _`Wikipedia on Markup Language`: https://en.wikipedia.org/wiki/Markup_language 195 | .. _`Wikipedia List of document markup languages`: https://en.wikipedia.org/wiki/List_of_document_markup_languages 196 | .. _`Charles Goldfarb — the Godfather of Markup Languages`: http://history-computer.com/Internet/Birth/Goldfarb.html 197 | .. _`Don Knuth's homepage`: http://www-cs-faculty.stanford.edu/~knuth/ 198 | .. _`An informal look into the history of digital typography`: http://www.tug.org/tug2016/walden-digital.pdf 199 | .. 
_`From boiling lead and black art\: An essay on the history of mathematical typography`: http://www.practicallyefficient.com/2017/10/13/from-boiling-lead-and-black-art.html 200 | 201 | * `Wikipedia on docstrings`_. My memory is that Python docstrings were 202 | inspired by the existence of docstrings in Emacs Lisp. This Wikipedia page 203 | gives examples from several different programming languages. 204 | * `Docstring Convention: Python vs Emacs Lisp`_, Xah Lee, 2014. This compares 205 | how one is meant to write good docstrings in the two 206 | programming languages. 207 | 208 | .. _`Wikipedia on docstrings`: https://en.wikipedia.org/wiki/Docstring 209 | .. _`Docstring Convention: Python vs Emacs Lisp`: http://xahlee.info/comp/python_vs_elisp_docstring_convention.html 210 | 211 | * `SGML and PDF--Why We Need Both`_, Bill Kasdorf, *Journal of Electronic Publishing*, Volume 3, Issue 4: *Moving 212 | from Print to Electronic Publishing*, June, 1998. This essentially talks about the 213 | difference between semantic and presentational representation. I'm not sure 214 | that it would occur to anyone nowadays to ask the question this article 215 | poses, but the answer is definitely still valuable. 216 | 217 | .. _`SGML and PDF--Why We Need Both`: https://quod.lib.umich.edu/j/jep/3336451.0003.406?view=text;rgn=main 218 | 219 | RUNOFF and its descendants 220 | ========================== 221 | 222 | :1964 RUNOFF: *Presentational* 223 | :1970s \*roff: *Presentational*. Still in use. 224 | :1990 groff: *Presentational*. Still in use. 225 | 226 | RUNOFF 227 | ------ 228 | The original RUNOFF and TYPSET were written by Jerome H. Saltzer for CTSS_ 229 | (Compatible Time Sharing System). Between them, they provided simple text 230 | layout and pagination, including right justification. 231 | 232 | This example is (more or less) from the original TYPSET/RUNOFF documentation: 233 | 234 | .. 
code:: roff 235 | 236 | .LINE LENGTH 60 237 | .LEFT MARGIN 0 238 | .PARAGRAPH 5 239 | Call us on our toll free number 240 | 241 | .CENTER 242 | 1-800-555-5555 243 | 244 | and we will respond as soon as convenient. 245 | 246 | Commands start with a dot in the first column - this makes sense as it's not 247 | usual to start a line of English text with a dot. 248 | 249 | Commands could be abbreviated, which would have been important with the 250 | keyboards in use at the time. Inline commands can be used to shift the "case", 251 | for instance in and out of bold case. 252 | 253 | The following is an example of Digital Standard Runoff (DSR), showing that the 254 | name had an enduring meaning. I used to use DSR on VMS in the 1980s/90s. 255 | 256 | .. code:: roff 257 | 258 | .TITLE A simpler DSR example 259 | .CHAPTER This is a chapter 260 | 261 | This is the first paragraph. 262 | .LIST 263 | .LIST ELEMENT;This is a list element. We have *bold\* and &underline\&. 264 | .LIST ELEMENT;This is another list element. I like interrobangs ?%! 265 | .END LIST 266 | 267 | Abbreviated forms are still available, e.g., ``.ls`` instead of 268 | ``.list``, and ``.le;`` instead of ``.list element;``. 269 | 270 | RUNOFF was ported to BCPL and Multics, and became the ancestor to roff and 271 | thus, ultimately, all of the roff family. 272 | 273 | The roffs 274 | --------- 275 | 276 | roff started as a transliteration of the BCPL version of runoff, for UNIX, 277 | around 1970. 278 | 279 | The roff family are typically used with macro packages, allowing more 280 | domain-specific commands to be converted into the actual roff commands. This means 281 | that the system as a whole can be regarded as essentially programmable, 282 | even though the roff program itself is not. 283 | 284 | The example given here (from Lars Wirzenius' `Writing manual pages`_) 285 | is a (fake) man page, using the ``man`` macro package: 286 | 287 | .. 
code:: roff 288 | 289 | .TH CORRUPT 1 290 | .SH NAME 291 | corrupt \- modify files by randomly changing bits 292 | .SH SYNOPSIS 293 | .B corrupt 294 | [\fB\-n\fR \fIBITS\fR] 295 | [\fB\-\-bits\fR \fIBITS\fR] 296 | .IR file ... 297 | .SH DESCRIPTION 298 | .B corrupt 299 | modifies files by toggling a randomly chosen bit. 300 | .SH OPTIONS 301 | .TP 302 | .BR \-n ", " \-\-bits =\fIBITS\fR 303 | Set the number of bits to modify. Default is one bit. 304 | 305 | In this example, ``.TH`` = title heading, ``.SH`` = section heading, ``.B`` = bold; other 306 | font changes (e.g., ``\fR`` for the normal roman font and ``\fI`` for italic, which terminals usually show as underlining) are indicated by the ``\f`` 307 | sequences. 308 | 309 | Today, the dominant roff program is probably ``groff``, or GNU roff. Here is 310 | an example of groff: 311 | 312 | .. code:: roff 313 | 314 | .INCLUDE mission-statement-strings.mom 315 | .TITLE "\*[Groff-Mission-Statement] 316 | .SUBTITLE "\*[2014] 317 | .INCLUDE mission-statement-style.mom 318 | .PP 319 | As the most widely deployed implementation of troff in use today, 320 | groff holds an important place in the Unix universe. Frequently 321 | and erroneously dismissed as a legacy program for formatting 322 | Unix manuals (manpages), groff is in fact a sophisticated system 323 | for producing high-quality typeset material, from business 324 | correspondence to complex, technical reports and plate-ready books. 325 | \*[BU3]With an impressive record for backward compatibility, it 326 | continues to evolve and play a leading role in the development of 327 | free typesetting software. 328 | 329 | Interesting links: 330 | 331 | * `Wikipedia on TYPSET and RUNOFF`_ 332 | * CTSS_ (the Compatible Time Sharing System) which is the machine on which the 333 | first RUNOFF ran.
334 | * `Wikipedia on Runoff`_ 335 | * `Wikipedia on roff`_ 336 | * `Wikipedia on nroff`_ ("newer roff") 337 | * `Wikipedia on troff`_ ("typesetter roff") 338 | * `Wikipedia on groff`_ ("GNU troff") 339 | * A repository of `Historical documents from classical RUNOFF and files using the RUNOFF language`_ 340 | * The `OpenVMS Digital Standard Runoff Reference Manual`_ from May 1993. 341 | * The manpage ``ROFF(7)``: `roff - concepts and history 342 | of roff typesetting`_, part of the `groff`_ distribution. It has an overview 343 | of the history of the roffs, and a summary of how they work. 344 | * `History of UNIX Manpages`_, Kristaps Dzonsons, 2011. The history of the 345 | UNIX manpage "based on source code, manuals, and first-hand accounts". 346 | Also traces the naming of programs RUNOFF through roff, SCRIPT, compose, 347 | roff (a different thing), nroff and so on. 348 | * The Groff_ manual 349 | * `Groff and mom\: an overview`_, Peter Schaffter, 2017 350 | * Mom_, macros for GNU troff, Peter Schaffter. mom is a flexible typesetting 351 | and document formatting package that allows you to create high-quality 352 | Portable Document Format (.pdf) or PostScript (.ps) files. It is a macro set 353 | that sits on top of groff_. 354 | * `Writing manual pages`_, Lars Wirzenius, 2016 355 | * From `Unix history`_, `William Stewart`_, 1996-2014: 356 | 357 | In the spring of 1971, the interest in Unix began to grow, so instead of 358 | writing a new text-processing system as originally proposed, Thompson and 359 | Ritchie translated the existing "roff" text formatter from the PDP-7 to the 360 | PDP-11 and made it available to the Patent department on their new Unix 361 | system. This practical success helped convince Bell Labs of the value of 362 | Unix, and shortly thereafter they bought the team one of the first, powerful 363 | PDP-11/45 minicomputers to continue their development. A series of 364 | progressively better "editions" of Unix were then released.
365 | 366 | .. _`Wikipedia on TYPSET and RUNOFF`: https://en.wikipedia.org/wiki/TYPSET_and_RUNOFF 367 | .. _CTSS: https://en.wikipedia.org/wiki/Compatible_Time-Sharing_System 368 | .. _`Historical documents from classical RUNOFF and files using the RUNOFF language`: https://github.com/bwarken/RUNOFF_historical/ 369 | .. _`Wikipedia on Runoff`: https://en.wikipedia.org/wiki/Runoff_(program) 370 | .. _`Wikipedia on roff`: https://en.wikipedia.org/wiki/Roff_(computer_program) 371 | .. _`Wikipedia on nroff`: https://en.wikipedia.org/wiki/Nroff 372 | .. _`Wikipedia on troff`: https://en.wikipedia.org/wiki/Troff 373 | .. _`Wikipedia on groff`: https://en.wikipedia.org/wiki/Groff_(software) 374 | .. _`roff - concepts and history of roff typesetting`: https://linux.die.net/man/7/roff 375 | .. _`OpenVMS Digital Standard Runoff Reference Manual`: http://h20565.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c04623260 376 | .. _`Writing manual pages`: https://liw.fi/manpages/ 377 | .. _`History of UNIX Manpages`: http://manpages.bsd.lv/history.html 378 | .. _Groff: http://www.gnu.org/software/groff/ 379 | .. _`Groff and mom\: an overview`: https://www.gnu.org/software/groff/groff-and-mom.pdf 380 | .. _mom: http://www.schaffter.ca/mom/ 381 | .. _`Unix history`: https://www.livinginternet.com/i/iw_unix_dev.htm 382 | .. _`William Stewart`: http://williamstewart.com/ 383 | 384 | .. note:: Also, preceding RUNOFF, in 1963, there is `TJ-2`_: 385 | 386 | TJ-2 (Type Justifying Program) was published by Peter Samson in May 1963 387 | and is thought to be the first page layout program. ... TJ-2 was 388 | succeeded by TYPSET and RUNOFF, a pair of complementary programs written 389 | in 1964 for the CTSS operating system. TYPSET and RUNOFF soon evolved 390 | into runoff for Multics, which was in turn ported to Unix in the 1970s as 391 | roff. 392 | 393 | -- from the wikipedia page 394 | 395 | .. _`TJ-2`: `Wikipedia on TJ-2`_ 396 | .. 
_`Wikipedia on TJ-2`: https://en.wikipedia.org/wiki/TJ-2 397 | 398 | GML and SGML and XML 399 | ==================== 400 | 401 | :1969 GML: *Semantic* and *meta* 402 | :1986 SGML: *Semantic* and *meta* (DTDs) 403 | :1997 XML: *Semantic* and *meta* (various schema languages) 404 | 405 | GML 406 | --- 407 | 408 | 1969 GML 409 | 410 | GML stood for Generalized Markup Language, but also for the initials of the 411 | surnames of its inventors (Charles Goldfarb, Edward Mosher, Raymond Lorie). 412 | 413 | It was intended to be a mechanism for *describing* markup languages, rather 414 | than a markup language itself. 415 | 416 | Here is an example of GML, from `The Roots of SGML -- A Personal 417 | Recollection`_ by Charles F. Goldfarb. It uses the "starter set", implemented 418 | using macros in IBM's Script_: 419 | 420 | .. code:: 421 | 422 | :h1.Chapter 1: Introduction 423 | :p.GML supported hierarchical containers, such as 424 | :ol 425 | :li.Ordered lists (like this one), 426 | :li.Unordered lists, and 427 | :li.Definition lists 428 | :eol. 429 | as well as simple structures. 430 | :p.Markup minimization (later generalized and formalized in SGML), 431 | allowed the end-tags to be omitted for the "h1" and "p" elements. 432 | 433 | SGML 434 | ---- 435 | SGML is an ISO standard: "ISO 8879:1986 Information processing – Text and 436 | office systems – Standard Generalized Markup Language (SGML)". The `wikipedia 437 | page on SGML`_ gives more information on the standard and its related 438 | standards. 439 | 440 | .. _`wikipedia page on SGML`: `Wikipedia on SGML`_ 441 | 442 | SGML uses DTDs (Document Type Definitions) to describe the set of 443 | markup declarations that form a *document type* (e.g., SGML itself, XML, 444 | HTML). 445 | 446 | Shown is a DTD fragment for defining a simple list: 447 | 448 | .. code:: DTD 449 | 450 |