├── .gitignore
├── .vscode
    └── settings.json
├── LICENSE
├── NEWS.md
├── docs
    ├── figure
    │   └── ExampleStructure.png
    ├── langsci-gb4e.sty
    ├── pandoc-ling-old.lua
    ├── processVerbatim.lua
    ├── readme.docx
    ├── readme.epub
    ├── readme.html
    ├── readme_expex.pdf
    ├── readme_expex.tex
    ├── readme_gb4e.pdf
    ├── readme_gb4e.tex
    ├── readme_langsci-gb4e.pdf
    ├── readme_langsci-gb4e.tex
    ├── readme_linguex.pdf
    ├── readme_linguex.tex
    └── test.sh
├── figure
    ├── ExampleStructure.pages
    ├── ExampleStructure.pdf
    └── ExampleStructure.png
├── pandoc-ling.lua
└── readme.md

/.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | -------------------------------------------------------------------------------- /.vscode/settings.json: -------------------------------------------------------------------------------- 1 | { 2 | "spellright.language": [], 3 | "spellright.documentTypes": [ 4 | "plaintext" 5 | ] 6 | } -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Creative Commons Legal Code 2 | 3 | CC0 1.0 Universal 4 | 5 | CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE 6 | LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN 7 | ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS 8 | INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES 9 | REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS 10 | PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM 11 | THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED 12 | HEREUNDER. 
13 | 14 | Statement of Purpose 15 | 16 | The laws of most jurisdictions throughout the world automatically confer 17 | exclusive Copyright and Related Rights (defined below) upon the creator 18 | and subsequent owner(s) (each and all, an "owner") of an original work of 19 | authorship and/or a database (each, a "Work"). 20 | 21 | Certain owners wish to permanently relinquish those rights to a Work for 22 | the purpose of contributing to a commons of creative, cultural and 23 | scientific works ("Commons") that the public can reliably and without fear 24 | of later claims of infringement build upon, modify, incorporate in other 25 | works, reuse and redistribute as freely as possible in any form whatsoever 26 | and for any purposes, including without limitation commercial purposes. 27 | These owners may contribute to the Commons to promote the ideal of a free 28 | culture and the further production of creative, cultural and scientific 29 | works, or to gain reputation or greater distribution for their Work in 30 | part through the use and efforts of others. 31 | 32 | For these and/or other purposes and motivations, and without any 33 | expectation of additional consideration or compensation, the person 34 | associating CC0 with a Work (the "Affirmer"), to the extent that he or she 35 | is an owner of Copyright and Related Rights in the Work, voluntarily 36 | elects to apply CC0 to the Work and publicly distribute the Work under its 37 | terms, with knowledge of his or her Copyright and Related Rights in the 38 | Work and the meaning and intended legal effect of CC0 on those rights. 39 | 40 | 1. Copyright and Related Rights. A Work made available under CC0 may be 41 | protected by copyright and related or neighboring rights ("Copyright and 42 | Related Rights"). Copyright and Related Rights include, but are not 43 | limited to, the following: 44 | 45 | i. the right to reproduce, adapt, distribute, perform, display, 46 | communicate, and translate a Work; 47 | ii. 
moral rights retained by the original author(s) and/or performer(s); 48 | iii. publicity and privacy rights pertaining to a person's image or 49 | likeness depicted in a Work; 50 | iv. rights protecting against unfair competition in regards to a Work, 51 | subject to the limitations in paragraph 4(a), below; 52 | v. rights protecting the extraction, dissemination, use and reuse of data 53 | in a Work; 54 | vi. database rights (such as those arising under Directive 96/9/EC of the 55 | European Parliament and of the Council of 11 March 1996 on the legal 56 | protection of databases, and under any national implementation 57 | thereof, including any amended or successor version of such 58 | directive); and 59 | vii. other similar, equivalent or corresponding rights throughout the 60 | world based on applicable law or treaty, and any national 61 | implementations thereof. 62 | 63 | 2. Waiver. To the greatest extent permitted by, but not in contravention 64 | of, applicable law, Affirmer hereby overtly, fully, permanently, 65 | irrevocably and unconditionally waives, abandons, and surrenders all of 66 | Affirmer's Copyright and Related Rights and associated claims and causes 67 | of action, whether now known or unknown (including existing as well as 68 | future claims and causes of action), in the Work (i) in all territories 69 | worldwide, (ii) for the maximum duration provided by applicable law or 70 | treaty (including future time extensions), (iii) in any current or future 71 | medium and for any number of copies, and (iv) for any purpose whatsoever, 72 | including without limitation commercial, advertising or promotional 73 | purposes (the "Waiver"). 
Affirmer makes the Waiver for the benefit of each 74 | member of the public at large and to the detriment of Affirmer's heirs and 75 | successors, fully intending that such Waiver shall not be subject to 76 | revocation, rescission, cancellation, termination, or any other legal or 77 | equitable action to disrupt the quiet enjoyment of the Work by the public 78 | as contemplated by Affirmer's express Statement of Purpose. 79 | 80 | 3. Public License Fallback. Should any part of the Waiver for any reason 81 | be judged legally invalid or ineffective under applicable law, then the 82 | Waiver shall be preserved to the maximum extent permitted taking into 83 | account Affirmer's express Statement of Purpose. In addition, to the 84 | extent the Waiver is so judged Affirmer hereby grants to each affected 85 | person a royalty-free, non transferable, non sublicensable, non exclusive, 86 | irrevocable and unconditional license to exercise Affirmer's Copyright and 87 | Related Rights in the Work (i) in all territories worldwide, (ii) for the 88 | maximum duration provided by applicable law or treaty (including future 89 | time extensions), (iii) in any current or future medium and for any number 90 | of copies, and (iv) for any purpose whatsoever, including without 91 | limitation commercial, advertising or promotional purposes (the 92 | "License"). The License shall be deemed effective as of the date CC0 was 93 | applied by Affirmer to the Work. 
Should any part of the License for any 94 | reason be judged legally invalid or ineffective under applicable law, such 95 | partial invalidity or ineffectiveness shall not invalidate the remainder 96 | of the License, and in such case Affirmer hereby affirms that he or she 97 | will not (i) exercise any of his or her remaining Copyright and Related 98 | Rights in the Work or (ii) assert any associated claims and causes of 99 | action with respect to the Work, in either case contrary to Affirmer's 100 | express Statement of Purpose. 101 | 102 | 4. Limitations and Disclaimers. 103 | 104 | a. No trademark or patent rights held by Affirmer are waived, abandoned, 105 | surrendered, licensed or otherwise affected by this document. 106 | b. Affirmer offers the Work as-is and makes no representations or 107 | warranties of any kind concerning the Work, express, implied, 108 | statutory or otherwise, including without limitation warranties of 109 | title, merchantability, fitness for a particular purpose, non 110 | infringement, or the absence of latent or other defects, accuracy, or 111 | the present or absence of errors, whether or not discoverable, all to 112 | the greatest extent permissible under applicable law. 113 | c. Affirmer disclaims responsibility for clearing rights of other persons 114 | that may apply to the Work or any use thereof, including without 115 | limitation any person's Copyright and Related Rights in the Work. 116 | Further, Affirmer disclaims responsibility for obtaining any necessary 117 | consents, permissions or other rights required for any use of the 118 | Work. 119 | d. Affirmer understands and acknowledges that Creative Commons is not a 120 | party to this document and has no duty or obligation with respect to 121 | this CC0 or use of the Work. 
122 | -------------------------------------------------------------------------------- /NEWS.md: -------------------------------------------------------------------------------- 1 | # pandoc-ling 1.9 (upcoming) 2 | 3 | # pandoc-ling 1.8 4 | 5 | - allow for multiline headers (using `\\` as linebreak) 6 | - bugfix for preambles with interlinear examples (thx to fmatter #21) 7 | - bugfix for capitalisation of glossing (thx to bulbulistan #22) 8 | - bugfix to keep header and preamble always with rest, also without samePage (#23) 9 | - bugfix for other separators in glossing (thx to bulbulistan and 7o7omootsqwn #27) 10 | 11 | # pandoc-ling 1.7 12 | 13 | ## changes 14 | 15 | - allow for no space between number and suffix in latex export 16 | - bugfix latex-linguex (thx speechchemistry) 17 | 18 | # pandoc-ling 1.6 19 | 20 | ## changes 21 | 22 | - adding option `samePage` to determine whether examples are kept together on a page in Latex typesetting (#pr14, thanks to CLRafaelR) 23 | 24 | ## bugs 25 | 26 | - allow for one-letter elements in glossing (#pr13, thanks to CLRafaelR) 27 | 28 | # pandoc-ling 1.5 29 | 30 | ## changes 31 | 32 | - removing the colon from the internal ID to harmonize the cross-document referencing 33 | 34 | ## bugs 35 | 36 | - handling of `header-includes` improved: additional user-provided statements are just passed through. 37 | - various internal changes to match the updated functioning of lua inside pandoc 2.12 and newer. 38 | 39 | # pandoc-ling 1.4 40 | 41 | ## changes 42 | 43 | - adding option to use bullet lists for example entry. Will still be transformed into labelled list. 
44 | 45 | ## bugs 46 | 47 | - fixed gb4e error prevented by adding `\noautomath` to preamble 48 | - fixed gb4e error with cross-referencing because of wrong placement of `\label` statement 49 | - fixed latex error when using special symbols in judgements (closes #4, thx @CLRafaelR) 50 | 51 | # pandoc-ling 1.3.1 52 | 53 | ## bugs 54 | 55 | - fixed bug with not-appearing header of interlinear in HTML output 56 | - fixed bug with judgements in single-line examples (closes #3, thx @CLRafaelR) 57 | 58 | # pandoc-ling 1.3 59 | 60 | ## changes 61 | 62 | - changed the ID system to "#ex:" for easier cross-document linking to examples 63 | - change @next and @last to lowercase for easier typing 64 | - added 'samepage' enclosures in latex so that examples do not break across pages 65 | 66 | ## bugs 67 | 68 | - resolved clash with pandoc-crossref 69 | - corrected wrong counting with unnumbered headings 70 | - fixed counter reset and \exewidth with gb4e 71 | - fixed bugs with linguex export 72 | 73 | # pandoc-ling 1.2 74 | 75 | ## changes 76 | 77 | - adding experimental beamer as possible export. It uses the same routines as basic latex export. needs more testing. 78 | 79 | ## bugs 80 | 81 | - fixed problem with parsing local options 82 | 83 | # pandoc-ling 1.1 84 | 85 | ## changes 86 | 87 | - adding experimental noFormat option to simply use the raw content of the example as a complete div to a single table cell and set number to vertically centred. For latex export, all lines are simply squashed together. needs more testing. 
88 | 89 | # pandoc-ling 1.0 90 | 91 | First complete working version 92 | -------------------------------------------------------------------------------- /docs/figure/ExampleStructure.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/figure/ExampleStructure.png -------------------------------------------------------------------------------- /docs/langsci-gb4e.sty: -------------------------------------------------------------------------------- 1 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 2 | %% File: langsci-gb4e.sty 3 | %% Author: Language Science Press (http://langsci-press.org) 4 | %% Date: 2020-03-17 13:12 UTC 5 | %% Purpose: This file contains an adapted version of the gb4e package 6 | %% for typesetting linguistic examples. It also includes 7 | %% adapted versions of the cgloss and jambox packages 8 | %% Language: LaTeX 9 | %% Licence: 10 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 11 | 12 | \ProvidesPackage{langsci-gb4e}[2020/01/01] 13 | 14 | \usepackage{etoolbox} 15 | 16 | \newtoggle{cgloss} 17 | \toggletrue{cgloss} 18 | \newtoggle{jambox} 19 | \toggletrue{jambox} 20 | \DeclareOption{nocgloss}{\togglefalse{cgloss}} 21 | \DeclareOption{nojambox}{\togglefalse{jambox}} 22 | \DeclareOption*{\PackageWarning{examplepackage}{Unknown option ‘\CurrentOption’}} 23 | \ProcessOptions\relax 24 | 25 | % \def\gbVersion{4e} 26 | 27 | %%%%%%%%%%%%%%%%%%%%%%%% 28 | % Format of examples: % 29 | %%%%%%%%%%%%%%%%%%%%%%%% 30 | % \begin{exe} or \exbegin 31 | % (arab.) 32 | % \begin{xlist} or \xlist 33 | % (1st embedding, alph.) 34 | % \begin{xlisti} or \xlisti 35 | % (2nd embedding, rom.) 
36 | % \end{xlisti} or \endxlisti 37 | % 38 | % \end{xlist} or \endxlist 39 | % 40 | % \end{exe} or \exend 41 | % 42 | % Other sublist-styles: xlistA (Alph.), xlistI (Rom.), xlistn (arab) 43 | % 44 | % \ex (produces Number) 45 | % \ex (numbered example) 46 | % \ex[jdgmt]{sentence} (numbered example with judgement) 47 | % 48 | % \exi{ident} (produces identifier) 49 | % \exi{ident} (example numbered with identifier) 50 | % \exi{ident}[jdgmt]{sentence} (ditto with judgement) 51 | % (\exr, \exp and \sn are defined in terms of \exi) 52 | % 53 | % \exr{label} (produces cross-referenced Num.) 54 | % \exr{label} (cross-referenced example) 55 | % \exr{label}[jdgmt]{sentence} (cross-referenced example with judgement) 56 | % 57 | % \exp{label} (same as 58 | % \exp{label} \exr but 59 | % \exp{label}[jdgmt]{sentence} with prime) 60 | % 61 | % \sn (unnumbered example) 62 | % \sn[jdgmt]{sentence} (unnumbered example with judgement) 63 | % 64 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 65 | % For my own laziness (HANDLE WITH CARE---this works only 66 | % in boringly normal cases.... ): 67 | % 68 | % \ea works like \begin{exe}\ex or \begin{xlist}\ex, 69 | % depending on context 70 | % \z works like \end{exe} or \end{xlist}, dep on context 71 | % 72 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 73 | 74 | %CGLOSS META 75 | % Modified version of cgloss4e.sty. Hacked and renamed cgloss.sty 76 | % by Alexis Dimitriadis (alexis@babel.ling.upenn.edu). Integrated into 77 | % langsci-gb4e.sty by Sebastian Nordhoff 78 | % EnD CGLOSS META 79 | 80 | 81 | 82 | \@ifundefined{new@fontshape}{\def\reset@font{}\let\mathrm\rm\let\mathit\mit}{} 83 | 84 | 85 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 86 | % %% 87 | % Font Specifications %% 88 | % %% 89 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 90 | 91 | % Define commands for fonts to be used: 92 | % 93 | % 1) regular 94 | % a. 
example line 95 | \newcommand{\exfont}{\normalsize\upshape} 96 | % b. glossing line 97 | \newcommand{\glossfont}{\normalsize\upshape} 98 | % c. translation font 99 | \newcommand{\transfont}{\normalsize\upshape} 100 | % d. example number 101 | \newcommand{\exnrfont}{\exfont\upshape} 102 | % 103 | % 2) in footnote 104 | % a. example line 105 | \newcommand{\fnexfont}{\footnotesize\upshape} 106 | % b. glossing line 107 | \newcommand{\fnglossfont}{\footnotesize\upshape} 108 | % c. translation font 109 | \newcommand{\fntransfont}{\footnotesize\upshape} 110 | % d. example number 111 | \newcommand{\fnexnrfont}{\fnexfont\upshape} 112 | 113 | \newcommand{\examplesroman}{ 114 | \let\eachwordone=\upshape 115 | \exfont{\upshape} 116 | } 117 | \newcommand{\examplesitalics}{ 118 | \let\eachwordone=\itshape 119 | \exfont{\itshape} 120 | } 121 | 122 | 123 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 124 | %% %% 125 | %% Macros for examples, roughly following Linguistic Inquiry style. 
%% 126 | %% %% 127 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 128 | 129 | \def\qlist{\begin{list}{\Alph{xnum}.}{\usecounter{xnum}% 130 | \setlength{\rightmargin}{\leftmargin}}} 131 | \def\endqlist{\end{list}} 132 | 133 | \newif\if@noftnote\@noftnotetrue 134 | \newif\if@xrec\@xrecfalse 135 | \@definecounter{fnx} 136 | 137 | % set a flag that we are in footnotes now and change the size of example fonts 138 | \let\oldFootnotetext\@footnotetext 139 | 140 | \renewcommand\@footnotetext[1]{% 141 | \@noftnotefalse\setcounter{fnx}{0}% 142 | \begingroup% 143 | \let\exfont\fnexfont% 144 | \let\glossfont\fnglossfont% 145 | \let\transfont\fntransfont% 146 | \let\exnrfont\fnexnrfont% 147 | \oldFootnotetext{#1}% 148 | \endgroup% 149 | \@noftnotetrue} 150 | 151 | 152 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 153 | %% %% 154 | %% counters %% 155 | %% %% 156 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 157 | 158 | % start counters with 1 159 | \newcount\@xnumdepth \@xnumdepth = 0 160 | 161 | % define four levels of indentation 162 | \@definecounter{xnumi} 163 | \@definecounter{xnumii} 164 | \@definecounter{xnumiii} 165 | \@definecounter{xnumiv} 166 | 167 | 168 | % use (1) on page, but (i) in footnotes 169 | \def\thexnumi 170 | {\if@noftnote% 171 | \@arabic\@xsi{xnumi}% 172 | \else% 173 | \@roman\@xsi{xnumi}% 174 | \fi% 175 | } 176 | \def\thexnumii{\@xsii{xnumii}} 177 | \def\thexnumiii{\@xsiii{xnumiii}} 178 | \def\thexnumiv{\@xsiv{xnumiv}} 179 | \def\p@xnumii{\thexnumi% 180 | \if@noftnote% 181 | \else% 182 | .% 183 | \fi} 184 | \def\p@xnumiii{\thexnumi\thexnumii-} 185 | \def\p@xnumiv{\thexnumi\thexnumii-\thexnumiii-} 186 | 187 | \def\xs@default#1{\csname @@xs#1\endcsname} 188 | \def\@@xsi{\let\@xsi\arabic} 189 | \def\@@xsii{\let\@xsii\alph} 190 | \def\@@xsiii{\let\@xsiii\roman} 191 | \def\@@xsiv{\let\@xsiv\arabic} 192 | 193 | \@definecounter{rxnumi} 194 | \@definecounter{rxnumii} 195 | 
\@definecounter{rxnumiii} 196 | \@definecounter{rxnumiv} 197 | 198 | \def\save@counters{% 199 | \setcounter{rxnumi}{\value{xnumi}}% 200 | \setcounter{rxnumii}{\value{xnumii}}% 201 | \setcounter{rxnumiii}{\value{xnumiii}}% 202 | \setcounter{rxnumiv}{\value{xnumiv}}}% 203 | 204 | \def\reset@counters{% 205 | \setcounter{xnumi}{\value{rxnumi}}% 206 | \setcounter{xnumii}{\value{rxnumii}}% 207 | \setcounter{xnumiii}{\value{rxnumiii}}% 208 | \setcounter{xnumiv}{\value{rxnumiv}}}% 209 | 210 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 211 | %% %% 212 | %% widths %% 213 | %% %% 214 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 215 | 216 | % Control the width of example identifiers 217 | \def\exewidth#1{\def\@exwidth{#1}} 218 | 219 | \newcommand{\twodigitexamples}{\exewidth{(23)}} 220 | \newcommand{\threedigitexamples}{\exewidth{(234)}} 221 | \newcommand{\fourdigitexamples}{\exewidth{(2345)}} 222 | 223 | \def\gblabelsep#1{\def\@gblabelsep{#1}} 224 | \gblabelsep{1em} 225 | 226 | \def\subexsep#1{\def\@subexsep{#1}} 227 | \subexsep{1.5ex} 228 | 229 | % set initial sizes of example number and judgement sizes 230 | \exewidth{\exnrfont (35)} 231 | 232 | % how much should examples in footnotes be indented? 
233 | \newlength{\footexindent} 234 | \setlength{\footexindent}{0pt} 235 | 236 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 237 | %% %% 238 | %% example lists %% 239 | %% %% 240 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 241 | 242 | \def\exe{% 243 | %\ifnum\value{equation}>9 \exewidth{(23)}\else\fi% 244 | %inserted by LangSci, for large example numbers 245 | \ifnum\value{equation}>98 \exewidth{(235)}\else\fi% 246 | \@ifnextchar [{\@exe}{\@exe[\@exwidth]}} 247 | 248 | \def\@exe[#1]{\ifnum \@xnumdepth >0% 249 | \if@xrec\@exrecwarn\fi% 250 | \if@noftnote\@exrecwarn\fi% 251 | \@xnumdepth0\@listdepth0\@xrectrue% 252 | \save@counters% 253 | \fi% 254 | \advance\@xnumdepth \@ne \@@xsi% 255 | \if@noftnote% 256 | \begin{list}{(\thexnumi)}% 257 | {\usecounter{xnumi}\@subex{#1}{\@gblabelsep}{0em}% 258 | \setcounter{xnumi}{\value{equation}} 259 | \nopagebreak}% 260 | \else% 261 | \begin{list}{(\roman{xnumi})}% 262 | {\usecounter{xnumi}\@subex{(iiv)}{\@gblabelsep}{\footexindent}% 263 | \setcounter{xnumi}{\value{fnx}}}% 264 | \fi} 265 | 266 | 267 | \def\endexe{\if@noftnote\setcounter{equation}{\value{xnumi}}% 268 | \else\setcounter{fnx}{\value{xnumi}}% 269 | \reset@counters\@xrecfalse\fi\end{list}} 270 | 271 | \def\@exrecwarn{\typeout{*** Recursion on "exe"---your 272 | example numbering will probably be screwed up!}} 273 | 274 | \def\xlist{\@ifnextchar [{\@xlist{}}{\@xlist{}[iv.]}} 275 | \def\xlista{\@ifnextchar [{\@xlist{\alph}}{\@xlist{\alph}[m.]}} 276 | \def\xlistabr{\@ifnextchar [{\@xlist{(\alph)}}{\@xlist{(\alph)}[m.]}} 277 | \def\xlisti{\@ifnextchar [{\@xlist{\roman}}{\@xlist{\roman}[iv.]}} 278 | \def\xlistn{\@ifnextchar [{\@xlist{\arabic}}{\@xlist{\arabic}[9.]}} 279 | \def\xlistA{\@ifnextchar [{\@xlist{\Alph}}{\@xlist{\Alph}[M.]}} 280 | \def\xlistI{\@ifnextchar [{\@xlist{\Roman}}{\@xlist{\Roman}[IV.]}} 281 | 282 | \def\endxlist{\end{list}} 283 | \def\endxlista{\end{list}} 284 | \def\endxlistabr{\end{list}} 285 | 
\def\endxlistn{\end{list}} 286 | \def\endxlistA{\end{list}} 287 | \def\endxlistI{\end{list}} 288 | \def\endxlisti{\end{list}} 289 | 290 | 291 | 292 | 293 | %%% a generic sublist-styler 294 | \def\@xlist#1[#2]{\ifnum \@xnumdepth >3 \@toodeep\else% 295 | \advance\@xnumdepth \@ne% 296 | \edef\@xnumctr{xnum\romannumeral\the\@xnumdepth}% 297 | \def\@bla{#1} 298 | \ifx\@bla\empty\xs@default{\romannumeral\the\@xnumdepth}\else% 299 | \expandafter\let\csname @xs\romannumeral\the\@xnumdepth\endcsname#1\fi 300 | \begin{list}{\csname the\@xnumctr\endcsname.}% 301 | {\usecounter{\@xnumctr}\@subex{#2}{\@subexsep}{0em}}\fi} 302 | 303 | %% Added third argument to be able to add some more space to leftmargin 304 | %% for footnotes that have bigger indentation. 305 | %% St. Mü. 07.01.2007 306 | \def\@subex#1#2#3{\settowidth{\labelwidth}{#1}\itemindent\z@\labelsep#2% 307 | \ifnum\the\@xnumdepth=1% 308 | \topsep 7\p@ plus2\p@ minus3\p@\itemsep3\p@ plus2\p@\else% 309 | \topsep1.5\p@ plus\p@\itemsep1.5\p@ plus\p@\fi% 310 | \parsep\p@ plus.5\p@ minus.5\p@% 311 | \leftmargin\labelwidth\advance\leftmargin#2\advance\leftmargin#3\relax} 312 | 313 | %%% the example-items 314 | \def\ex{\@ifnextchar [{\@ex}{\item}} 315 | \def\@ex[#1]#2{\item\@exj[#1]{#2}} 316 | \def\@exj[#1]#2{\@exjbg{#1} #2 \end{list}\nopagebreak} 317 | \def\exi#1{\item[#1]\@ifnextchar [{\@exj}{}} 318 | \def\judgewidth#1{\def\@jwidth{#1}} 319 | \judgewidth{??} 320 | \judgewidth{*} % if wider judgements are needed, enlarge within papers 321 | \def\@exjbg#1{\begin{list}{#1}{\@subex{\@jwidth}{.5ex}{0em}}\item} 322 | \def\exr#1{\exi{{(\ref{#1})}}} 323 | \def\exp#1{\exi{{(\ref{#1}$'$)}}} 324 | \def\sn{\exi{}} 325 | 326 | 327 | \def\ex{\@ifnextchar [{\exnrfont\@ex}{\exnrfont\item\exfont}} 328 | \def\@ex[#1]#2{\item\@exj[#1]{\exfont#2}} 329 | 330 | \def\@exjbg#1{\begin{list}{{\exnrfont#1}}{\@subex{\@jwidth}{.5ex}{0em}}\item} 331 | \def\exi#1{\item[{\exnrfont#1}]\@ifnextchar [{\exnrfont\@exj}{}} 332 | 333 | 
\def\ea{\ifnum\@xnumdepth=0\begin{exe}\else\begin{xlist}[iv.]\fi\raggedright\ex} 334 | \def\eal{\begin{exe}\exnrfont\ex\begin{xlist}[iv.]\raggedright} 335 | \def\eas{\ifnum\@xnumdepth=0\begin{exe}[(34)]\else\begin{xlist}[iv.]\fi\ex\begin{tabular}[t]{@{}p{\linewidth}@{}}} 336 | 337 | % allow hyphenation and justification 338 | \def\eanoraggedright{\ifnum\@xnumdepth=0\begin{exe}\else\begin{xlist}[iv.]\fi\ex} 339 | \def\ealnoraggedright{\begin{exe}\exnrfont\ex\begin{xlist}[iv.]} 340 | 341 | 342 | 343 | \def\z{\ifnum\@xnumdepth=1\end{exe}\else\end{xlist}\fi} 344 | \def\zl{\end{xlist}\end{exe}} 345 | \def\zs{\end{tabular}\ifnum\@xnumdepth=1\end{exe}\else\end{xlist}\fi} 346 | \def\zllast{\end{xlist}\end{exe}\removelastskip} 347 | 348 | % Control vertical space for examples in footnotes 349 | \def\zlast{\z\vspace{-\baselineskip}} 350 | \def\eafirst{\vspace{-1.5\baselineskip}\ea} 351 | 352 | %%%%%% control the alignment of exampleno. and (picture-)example 353 | %%%%%% (by Lex Holt ). 354 | \def\attop#1{\leavevmode\vtop{\strut\vskip-\baselineskip\vbox{#1}}} 355 | \def\atcenter#1{$\vcenter{#1}$} 356 | %%%%%% 357 | 358 | 359 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 360 | %% %% 361 | %% several examples in one line %% 362 | %% %% 363 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 364 | 365 | \newcommand{\xbox}[2]{\noindent\parbox[t]{#1}{#2}\noindent} 366 | \newcommand{\nobreakbox}[1]{\xbox{\linewidth}{#1}} 367 | \newcommand{\xref}[1]{(\ref{#1})} 368 | \newcommand{\xxref}[2]{(\ref{#1}--\ref{#2})} 369 | 370 | 371 | \iftoggle{cgloss}{ 372 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 373 | %% %% 374 | %% CGLOSS starts here %% 375 | %% %% 376 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 377 | 378 | 379 | \let\@gsingle=1 380 | \def\singlegloss{\let\@gsingle=1} 381 | \def\nosinglegloss{\let\@gsingle=0} 382 | \@ifundefined{new@fontshape}% 383 | 
{\def\@selfnt{\ifx\@currsize\normalsize\@normalsize\else\@currsize\fi}} 384 | {\def\@selfnt{\selectfont}} 385 | 386 | \def\gll% % Introduces 2-line text-and-gloss. 387 | {\raggedright% 388 | \bgroup %\begin{flushleft} 389 | \ifx\@gsingle1% 390 | \def\baselinestretch{1}\@selfnt\fi 391 | \bgroup 392 | \twosent 393 | } 394 | 395 | \def\glll% % Introduces 3-line text-and-gloss. 396 | {\bgroup %\begin{flushleft} 397 | \ifx\@gsingle1% 398 | \def\baselinestretch{1}\@selfnt\fi 399 | \bgroup 400 | \threesent 401 | } 402 | 403 | 404 | \def\gllll% % Introduces 4-line text-and-gloss. 405 | {\bgroup %\begin{flushleft} 406 | \ifx\@gsingle1% 407 | \def\baselinestretch{1}\@selfnt\fi 408 | \bgroup 409 | \foursent 410 | } 411 | 412 | 413 | \def\glllll% % Introduces 5-line text-and-gloss. 414 | {\bgroup %\begin{flushleft} 415 | \ifx\@gsingle1% 416 | \def\baselinestretch{1}\@selfnt\fi 417 | \bgroup 418 | \fivesent 419 | } 420 | 421 | 422 | \def\gllllll% % Introduces 6-line text-and-gloss. 423 | {\bgroup %\begin{flushleft} 424 | \ifx\@gsingle1% 425 | \def\baselinestretch{1}\@selfnt\fi 426 | \bgroup 427 | \sixsent 428 | } 429 | 430 | 431 | \def\glllllll% % Introduces 7-line text-and-gloss. 432 | {\bgroup %\begin{flushleft} 433 | \ifx\@gsingle1% 434 | \def\baselinestretch{1}\@selfnt\fi 435 | \bgroup 436 | \sevensent 437 | } 438 | 439 | 440 | \def\gllllllll% % Introduces 8-line text-and-gloss. 441 | {\bgroup %\begin{flushleft} 442 | \ifx\@gsingle1% 443 | \def\baselinestretch{1}\@selfnt\fi 444 | \bgroup 445 | \eightsent 446 | } 447 | 448 | 449 | \newlength{\gltoffset} 450 | \setlength{\gltoffset}{.17\baselineskip} 451 | \newcommand{\nogltOffset}{\setlength{\gltoffset}{0pt}} 452 | \newcommand{\resetgltOffset}{\setlength{\gltoffset}{.17\baselineskip}} 453 | \def\glt{\ifhmode\\*[\gltoffset]\else\nobreak\vskip\gltoffset\nobreak\fi\transfont} 454 | 455 | 456 | % Introduces a translation 457 | \let\trans\glt 458 | 459 | % \def\gln{\relax} 460 | % % Ends the gloss environment. 
461 | 462 | % The following TeX code is adapted, with permission, from: 463 | % gloss.tex: Macros for vertically aligning words in consecutive sentences. 464 | % Version: 1.0 release: 26 November 1990 465 | % Copyright (c) 1991 Marcel R. van der Goot (marcel@cs.caltech.edu). 466 | 467 | \newbox\lineone % boxes with words from first line 468 | \newbox\linetwo 469 | \newbox\linethree 470 | \newbox\linefour 471 | \newbox\linefive 472 | \newbox\linesix 473 | \newbox\lineseven 474 | \newbox\lineeight 475 | \newbox\wordone % a word from the first line (hbox) 476 | \newbox\wordtwo 477 | \newbox\wordthree 478 | \newbox\wordfour 479 | \newbox\wordfive 480 | \newbox\wordsix 481 | \newbox\wordseven 482 | \newbox\wordeight 483 | \newbox\gline % the constructed double line (hbox) 484 | \newskip\glossglue % extra glue between glossed pairs or tuples 485 | \glossglue = 0pt plus 2pt minus 1pt % allow stretch/shrink between words 486 | %\glossglue = 5pt plus 2pt minus 1pt % allow stretch/shrink between words 487 | \newif\ifnotdone 488 | 489 | \@ifundefined{eachwordone}{\let\eachwordone=\upshape}{\relax} 490 | \@ifundefined{eachwordtwo}{\let\eachwordtwo=\upshape}{\relax} 491 | \@ifundefined{eachwordthree}{\let\eachwordthree=\upshape}{\relax} 492 | \@ifundefined{eachwordfour}{\let\eachwordfour=\upshape}{\relax} 493 | \@ifundefined{eachwordfive}{\let\eachwordfive=\upshape}{\relax} 494 | \@ifundefined{eachwordsix}{\let\eachwordsix=\upshape}{\relax} 495 | \@ifundefined{eachwordseven}{\let\eachwordseven=\upshape}{\relax} 496 | \@ifundefined{eachwordeight}{\let\eachwordeight=\upshape}{\relax} 497 | 498 | \def\lastword#1#2#3% #1 = \each, #2 = line box, #3 = word box 499 | {\setbox#2=\vbox{\unvbox#2% 500 | \global\setbox#3=\lastbox 501 | }% 502 | \ifvoid#3\global\setbox#3=\hbox{#1\strut{} }\fi 503 | % extra space following \strut in case #1 needs a space 504 | } 505 | 506 | \def\testdone 507 | {\ifdim\ht\lineone=0pt 508 | \ifdim\ht\linetwo=0pt \notdonefalse % tricky space after pt 509 | 
\else\notdonetrue 510 | \fi 511 | \else\notdonetrue 512 | \fi 513 | } 514 | 515 | \gdef\getwords(#1,#2)#3 #4\\% #1=linebox, #2=\each, #3=1st word, #4=remainder 516 | {\setbox#1=\vbox{\hbox{#2\strut#3{} }% adds space, the {} is needed for CJK otherwise the space 517 | % would be ignored 518 | \unvbox#1% 519 | }% 520 | \def\more{#4}% 521 | \ifx\more\empty\let\more=\donewords 522 | \else\let\more=\getwords 523 | \fi 524 | \more(#1,#2)#4\\% 525 | } 526 | 527 | \gdef\donewords(#1,#2)\\{}% 528 | 529 | \gdef\twosent#1\\ #2\\{% #1 = first line, #2 = second line 530 | \getwords(\lineone,\eachwordone)#1 \\% 531 | \getwords(\linetwo,\eachwordtwo)#2 \\% 532 | \loop\lastword{\eachwordone}{\lineone}{\wordone}% 533 | \lastword{\eachwordtwo}{\linetwo}{\wordtwo}% 534 | \global\setbox\gline=\hbox{\unhbox\gline 535 | \hskip\glossglue 536 | \vtop{\box\wordone % vtop was vbox 537 | \nointerlineskip 538 | \box\wordtwo 539 | }% 540 | }% 541 | \testdone 542 | \ifnotdone 543 | \repeat 544 | \egroup % matches \bgroup in \gloss 545 | \gl@stop} 546 | 547 | \gdef\threesent#1\\ #2\\ #3\\{% #1 = first line, #2 = second line, #3 = third 548 | \getwords(\lineone,\eachwordone)#1 \\% 549 | \getwords(\linetwo,\eachwordtwo)#2 \\% 550 | \getwords(\linethree,\eachwordthree)#3 \\% 551 | \loop\lastword{\eachwordone}{\lineone}{\wordone}% 552 | \lastword{\eachwordtwo}{\linetwo}{\wordtwo}% 553 | \lastword{\eachwordthree}{\linethree}{\wordthree}% 554 | \global\setbox\gline=\hbox{\unhbox\gline 555 | \hskip\glossglue 556 | \vtop{\box\wordone % vtop was vbox 557 | \nointerlineskip 558 | \box\wordtwo 559 | \nointerlineskip 560 | \box\wordthree 561 | }% 562 | }% 563 | \testdone 564 | \ifnotdone 565 | \repeat 566 | \egroup % matches \bgroup in \gloss 567 | \gl@stop} 568 | 569 | 570 | 571 | \gdef\foursent#1\\ #2\\ #3\\ #4\\{% #1 = first line, #2 = second line, #3 = third etc 572 | \getwords(\lineone,\eachwordone)#1 \\% 573 | \getwords(\linetwo,\eachwordtwo)#2 \\% 574 | \getwords(\linethree,\eachwordthree)#3 \\% 575 
| \getwords(\linefour,\eachwordfour)#4 \\% 576 | \loop\lastword{\eachwordone}{\lineone}{\wordone}% 577 | \lastword{\eachwordtwo}{\linetwo}{\wordtwo}% 578 | \lastword{\eachwordthree}{\linethree}{\wordthree}% 579 | \lastword{\eachwordfour}{\linefour}{\wordfour}% 580 | \global\setbox\gline=\hbox{\unhbox\gline 581 | \hskip\glossglue 582 | \vtop{\box\wordone % vtop was vbox 583 | \nointerlineskip 584 | \box\wordtwo 585 | \nointerlineskip 586 | \box\wordthree 587 | \nointerlineskip 588 | \box\wordfour 589 | }% 590 | }% 591 | \testdone 592 | \ifnotdone 593 | \repeat 594 | \egroup % matches \bgroup in \gloss 595 | \gl@stop} 596 | 597 | 598 | 599 | \gdef\fivesent#1\\ #2\\ #3\\ #4\\ #5\\{% #1 = first line, #2 = second line, #3 = third etc 600 | \getwords(\lineone,\eachwordone)#1 \\% 601 | \getwords(\linetwo,\eachwordtwo)#2 \\% 602 | \getwords(\linethree,\eachwordthree)#3 \\% 603 | \getwords(\linefour,\eachwordfour)#4 \\% 604 | \getwords(\linefive,\eachwordfive)#5 \\% 605 | \loop\lastword{\eachwordone}{\lineone}{\wordone}% 606 | \lastword{\eachwordtwo}{\linetwo}{\wordtwo}% 607 | \lastword{\eachwordthree}{\linethree}{\wordthree}% 608 | \lastword{\eachwordfour}{\linefour}{\wordfour}% 609 | \lastword{\eachwordfive}{\linefive}{\wordfive}% 610 | \global\setbox\gline=\hbox{\unhbox\gline 611 | \hskip\glossglue 612 | \vtop{\box\wordone % vtop was vbox 613 | \nointerlineskip 614 | \box\wordtwo 615 | \nointerlineskip 616 | \box\wordthree 617 | \nointerlineskip 618 | \box\wordfour 619 | \nointerlineskip 620 | \box\wordfive 621 | }% 622 | }% 623 | \testdone 624 | \ifnotdone 625 | \repeat 626 | \egroup % matches \bgroup in \gloss 627 | \gl@stop} 628 | 629 | 630 | 631 | \gdef\sixsent#1\\ #2\\ #3\\ #4\\ #5\\ #6\\{% #1 = first line, #2 = second line, #3 = third etc 632 | \getwords(\lineone,\eachwordone)#1 \\% 633 | \getwords(\linetwo,\eachwordtwo)#2 \\% 634 | \getwords(\linethree,\eachwordthree)#3 \\% 635 | \getwords(\linefour,\eachwordfour)#4 \\% 636 | \getwords(\linefive,\eachwordfive)#5 
\\% 637 | \getwords(\linesix,\eachwordsix)#6 \\% 638 | \loop\lastword{\eachwordone}{\lineone}{\wordone}% 639 | \lastword{\eachwordtwo}{\linetwo}{\wordtwo}% 640 | \lastword{\eachwordthree}{\linethree}{\wordthree}% 641 | \lastword{\eachwordfour}{\linefour}{\wordfour}% 642 | \lastword{\eachwordfive}{\linefive}{\wordfive}% 643 | \lastword{\eachwordsix}{\linesix}{\wordsix}% 644 | \global\setbox\gline=\hbox{\unhbox\gline 645 | \hskip\glossglue 646 | \vtop{\box\wordone % vtop was vbox 647 | \nointerlineskip 648 | \box\wordtwo 649 | \nointerlineskip 650 | \box\wordthree 651 | \nointerlineskip 652 | \box\wordfour 653 | \nointerlineskip 654 | \box\wordfive 655 | \nointerlineskip 656 | \box\wordsix 657 | }% 658 | }% 659 | \testdone 660 | \ifnotdone 661 | \repeat 662 | \egroup % matches \bgroup in \gloss 663 | \gl@stop} 664 | 665 | 666 | 667 | \gdef\sevensent#1\\ #2\\ #3\\ #4\\ #5\\ #6\\ #7\\{% #1 = first line, #2 = second line, #3 = third etc 668 | \getwords(\lineone,\eachwordone)#1 \\% 669 | \getwords(\linetwo,\eachwordtwo)#2 \\% 670 | \getwords(\linethree,\eachwordthree)#3 \\% 671 | \getwords(\linefour,\eachwordfour)#4 \\% 672 | \getwords(\linefive,\eachwordfive)#5 \\% 673 | \getwords(\linesix,\eachwordsix)#6 \\% 674 | \getwords(\lineseven,\eachwordseven)#7 \\% 675 | \loop\lastword{\eachwordone}{\lineone}{\wordone}% 676 | \lastword{\eachwordtwo}{\linetwo}{\wordtwo}% 677 | \lastword{\eachwordthree}{\linethree}{\wordthree}% 678 | \lastword{\eachwordfour}{\linefour}{\wordfour}% 679 | \lastword{\eachwordfive}{\linefive}{\wordfive}% 680 | \lastword{\eachwordsix}{\linesix}{\wordsix}% 681 | \lastword{\eachwordseven}{\lineseven}{\wordseven}% 682 | \global\setbox\gline=\hbox{\unhbox\gline 683 | \hskip\glossglue 684 | \vtop{\box\wordone % vtop was vbox 685 | \nointerlineskip 686 | \box\wordtwo 687 | \nointerlineskip 688 | \box\wordthree 689 | \nointerlineskip 690 | \box\wordfour 691 | \nointerlineskip 692 | \box\wordfive 693 | \nointerlineskip 694 | \box\wordsix 695 | 
\nointerlineskip 696 | \box\wordseven 697 | }% 698 | }% 699 | \testdone 700 | \ifnotdone 701 | \repeat 702 | \egroup % matches \bgroup in \gloss 703 | \gl@stop} 704 | 705 | 706 | 707 | \gdef\eightsent#1\\ #2\\ #3\\ #4\\ #5\\ #6\\ #7\\ #8\\{% #1 = first line, #2 = second line, #3 = third etc 708 | \getwords(\lineone,\eachwordone)#1 \\% 709 | \getwords(\linetwo,\eachwordtwo)#2 \\% 710 | \getwords(\linethree,\eachwordthree)#3 \\% 711 | \getwords(\linefour,\eachwordfour)#4 \\% 712 | \getwords(\linefive,\eachwordfive)#5 \\% 713 | \getwords(\linesix,\eachwordsix)#6 \\% 714 | \getwords(\lineseven,\eachwordseven)#7 \\% 715 | \getwords(\lineeight,\eachwordeight)#8 \\% 716 | \loop\lastword{\eachwordone}{\lineone}{\wordone}% 717 | \lastword{\eachwordtwo}{\linetwo}{\wordtwo}% 718 | \lastword{\eachwordthree}{\linethree}{\wordthree}% 719 | \lastword{\eachwordfour}{\linefour}{\wordfour}% 720 | \lastword{\eachwordfive}{\linefive}{\wordfive}% 721 | \lastword{\eachwordsix}{\linesix}{\wordsix}% 722 | \lastword{\eachwordseven}{\lineseven}{\wordseven}% 723 | \lastword{\eachwordeight}{\lineeight}{\wordeight}% 724 | \global\setbox\gline=\hbox{\unhbox\gline 725 | \hskip\glossglue 726 | \vtop{\box\wordone % vtop was vbox 727 | \nointerlineskip 728 | \box\wordtwo 729 | \nointerlineskip 730 | \box\wordthree 731 | \nointerlineskip 732 | \box\wordfour 733 | \nointerlineskip 734 | \box\wordfive 735 | \nointerlineskip 736 | \box\wordsix 737 | \nointerlineskip 738 | \box\wordseven 739 | \nointerlineskip 740 | \box\wordeight 741 | }% 742 | }% 743 | \testdone 744 | \ifnotdone 745 | \repeat 746 | \egroup % matches \bgroup in \gloss 747 | \gl@stop} 748 | 749 | %\def\gl@stop{{\hskip -\glossglue}\unhbox\gline\end{flushleft}} 750 | 751 | % \leavevmode puts us back in horizontal mode, so that a \\ will work 752 | \def\gl@stop{{\hskip -\glossglue}\unhbox\gline\leavevmode \egroup} 753 | }{} %end toggle cgloss 754 | 755 | \iftoggle{jambox}{ 756 | %BeGIN Jambox 757 | 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% 758 | % 759 | % Alexis Dimitriadis 760 | % 761 | % This is version 0.3 (informal release, Nov. 2003). 762 | % 763 | % Line up material a fixed distance from the right margin. For annotating 764 | % example sentences, usually with a short note in parentheses. 765 | % May overflow to the left or right, or line up on the next line as necessary. 766 | % 767 | % \jambox[width]{text} Align 'text' starting 'width' distance from the 768 | % right margin (default \the\jamwidth). 769 | % \jam(something) Align a note delimited by parentheses (which are 770 | % retained). No optional argument. 771 | % \jambox*{text} Set \jamwidth to the width of 'text', then align it. 772 | % (\jamwidth stays set for the rest of the environment). 773 | % 774 | % Notes: 775 | % 776 | % Distance from the right margin can be set to an explicit amount, or to the 777 | % width of some piece of text, as follows: 778 | % 779 | % \jamwidth=2in\relax Or 780 | % \settowidth\jamwidth {(``annotation'')} 781 | % 782 | % \jamwidth is locally scoped, so it can be set globally or inside an example 783 | % environment. 784 | % 785 | % BUG: Not compatible with ragged-right mode. 786 | % 787 | % Incompatibilities: Not useful with the vanilla cgloss4e.sty, which ends 788 | % glossed lines prematurely. 789 | % I do have a suitably modified file, cgloss.sty. With it you can do the 790 | % following: 791 | % \gll To kimeno. \\ 792 | % the text \\ \jambox{(Greek)} 793 | % \trans `The text.' 794 | 795 | 796 | \newdimen\jamwidth \jamwidth=2in 797 | \def\jambox{\@ifnextchar[{\@jambox} 798 | {\@ifnextchar*{\@jamsetbox}{\@jambox[\the\jamwidth]}}} 799 | 800 | % Set width AND display the argument. 801 | % The star is read and ignored; the argument #1 is boxed, used to set 802 | % \jamwidth, then passed to \@jambox (which also puts it in \@tempboxa!) 
803 | % 804 | \def\@jamsetbox*#1{\setbox\@tempboxa\hbox{#1}\jamwidth=\wd\@tempboxa 805 | \@jambox[\the\jamwidth]{\box\@tempboxa}} 806 | 807 | \def\@jambox[#1]#2{{\setbox\@tempboxa\hbox {#2}% 808 | \ifdim \wd\@tempboxa<#1\relax % if label fits in the allotted space: 809 | \@tempdima=#1\relax \advance\@tempdima by-\wd\@tempboxa % remaining \hspace 810 | \unskip\nobreak\hfill\penalty250 % break line here if necessary 811 | \hskip 1.2em minus 1.2em % used when the line extends past the margin 812 | \hbox{}\nobreak\hfill\box\@tempboxa\nobreak 813 | \hskip\@tempdima minus \@tempdima\hbox{}% 814 | \else % the label is too wide: just right-align it 815 | \hfill\penalty50\hbox{}\nobreak\hfill\box\@tempboxa 816 | \fi 817 | % suppress closing glue: 818 | \parfillskip=0pt \finalhyphendemerits=0 \par}} 819 | % The penalty enables a break, taken only if the line cannot fit. 820 | % The \hbox{} ensures the next line does not begin with \hfill, which would 821 | % be discarded if initial. 822 | % (\vadjust inserts an empty element at the beginning of the next line, so 823 | % that COULD be used instead of \hbox{}). 824 | % Algorithm adapted from The TeXBook. 825 | % 826 | % The closing \par could be a problem if there is a \parskip... 827 | }{} 828 | \endinput 829 | -------------------------------------------------------------------------------- /docs/pandoc-ling-old.lua: -------------------------------------------------------------------------------- 1 | --[[ 2 | pandoc-linguex: make interlinear glossing with pandoc 3 | 4 | Copyright © 2021 Michael Cysouw 5 | 6 | Permission to use, copy, modify, and/or distribute this software for any 7 | purpose with or without fee is hereby granted, provided that the above 8 | copyright notice and this permission notice appear in all copies. 9 | 10 | THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES 11 | WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF 12 | MERCHANTABILITY AND FITNESS.
IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR 13 | ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES 14 | WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN 15 | ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF 16 | OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. 17 | ]] 18 | 19 | PANDOC_VERSION:must_be_at_least '2.10' 20 | 21 | --------------------- 22 | -- 'global' variables 23 | --------------------- 24 | 25 | local counter = 0 -- actual numbering of examples 26 | local chapter = 1 -- numbering of chapters (for unknown reasons this starts at 1, not 0) 27 | local counterInChapter = 0 -- counter reset for each chapter 28 | local indexEx = {} -- global lookup for example IDs 29 | local orderInText = 0 -- order of references for resolving "Next"-style references 30 | local indexRef = {} -- key/value: order in text = refID/exID 31 | local rev_indexRef = {} -- "reversed" indexRef, i.e. key/value: refID/exID = order-number in text 32 | 33 | ------------------------------------ 34 | -- User Settings with default values 35 | ------------------------------------ 36 | 37 | local formatGloss = false -- format interlinear examples 38 | local xrefSuffixSep = " " --   separator to be inserted after number in example references 39 | local restartAtChapter = false -- restart numbering at highest header without adding local chapternumbers 40 | local addChapterNumber = false -- add chapternumbers to counting and restart at highest header 41 | local latexPackage = "linguex" 42 | local topDivision = "section" 43 | 44 | function getUserSettings (meta) 45 | if meta.formatGloss ~= nil then 46 | formatGloss = meta.formatGloss 47 | end 48 | if meta.xrefSuffixSep ~= nil then 49 | xrefSuffixSep = pandoc.utils.stringify(meta.xrefSuffixSep) 50 | end 51 | if meta.restartAtChapter ~= nil then 52 | restartAtChapter = meta.restartAtChapter 53 | end 54 | if meta.addChapterNumber ~= nil then 55 | addChapterNumber = 
meta.addChapterNumber 56 | end 57 | if meta.latexPackage ~= nil then 58 | latexPackage = pandoc.utils.stringify(meta.latexPackage) 59 | end 60 | if meta["top-level-division"] ~= nil then 61 | topDivision = pandoc.utils.stringify(meta["top-level-division"]) 62 | end 63 | end 64 | 65 | ------------------------------------------ 66 | -- add latex dependencies: langsci-gb4e is not on CTAN! 67 | -- restarting of counters is not working right for gb4e 68 | ------------------------------------------ 69 | 70 | function addFormatting (meta) 71 | local tmp = meta['header-includes'] or pandoc.MetaList{meta['header-includes']} 72 | 73 | if FORMAT:match "html" then 74 | -- add specific CSS for layout of examples 75 | -- building on classes set in this filter 76 | -- local f = io.open("pandoc-ling.css") 77 | -- local css = f:read("*a") 78 | -- f:close() 79 | local css = [[ 80 | 100 | ]] 101 | tmp[#tmp+1] = pandoc.MetaBlocks(pandoc.RawBlock("html", css)) 102 | 103 | meta['header-includes'] = tmp 104 | end 105 | 106 | if FORMAT:match "latex" then 107 | 108 | local function add (s) 109 | tmp[#tmp+1] = pandoc.MetaBlocks(pandoc.RawBlock("tex", s)) 110 | end 111 | 112 | if latexPackage == "linguex" then 113 | add("\\usepackage{linguex}") 114 | -- no brackets 115 | add("\\renewcommand{\\theExLBr}{}") 116 | add("\\renewcommand{\\theExRBr}{}") 117 | --add("\\renewcommand{\\firstrefdash}{}") 118 | add("\\usepackage{chngcntr}") 119 | if addChapterNumber then 120 | add("\\counterwithin{ExNo}{"..topDivision.."}") 121 | add("\\renewcommand{\\Exarabic}{\\the"..topDivision..".\\arabic}") 122 | elseif restartAtChapter then 123 | add("\\counterwithin*{ExNo}{"..topDivision.."}") 124 | end 125 | 126 | elseif latexPackage:match "gb4e" then 127 | add("\\usepackage{"..latexPackage.."}") 128 | -- nnext package does not work with added top level number 129 | add("\\usepackage[noparens]{nnext}") 130 | add("\\usepackage{chngcntr}") 131 | if addChapterNumber then 132 | 
add("\\counterwithin{xnumi}{"..topDivision.."}") 133 | elseif restartAtChapter then 134 | add("\\counterwithin*{xnumi}{"..topDivision.."}") 135 | end 136 | 137 | elseif latexPackage == "expex" then 138 | add("\\usepackage{expex}") 139 | add("\\lingset{belowglpreambleskip=-1.5ex, aboveglftskip=-1.5ex, exskip=0ex, interpartskip=-0.5ex, belowpreambleskip=-1ex}") 140 | if addChapterNumber then 141 | add("\\lingset{exnotype=chapter.arabic}") 142 | end 143 | if restartAtChapter then 144 | --add("\\usepackage{epltxchapno}") 145 | add("\\usepackage{etoolbox}") 146 | add("\\pretocmd{\\"..topDivision.."}{\\excnt=1}{}{}") 147 | end 148 | 149 | end 150 | meta['header-includes'] = tmp 151 | end 152 | return meta 153 | end 154 | 155 | ------------------------------------------ 156 | -- add invisible numbering to section 157 | ------------------------------------------ 158 | 159 | function addSectionNumbering (doc) 160 | local sections = pandoc.utils.make_sections(true, nil, doc.blocks) 161 | return pandoc.Pandoc(sections, doc.meta) 162 | end 163 | 164 | --------------------------- 165 | -- help function for format 166 | --------------------------- 167 | 168 | function splitPara (p) 169 | -- remove quotes, they interfere with the layout 170 | if p[1].tag == "Quoted" then 171 | p = p[1].content 172 | end 173 | -- split paragraph in subtables at Space 174 | -- to insert paragraph into pandoc.Table 175 | -- Is there a better way to do this in Pandoc-Lua? 
176 | local start = 1 177 | local result = {} 178 | for i=1,#p do 179 | if p[i].tag == "Space" then 180 | local chunk = table.move(p, start, i-1, 1, {}) 181 | table.insert(result, {pandoc.Plain(chunk)} ) 182 | start = i + 1 183 | end 184 | end 185 | if start <= #p then 186 | local chunk = table.move(p, start, #p, 1, {}) 187 | table.insert(result, {pandoc.Plain(chunk)} ) 188 | end 189 | return result 190 | end 191 | 192 | function turnIntoTable (rowContent, nCols, extraCols) 193 | -- turn examples into Tables for alignment 194 | -- use simpleTable for construction 195 | local caption = {} 196 | local headers = {} 197 | local aligns = {} 198 | for i=1,nCols do aligns[i] = "AlignLeft" end 199 | aligns[extraCols + 1] = "AlignRight" -- Column for grammaticality judgements 200 | local widths = {} 201 | for i=1,nCols do widths[i] = 0 end 202 | local rows = rowContent 203 | 204 | local result = pandoc.SimpleTable( 205 | caption, 206 | aligns, 207 | widths, 208 | headers, 209 | rows 210 | ) 211 | -- turn into fancy new tables 212 | result = pandoc.utils.from_simple_table(result) 213 | 214 | -- set class of table to "example" for styling via CSS 215 | result.attr = {class = "linguistic-example"} 216 | -- set class of judgment columns to "judgment" for styling via CSS 217 | for i=1,#result.bodies[1].body do 218 | result.bodies[1].body[i][2][extraCols+1].attr = pandoc.Attr(nil, {"linguistic-judgement"}) 219 | end 220 | 221 | return result 222 | end 223 | 224 | function splitForSmallCaps (s) 225 | -- turn uppercase in gloss into small caps 226 | local split = {} 227 | for lower,upper in string.gmatch(s, "(.-)([%u%d][%u%d]+)") do 228 | if lower ~= "" then 229 | lower = pandoc.Str(lower) 230 | table.insert(split, lower) 231 | end 232 | upper = pandoc.SmallCaps(pandoc.text.lower(upper)) 233 | table.insert(split, upper) 234 | end 235 | for leftover in string.gmatch(s, "[%u%d][%u%d]+(.-[^%u%s])$") do 236 | leftover = pandoc.Str(leftover) 237 | table.insert(split, leftover) 238 | end 
239 | if #split == 0 then 240 | if s == "~" then s = "   " end -- sequence "space-nobreakspace-space" 241 | table.insert(split, pandoc.Str(s)) 242 | end 243 | 244 | return split 245 | end 246 | 247 | function splitJudgement (line) 248 | local judgement = "" 249 | local first = pandoc.utils.stringify(line[1]) 250 | if first == "^" then 251 | judgement = line[2] 252 | table.remove(line, 1) 253 | table.remove(line, 1) 254 | table.remove(line, 1) 255 | elseif string.sub(first, 1, 1) == "^" then 256 | judgement = pandoc.Str(string.sub(first, 2)) 257 | table.remove(line, 1) 258 | table.remove(line, 1) 259 | end 260 | return judgement, line 261 | end 262 | 263 | ------------------------ 264 | -- make markup in Pandoc 265 | ------------------------ 266 | 267 | function pandocMakeSingle (single, extraCols) 268 | -- Make just a single-line example 269 | local judge, data = splitJudgement(single) 270 | local line = { {pandoc.Plain(judge)}, {pandoc.Plain(data)} } 271 | 272 | -- add extra columns before 273 | -- either one (number) or two (number, letter) 274 | if extraCols > 0 then 275 | for i=1,extraCols do 276 | table.insert(line, 1, {} ) 277 | end 278 | end 279 | 280 | -- turn into Table 281 | local nCols = #line 282 | local rowContent = { line } 283 | local exampleSingle = turnIntoTable(rowContent, nCols, extraCols) 284 | return exampleSingle 285 | end 286 | 287 | function pandocMakeInterlinear (block, extraCols, formatOverride) 288 | -- Make interlinear gloss 4-liner from LineBlock input 289 | -- override format per example 290 | local globalFormatGloss = formatGloss 291 | if formatOverride ~= nil then 292 | formatGloss = (formatOverride == "true") 293 | end 294 | 295 | -- the four lines are: header, source, gloss and trans(lation) 296 | local header = { { pandoc.Plain(block[1]) } } 297 | table.insert(header, 1, {} ) 298 | 299 | local judgeSource, source = splitJudgement(block[2]) 300 | source = splitPara(source) 301 | if formatGloss then 302 | -- remove format and make
emph throughout 303 | for i=1,#source do 304 | local string = pandoc.utils.stringify(source[i]) 305 | source[i] = { pandoc.Plain(pandoc.Emph(string)) } 306 | end 307 | end 308 | table.insert(source, 1, { pandoc.Plain(judgeSource) } ) 309 | 310 | local gloss = splitPara(block[3]) 311 | if formatGloss then 312 | -- remove format and turn capital-sequences into smallcaps 313 | for i=1,#gloss do 314 | local string = pandoc.utils.stringify(gloss[i]) 315 | gloss[i] = { pandoc.Plain(splitForSmallCaps(string)) } 316 | end 317 | end 318 | table.insert(gloss, 1, {} ) 319 | 320 | local trans = block[#block] 321 | if formatGloss then 322 | -- remove quotes and add single quotes throughout 323 | if trans[1].tag == "Quoted" then 324 | trans = trans[1].content 325 | end 326 | trans = {{ pandoc.Plain(pandoc.Quoted("SingleQuote", trans)) }} 327 | else 328 | trans = {{ pandoc.Plain(trans) }} 329 | end 330 | table.insert(trans, 1, {} ) 331 | 332 | -- return to global setting 333 | if formatOverride ~= nil then 334 | formatGloss = globalFormatGloss 335 | end 336 | 337 | -- add extra columns before, either one or two 338 | for i=1,extraCols do 339 | table.insert(header, 1, {} ) 340 | table.insert(source, 1, {} ) 341 | table.insert(gloss, 1, {} ) 342 | table.insert(trans, 1, {} ) 343 | end 344 | 345 | -- turn into Table 346 | local nCols = math.max(#source, #gloss) 347 | local rowContent = {header, source, gloss, trans} 348 | local interlinear = turnIntoTable(rowContent, nCols, extraCols) 349 | 350 | -- make header and trans long cells 351 | interlinear.bodies[1].body[1][2][extraCols+2].col_span = nCols - extraCols - 1 352 | interlinear.bodies[1].body[#block][2][extraCols+2].col_span = nCols - extraCols - 1 353 | 354 | -- shift upwards when header is empty 355 | if next(block[1]) == nil then 356 | table.remove(interlinear.bodies[1].body, 1) 357 | end 358 | 359 | return interlinear 360 | end 361 | 362 | -- When multiple interlinears are combined, separate Tables are needed 363 | -- also
make separate Tables when single examples are mixed with interlinears 364 | 365 | function pandocMakeList(data, number, formatOverride) 366 | -- make a list of tables 367 | local example = {} 368 | -- go through all items of the list 369 | for i=1,#data do 370 | 371 | if data[i][1].tag ~= "LineBlock" then 372 | example[i] = pandocMakeSingle(data[i][1].content, 2) 373 | -- add letter for sub-example in second column 374 | example[i].bodies[1].body[1][2][2].contents[1] = 375 | pandoc.Plain(string.char(96+i)..".") 376 | 377 | if i>1 and data[i-1][1].tag ~= "LineBlock" then 378 | -- add tablerow to previous if also Plain/Para 379 | table.insert(example[i-1].bodies[1].body, example[i].bodies[1].body[1]) 380 | -- exchange tables 381 | example[i] = example[i-1] 382 | example[i-1] = "ignore" 383 | end 384 | 385 | elseif data[i][1].tag == "LineBlock" then 386 | example[i] = pandocMakeInterlinear(data[i][1].content, 2, formatOverride) 387 | -- add letter for sub-example in second column 388 | example[i].bodies[1].body[1][2][2].contents[1] = 389 | pandoc.Plain(string.char(96+i)..".") 390 | end 391 | end 392 | 393 | -- remove empty tables. 
Work around for `table.remove` 394 | local exampleList = {} 395 | for i=1,#example do 396 | if example[i] ~= "ignore" then 397 | table.insert(exampleList,example[i]) 398 | end 399 | end 400 | 401 | -- keep track of judgements for better alignment 402 | local judgeSize = 0 403 | for i=1,#exampleList do 404 | for j=1,#exampleList[i].bodies[1].body do 405 | if exampleList[i].bodies[1].body[j][2][3].contents[1] ~= nil then 406 | local judge = pandoc.utils.stringify(exampleList[i].bodies[1].body[j][2][3].contents[1]) 407 | judgeSize = math.max(judgeSize, utf8.len(judge)) 408 | end 409 | end 410 | end 411 | 412 | -- rough approximations 413 | local spaceForNumber = string.rep(" ", 2*(string.len(number)+2)) 414 | local spaceForLabel = tostring(15 + 5*judgeSize) 415 | if judgeSize == 0 then spaceForLabel = 0 end 416 | 417 | for i=1,#exampleList do 418 | -- For better alignment with example number, add invisibles in first column 419 | -- not nice solution, but portable across formats 420 | exampleList[i].bodies[1].body[1][2][1].contents[1] = pandoc.Plain(spaceForNumber) 421 | -- For better alignment, add column-width to judgement column 422 | -- note: this is not portable outside html 423 | exampleList[i].bodies[1].body[1][2][3].attr = 424 | pandoc.Attr(nil, { "linguistic-judgement" }, { width = spaceForLabel.."px"} ) 425 | end 426 | 427 | return exampleList 428 | end 429 | 430 | function pandocMakeExample (data, number, formatOverride) 431 | -- make the examples as list of tables 432 | local example = {} 433 | local preamble = nil 434 | 435 | if #data == 2 then 436 | -- first part is assumed to be preamble 437 | preamble = data[1].content 438 | -- go on with second part 439 | data = { data[2] } 440 | end 441 | 442 | if data[1].tag == "Para" then 443 | -- make one-line example 444 | example[1] = pandocMakeSingle(data[1].content, 1) 445 | elseif data[1].tag == "LineBlock" then 446 | -- make one interlinear example 447 | example[1] = pandocMakeInterlinear(data[1].content, 1, 
formatOverride) 448 | elseif data[1].tag == "OrderedList" then 449 | -- make list of examples 450 | example = pandocMakeList(data[1].content, number, formatOverride) 451 | end 452 | 453 | if preamble ~= nil then 454 | -- How many positions should preamble be shifted to the left? 455 | local shift = 1 456 | if data[1].tag == "OrderedList" then shift = 0 end 457 | -- insert preamble as first row in example 458 | preamble = pandocMakeSingle(preamble, shift) 459 | table.insert(example[1].bodies[1].body, 1, preamble.bodies[1].body[1]) 460 | -- make preamble multi-column 461 | local range = #example[1].colspecs - shift - 1 462 | example[1].bodies[1].body[1][2][2].col_span = range 463 | end 464 | 465 | -- Add example number to top left of first table 466 | local numberParen = pandoc.Plain( "("..number..")" ) 467 | example[1].bodies[1].body[1][2][1].contents[1] = numberParen 468 | 469 | return example 470 | end 471 | 472 | -------------------------- 473 | -- make markup in Latex 474 | -- using langsci-gb4e 475 | -------------------------- 476 | 477 | -- convenience functions for Latex 478 | function texFront (tex, pdoc) 479 | return table.insert(pdoc, 1, pandoc.RawInline("tex", tex)) 480 | end 481 | 482 | function texEnd (tex, pdoc) 483 | return table.insert(pdoc, pandoc.RawInline("tex", tex)) 484 | end 485 | 486 | -- this is not ideal. 
It is too complex to really get judgement layout to work 487 | function texSplitJudgement (line) 488 | local judge, text = splitJudgement(line) 489 | if judge ~= "" then 490 | if latexPackage == "expex" then 491 | judge = pandoc.utils.stringify(judge) 492 | texFront("\\ljudge{"..judge.."} ", text) 493 | else 494 | table.insert(text, 1, judge) 495 | end 496 | end 497 | return text 498 | end 499 | 500 | -- different kinds of examples: single line, interlinear, list 501 | function texMakeSingle (line) 502 | local example = texSplitJudgement(line) 503 | texFront("\n ", example) 504 | return example 505 | end 506 | 507 | function texMakeInterlinear (block, exID, label, level, formatOverride ) 508 | -- make one interlinear 509 | 510 | --check for local override of formatting 511 | local globalFormatGloss = formatGloss 512 | if formatOverride ~= nil then 513 | formatGloss = (formatOverride == "true") 514 | end 515 | 516 | -- the four lines are: header, source, gloss and trans(lation) 517 | local header = block[1] 518 | if level == 1 then label = "" end 519 | if latexPackage == "expex" then 520 | if #header > 1 then 521 | texFront(" "..label.."\n \\begingl\n \\glpreamble ", header) 522 | texEnd("//", header) 523 | else 524 | texFront("\n "..label.."\n \\begingl", header) 525 | end 526 | else 527 | --if level == 1 then 528 | -- texFront("\n ", header) 529 | --else 530 | texFront("\n "..label.." 
", header) 531 | --end 532 | -- langsci-gb4e behaves here different from gb4e 533 | if latexPackage == "langsci-gb4e" then 534 | if #header > 1 then 535 | texEnd("\\\\", header) 536 | end 537 | end 538 | end 539 | 540 | local source = texSplitJudgement (block[2]) 541 | if formatGloss then 542 | for i=1,#source do 543 | if source[i].tag ~= "Space" then 544 | local string = pandoc.utils.stringify(source[i]) 545 | source[i] = pandoc.Emph(string) 546 | end 547 | end 548 | end 549 | -- add latex 550 | if latexPackage == "expex" then 551 | texFront("\n \\gla ", source) 552 | texEnd("//", source) 553 | else 554 | texFront("\n \\gll ", source) 555 | texEnd("\\\\", source) 556 | end 557 | 558 | 559 | local gloss = block[3] 560 | if formatGloss then 561 | local result = pandoc.List() 562 | for i=1,#gloss do 563 | local string = pandoc.utils.stringify(gloss[i]) 564 | result:extend(splitForSmallCaps(string)) 565 | end 566 | gloss = result 567 | end 568 | -- add latex 569 | if latexPackage == "expex" then 570 | texFront("\n \\glb ", gloss) 571 | texEnd("//", gloss) 572 | else 573 | texFront("\n ",gloss) 574 | texEnd("\\\\", gloss) 575 | end 576 | 577 | local trans = block[4] 578 | if formatGloss then 579 | if trans[1].tag == "Quoted" then 580 | trans = trans[1].content 581 | texFront("`", trans) 582 | texEnd("'", trans) 583 | end 584 | end 585 | -- add latex 586 | if latexPackage == "expex" then 587 | texFront("\n \\glft ", trans) 588 | texEnd("//\n \\endgl", trans) 589 | else 590 | texFront("\n \\glt ", trans) 591 | end 592 | 593 | -- return to global setting 594 | if formatOverride ~= nil then 595 | formatGloss = globalFormatGloss 596 | end 597 | 598 | -- combine for output 599 | local interlinear = header 600 | interlinear:extend(source) 601 | interlinear:extend(gloss) 602 | interlinear:extend(trans) 603 | return interlinear 604 | end 605 | 606 | function texMakeList (list, exID, formatOverride) 607 | local example = pandoc.List() 608 | local labeltwo = "" 609 | 610 | for 
i=1,#list do 611 | 612 | if latexPackage == "linguex" then 613 | if i == 1 then labeltwo = "\\a." else labeltwo = "\\b." end 614 | elseif latexPackage:match "gb4e" then 615 | if i == 1 then labeltwo = "\\ea" else labeltwo = "\\ex" end 616 | elseif latexPackage == "expex" then 617 | labeltwo = "\\a" 618 | end 619 | 620 | if list[i][1].tag ~= "LineBlock" then 621 | local line = texSplitJudgement( list[i][1].content ) 622 | texFront("\n "..labeltwo.." ", line) 623 | example:extend(line) 624 | 625 | elseif list[i][1].tag == "LineBlock" then 626 | local line = texMakeInterlinear(list[i][1].content, exID, labeltwo, 2, formatOverride) 627 | if latexPackage:match "gb4e" then 628 | texFront("\n", line) 629 | texEnd("\n", line) 630 | end 631 | example:extend(line) 632 | end 633 | end 634 | return example 635 | end 636 | 637 | function texMakeExample (data, exID, formatOverride) 638 | local example = pandoc.List() 639 | 640 | -- different labeling for tex packages 641 | local labelone = "" 642 | if latexPackage == "linguex" then labelone = "\\ex." 643 | elseif latexPackage == "expex" then labelone = "\\ex" 644 | elseif latexPackage:match "gb4e" then labelone = "\\ea" 645 | end 646 | 647 | if #data == 2 then 648 | -- assume first part is header 649 | example = data[1].content 650 | -- and then proceed with second part 651 | data = { data[2] } 652 | end 653 | 654 | if data[1].tag == "Para" then 655 | -- example beginning 656 | if #example > 0 then texEnd("\\\\", example) end 657 | if latexPackage == "expex" then 658 | texFront(labelone.." <"..exID.."> ", example) 659 | else 660 | texFront(labelone.." 
\\label{"..exID.."} ", example) 661 | end 662 | -- add one-line example 663 | local line = texMakeSingle(data[1].content) 664 | example:extend(line) 665 | -- example ending 666 | if latexPackage:match "gb4e" then 667 | texEnd("\n\\z", example) 668 | elseif latexPackage == "expex" then 669 | texEnd("\n\\xe", example) 670 | end 671 | 672 | elseif data[1].tag == "LineBlock" then 673 | -- example beginning 674 | if latexPackage == "expex" then 675 | texFront(labelone.." <"..exID.."> ", example) 676 | else 677 | texFront(labelone.." \\label{"..exID.."} ", example) 678 | end 679 | -- add interlinear 680 | local interlinear = texMakeInterlinear(data[1].content, exID, labelone, 1, formatOverride) 681 | example:extend(interlinear) 682 | -- example ending 683 | if latexPackage:match "gb4e" then 684 | texEnd("\n \\z", example) 685 | elseif latexPackage == "expex" then 686 | texEnd("\n\\xe", example) 687 | end 688 | 689 | elseif data[1].tag == "OrderedList" then 690 | -- example beginning 691 | if latexPackage == "expex" then 692 | texFront("\\pex <"..exID.."> ", example) 693 | else 694 | texFront(labelone.." 
\\label{"..exID.."} ", example) 695 | end 696 | -- add list of examples 697 | local list = texMakeList(data[1].content, exID, formatOverride) 698 | example:extend(list) 699 | -- example ending 700 | if latexPackage:match "gb4e" then 701 | texEnd("\n \\z", example) 702 | texEnd("\n\\z", example) 703 | elseif latexPackage == "expex" then 704 | texEnd("\n\\xe", example) 705 | end 706 | end 707 | 708 | return pandoc.Plain(example) 709 | end 710 | 711 | -------------------------- 712 | -- format example from div 713 | -------------------------- 714 | 715 | function makeExample (div) 716 | 717 | -- keep track of chapters (primary sections) 718 | if div.classes[1] == "section" then 719 | if div.attributes.number ~= nil and string.len(div.attributes.number) == 1 then 720 | chapter = chapter + 1 721 | counterInChapter = 0 722 | end 723 | end 724 | 725 | -- only do formatting for divs with class "ex" 726 | if div.classes[1] == "ex" then 727 | 728 | -- keep count of examples 729 | counter = counter + 1 730 | counterInChapter = counterInChapter + 1 731 | 732 | -- format the numbering 733 | local number = counter 734 | if addChapterNumber then 735 | number = chapter.."."..counterInChapter 736 | elseif restartAtChapter then 737 | number = counterInChapter 738 | end 739 | 740 | -- make identifier for example 741 | -- or keep user-provided identifier 742 | local exID = "" 743 | if div.identifier == "" then 744 | exID = "ling-ex:"..chapter.."."..counterInChapter 745 | else 746 | exID = div.identifier 747 | end 748 | 749 | -- keep global index of ids/numbers for crossreference 750 | indexEx[exID] = number 751 | 752 | -- check format override per example 753 | local formatOverride = div.attributes['formatGloss'] 754 | 755 | -- make different format for latex 756 | if FORMAT:match "latex" then 757 | return texMakeExample(div.content, exID, formatOverride) 758 | else 759 | local example = pandocMakeExample(div.content, number, formatOverride) 760 | -- add temporary Cite to resolve 
"Next"-type references in pandoc 761 | -- will be removed after cross-references are in place 762 | local tmpCite = pandoc.Cite({pandoc.Str("@Target")},{pandoc.Citation(exID,"NormalCitation")}) 763 | 764 | return { 765 | pandoc.Plain(tmpCite), 766 | pandoc.Div(example, pandoc.Attr(exID) ) 767 | } 768 | end 769 | end 770 | end 771 | 772 | ------------------------- 773 | -- format crossreferences 774 | ------------------------- 775 | 776 | function uniqueNextrefs (cite) 777 | 778 | -- to resolve "Next"-style references give them all an unique ID 779 | -- make indices to check in which order they occur 780 | local nameN = string.match(cite.content[1].text, "([N]+)ext") 781 | local nameL = string.match(cite.content[1].text, "([L]+)ast") 782 | local target = string.match(cite.content[1].text, "@Target") 783 | 784 | -- use random ID to make unique 785 | if nameN ~= nil or nameL ~= nil then 786 | cite.citations[1].id = tostring(math.random(99999)) 787 | end 788 | 789 | -- make indices 790 | if nameN ~= nil or nameL ~= nil or target ~= nil then 791 | orderInText = orderInText + 1 792 | indexRef[orderInText] = cite.citations[1].id 793 | rev_indexRef[cite.citations[1].id] = orderInText 794 | end 795 | 796 | return(cite) 797 | end 798 | 799 | function resolveNextrefs (cite) 800 | 801 | -- assume Next-style refs have numeric id (from uniqueNextrefs) 802 | -- assume Example-IDs are not numeric (user should not use them!) 
803 | local id = cite.citations[1].id 804 | local order = rev_indexRef[id] 805 | 806 | local distN = 0 807 | local sequenceN = string.match(cite.content[1].text, "([N]+)ext") 808 | if sequenceN ~= nil then distN = string.len(sequenceN) end 809 | 810 | if distN > 0 then 811 | for i=order,#indexRef do 812 | if tonumber(indexRef[i]) == nil then 813 | distN = distN - 1 814 | if distN == 0 then 815 | cite.citations[1].id = indexRef[i] 816 | end 817 | end 818 | end 819 | end 820 | 821 | local distL = 0 822 | local sequenceL = string.match(cite.content[1].text, "([L]+)ast") 823 | if sequenceL ~= nil then distL= string.len(sequenceL) end 824 | 825 | if distL > 0 then 826 | for i=order,1,-1 do 827 | if tonumber(indexRef[i]) == nil then 828 | distL = distL - 1 829 | if distL == 0 then 830 | cite.citations[1].id = indexRef[i] 831 | end 832 | end 833 | end 834 | end 835 | 836 | return(cite) 837 | end 838 | 839 | function removeTmpTargetrefs (cite) 840 | -- remove temporary cites for resolving Next-style reference 841 | if cite.content[1].text == "@Target" then 842 | return pandoc.Plain({}) 843 | end 844 | end 845 | 846 | function makeCrossrefs (cite) 847 | 848 | local id = cite.citations[1].id 849 | local name = string.gsub(cite.content[1].text, "[%[%]@]", "") 850 | local suffix = "" 851 | local expexName = {Next = "nextx", NNext = "anextx", Last = "lastx", LLast = "blastx"} 852 | 853 | -- prevent Latex error when user sets xrefSuffixSep to space or nothing 854 | if FORMAT:match "latex" then 855 | if xrefSuffixSep == "" or xrefSuffixSep == " " or xrefSuffixSep == " " then 856 | xrefSuffixSep = "\\," 857 | end 858 | end 859 | 860 | -- only make suffix if there is something there 861 | if #cite.citations[1].suffix > 0 then 862 | suffix = pandoc.utils.stringify(cite.citations[1].suffix[2]) 863 | suffix = xrefSuffixSep..suffix 864 | end 865 | 866 | -- make the cross-references 867 | if FORMAT:match "latex" then 868 | if latexPackage == "expex" then 869 | if 
string.match("@Next@NNext@Last@LLast", name) ~= nil then 870 | return pandoc.RawInline("latex", "({\\"..expexName[name].."}"..suffix..")") 871 | elseif indexEx[id] ~= nil then 872 | -- ignore other "cite" elements 873 | return pandoc.RawInline("latex", "(\\getref{"..id.."}"..suffix..")") 874 | end 875 | else 876 | if string.match("@Next@NNext@Last@LLast", name) ~= nil then 877 | -- let latex handle these 878 | return pandoc.RawInline("latex", "({\\"..name.."}"..suffix..")") 879 | elseif indexEx[id] ~= nil then 880 | -- ignore other "cite" elements 881 | return pandoc.RawInline("latex", "(\\ref{"..id.."}"..suffix..")") 882 | end 883 | end 884 | elseif indexEx[id] ~= nil then 885 | -- ignore other "cite" elements 886 | return pandoc.Link("("..indexEx[id]..suffix..")", "#"..id) 887 | end 888 | 889 | end 890 | 891 | ------------------------------------------ 892 | -- Pandoc trick to cycle through documents 893 | ------------------------------------------ 894 | 895 | return { 896 | -- preparations 897 | { Pandoc = addSectionNumbering }, 898 | { Meta = getUserSettings }, 899 | { Meta = addFormatting }, 900 | -- formatting linguistic examples as tables 901 | { Div = makeExample }, 902 | -- three passes necessary to resolve NNext-style references 903 | { Cite = uniqueNextrefs }, 904 | { Cite = resolveNextrefs }, 905 | { Cite = removeTmpTargetrefs }, 906 | -- now finally all cross-references can be set 907 | { Cite = makeCrossrefs } 908 | } 909 | -------------------------------------------------------------------------------- /docs/processVerbatim.lua: -------------------------------------------------------------------------------- 1 | function addRealCopy (code) 2 | return { code, pandoc.RawBlock("markdown", code.text) } 3 | end 4 | 5 | return { 6 | { CodeBlock = addRealCopy } 7 | } 8 | -------------------------------------------------------------------------------- /docs/readme.docx: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme.docx -------------------------------------------------------------------------------- /docs/readme.epub: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme.epub -------------------------------------------------------------------------------- /docs/readme_expex.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme_expex.pdf -------------------------------------------------------------------------------- /docs/readme_expex.tex: -------------------------------------------------------------------------------- 1 | % Options for packages loaded elsewhere 2 | \PassOptionsToPackage{unicode}{hyperref} 3 | \PassOptionsToPackage{hyphens}{url} 4 | \documentclass[ 5 | ]{article} 6 | \usepackage{xcolor} 7 | \usepackage{amsmath,amssymb} 8 | \setcounter{secnumdepth}{5} 9 | \usepackage{iftex} 10 | \ifPDFTeX 11 | \usepackage[T1]{fontenc} 12 | \usepackage[utf8]{inputenc} 13 | \usepackage{textcomp} % provide euro and other symbols 14 | \else % if luatex or xetex 15 | \usepackage{unicode-math} % this also loads fontspec 16 | \defaultfontfeatures{Scale=MatchLowercase} 17 | \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1} 18 | \fi 19 | \usepackage{lmodern} 20 | \ifPDFTeX\else 21 | % xetex/luatex font selection 22 | \fi 23 | % Use upquote if available, for straight quotes in verbatim environments 24 | \IfFileExists{upquote.sty}{\usepackage{upquote}}{} 25 | \IfFileExists{microtype.sty}{% use microtype if available 26 | \usepackage[]{microtype} 27 | \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts 28 | }{} 29 | \makeatletter 30 | \@ifundefined{KOMAClassName}{% if non-KOMA class 31 | 
\IfFileExists{parskip.sty}{% 32 | \usepackage{parskip} 33 | }{% else 34 | \setlength{\parindent}{0pt} 35 | \setlength{\parskip}{6pt plus 2pt minus 1pt}} 36 | }{% if KOMA class 37 | \KOMAoptions{parskip=half}} 38 | \makeatother 39 | \usepackage{graphicx} 40 | \makeatletter 41 | \newsavebox\pandoc@box 42 | \newcommand*\pandocbounded[1]{% scales image to fit in text height/width 43 | \sbox\pandoc@box{#1}% 44 | \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% 45 | \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% 46 | \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both 47 | \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% 48 | \else\usebox{\pandoc@box}% 49 | \fi% 50 | } 51 | % Set default figure placement to htbp 52 | \def\fps@figure{htbp} 53 | \makeatother 54 | \setlength{\emergencystretch}{3em} % prevent overfull lines 55 | \providecommand{\tightlist}{% 56 | \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} 57 | \usepackage{expex} 58 | \lingset{ 59 | belowglpreambleskip = -1.5ex, 60 | aboveglftskip = -1.5ex, 61 | exskip = 0ex, 62 | interpartskip = -0.5ex, 63 | belowpreambleskip = -2ex 64 | } 65 | \usepackage{bookmark} 66 | \IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available 67 | \urlstyle{same} 68 | \hypersetup{ 69 | pdftitle={Using pandoc-ling}, 70 | pdfauthor={Michael Cysouw}, 71 | hidelinks, 72 | pdfcreator={LaTeX via pandoc}} 73 | 74 | \title{Using pandoc-ling} 75 | \author{Michael Cysouw} 76 | \date{} 77 | 78 | \begin{document} 79 | \maketitle 80 | 81 | { 82 | \setcounter{tocdepth}{3} 83 | \tableofcontents 84 | } 85 | \section{pandoc-ling}\label{pandoc-ling} 86 | 87 | \emph{Michael Cysouw} 88 | \textless{}\href{mailto:cysouw@mac.com}{\nolinkurl{cysouw@mac.com}}\textgreater{} 89 | 90 | A Pandoc filter for linguistic examples 91 | 92 | tl;dr 93 | 94 | \begin{itemize} 95 | \tightlist 96 | \item 97 | Easily write linguistic examples including basic interlinear glossing. 
98 | \item 99 | Let numbering and cross-referencing be done for you. 100 | \item 101 | Export to (almost) any format of your wishes for final polishing. 102 | \item 103 | As an example, check out this readme in 104 | \href{https://cysouw.github.io/pandoc-ling/readme.html}{HTML} or 105 | \href{https://cysouw.github.io/pandoc-ling/readme_gb4e.pdf}{Latex}. 106 | \end{itemize} 107 | 108 | \section{Rationale}\label{rationale} 109 | 110 | In the field of linguistics there is an outspoken tradition to format 111 | example sentences in research papers in a very specific way. In the 112 | field, it is a perennial problem to get such example sentences to look 113 | just right. Within Latex, there are numerous packages to deal with this 114 | problem (e.g.~covington, linguex, gb4e, expex, etc.). Depending on your 115 | needs, there is some Latex solution for almost everyone. However, these 116 | solutions in Latex are often cumbersome to type, and they are not 117 | portable to other formats. Specifically, transfer between latex, html, 118 | docx, odt or epub would actually be highly desirable. Such transfer is 119 | the hallmark of \href{https://pandoc.org}{Pandoc}, a tool by John 120 | MacFarlane that provides conversion between these (and many more) 121 | formats. 122 | 123 | Any such conversion between text-formats naturally never works 124 | perfectly: every text-format has specific features that are not 125 | transferable to other formats. A central goal of Pandoc (at least in my 126 | interpretation) is to define a set of shared concepts for text-structure 127 | (a `common denominator' if you will, but surely not `least'!) that can 128 | then be mapped to other formats. In many ways, Pandoc tries (again) to 129 | define a set of logical concepts for text structure (`semantic markup'), 130 | which can then be formatted by your favourite typesetter. 
As long as you 131 | stay inside the realm of this `common denominator' (in practice that 132 | means Pandoc's extended version of Markdown/CommonMark), conversion 133 | works reasonably well (think 90\%-plus). 134 | 135 | Building on John Gruber's 136 | \href{https://daringfireball.net/projects/markdown/syntax}{Markdown 137 | philosophy}, there is a strong urge here to learn to restrain oneself 138 | while writing, and try to restrict the number of layout-possibilities to 139 | a minimum. In this sense, with \texttt{pandoc-ling} I propose a 140 | Markdown-structure for linguistic examples that is simple, easy to type, 141 | easy to read, and portable through the Pandoc universe by way of an 142 | extension mechanism of Pandoc, called a `Pandoc Lua Filter'. This 143 | extension will not magically allow you to write every linguistic example 144 | thinkable, but my guess is that in practice the present proposal covers 145 | the majority of situations in linguistic publications (think 90\%-plus). 146 | As an example (and test case) I have included automatic conversions into 147 | various formats in this repository (check them out in the directory 148 | \texttt{docs} to get an idea of the strengths and weaknesses of the 149 | current implementation). 150 | 151 | \section{The basic structure of a linguistic 152 | example}\label{the-basic-structure-of-a-linguistic-example} 153 | 154 | Basically, a linguistic example consists of 6 possible building blocks, 155 | of which only the number and at least one example line are necessary. 156 | The space between the building blocks is kept as minimal as possible 157 | without becoming cramped. When (optional) building blocks are not 158 | included, then the other blocks shift left and up (only exception: a 159 | preamble without labels is not shifted left completely, but left-aligned 160 | with the example, not with the judgement).
161 | 162 | \begin{itemize} 163 | \tightlist 164 | \item 165 | \textbf{Number}: Running tally of all examples in the work, possibly 166 | restarting at chapters or other major headings. Typically between 167 | round brackets, possibly with a chapter number added before in long 168 | works, e.g.~example (7.26). Aligned top-left, typically left-aligned 169 | to main text margin. 170 | \item 171 | \textbf{Preamble}: Optional information about the content/kind of 172 | example. Aligned top-left: to the top with the number, to the left 173 | with the (optional) label. When there is no label, then preamble is 174 | aligned with the example, not with the judgment. 175 | \item 176 | \textbf{Label}: Indices for sub-examples. Only present when there is 177 | more than one example grouped together inside one numbered entity. 178 | Typically these sub-example labels use latin letters followed by a 179 | full stop. They are left-aligned with the preamble, and each label is 180 | top-aligned with the top-line of the corresponding example (important 181 | for longer line-wrapped examples). 182 | \item 183 | \textbf{Judgment}: Examples can optionally have grammaticality 184 | judgments, typically symbols like **?!* sometimes in superscript 185 | relative to the corresponding example. Judgements are right-aligned to 186 | each other, typically with only minimal space to the left-aligned 187 | examples. 188 | \item 189 | \textbf{Line example}: A minimal linguistic example has at least one 190 | line example, i.e.~an utterance of interest. Building blocks in 191 | general shift left and up when other (optional) building blocks are 192 | not present. Minimally, this results in a number with one line 193 | example. 194 | \item 195 | \textbf{Interlinear example}: A complex structure typically used for 196 | examples from languages unknown to most readers.
Consists of three or 197 | four lines that are left-aligned: 198 | 199 | \begin{itemize} 200 | \tightlist 201 | \item 202 | \textbf{Header}: An optional header is typically used to display 203 | information about the language of the example, including literature 204 | references. When not present, then all other lines from the 205 | interlinear example shift upwards. 206 | \item 207 | \textbf{Source}: The actual language utterance, often typeset in 208 | italics. This line is internally separated at spaces, and each 209 | sub-block is left-aligned with the corresponding sub-blocks of the 210 | gloss. 211 | \item 212 | \textbf{Gloss}: Explanation of the meaning of the source, often 213 | using abbreviations in small caps. This line is internally separated 214 | at spaces, and each block is left-aligned with the block from 215 | source. 216 | \item 217 | \textbf{Translation}: Free translation of the source, typically 218 | quoted. Not separated in blocks, but freely extending to the right. 219 | Left-aligned with the other lines from the interlinear example. 220 | \end{itemize} 221 | \end{itemize} 222 | 223 | \begin{figure} 224 | \centering 225 | \pandocbounded{\includegraphics[keepaspectratio,alt={The structure of a linguistic example.}]{figure/ExampleStructure.png}} 226 | \caption{The structure of a linguistic example.} 227 | \end{figure} 228 | 229 | There are of course many more possibilities to extend the structure of a 230 | linguistic example, like third or fourth subdivisions of labels (often 231 | using small roman numerals as a third level) or multiple glossing lines 232 | in the interlinear example. Also, the content of the header is sometimes 233 | found right-aligned to the right of the interlinear example (language 234 | info at the top, reference at the bottom). All such options are 235 | currently not supported by \texttt{pandoc-ling}. 236 | 237 | Under the hood, this structure is prepared by \texttt{pandoc-ling} as a 238 | table.
Tables are reasonably well transcoded to different document 239 | formats. Specific layout considerations mostly have to be set manually. 240 | Alignment of the text should work in most exports. Some \texttt{CSS} 241 | styling is proposed by \texttt{pandoc-ling}, but can of course be 242 | overruled. For latex (and beamer) special output is prepared using 243 | various available latex packages (see options, below). 244 | 245 | \section{\texorpdfstring{Introducing 246 | \texttt{pandoc-ling}}{Introducing pandoc-ling}}\label{introducing-pandoc-ling} 247 | 248 | \subsection{Editing linguistic 249 | examples}\label{editing-linguistic-examples} 250 | 251 | To include a linguistic example in Markdown \texttt{pandoc-ling} uses 252 | the \texttt{div} structure, which is indicated in Pandoc-Markdown by 253 | typing three colons at the start and three colons at the end. To 254 | indicate the \texttt{class} of this \texttt{div} the letters `ex' (for 255 | `example') should be added after the top colons (with or without space 256 | in between). This `ex'-class is the signal for \texttt{pandoc-ling} to 257 | start processing such a \texttt{div}. The numbering of these examples 258 | will be inserted by \texttt{pandoc-ling}. 259 | 260 | Empty lines can be added inside the \texttt{div} for visual pleasure, as 261 | they mostly do not have an influence on the output. Exception: do 262 | \emph{not} use empty lines between unlabelled line examples. Multiple 263 | lines of text can be used (without empty lines in between), but they 264 | will simply be interpreted as one sequential paragraph. 265 | 266 | \begin{verbatim} 267 | ::: ex 268 | This is the most basic structure of a linguistic example. 269 | ::: 270 | \end{verbatim} 271 | 272 | \begin{samepage} 273 | \ex 274 | This is the most basic structure of a linguistic example. 
275 | \xe 276 | \end{samepage} 277 | 278 | Alternatively, the \texttt{class} can be put in curled brackets (and 279 | then a leading full stop is necessary before \texttt{ex}). Inside these 280 | brackets more attributes can be added (separated by space), for example 281 | an id, using a hash, or any attribute=value pairs that should apply to 282 | this example. Currently there is only one real attribute implemented 283 | (\texttt{formatGloss}), but in principle it is possible to add more 284 | attributes that can be used to fine-tune the typesetting of the example 285 | (see below for a description of such \texttt{local\ options}). 286 | 287 | \begin{verbatim} 288 | ::: {#id .ex formatGloss=false} 289 | 290 | This is a multi-line example. 291 | But that does not mean anything for the result 292 | All these lines are simply treated as one paragraph. 293 | They will become one example with one number. 294 | 295 | ::: 296 | \end{verbatim} 297 | 298 | \begin{samepage} 299 | \ex 300 | This is a multi-line example. But that does not mean anything for the 301 | result All these lines are simply treated as one paragraph. They will 302 | become one example with one number. 303 | \xe 304 | \end{samepage} 305 | 306 | A preamble can be added by inserting an empty line between preamble and 307 | example. The same considerations about multiple text-lines apply. 308 | 309 | \begin{verbatim} 310 | :::ex 311 | Preamble 312 | 313 | This is an example with a preamble. 314 | ::: 315 | \end{verbatim} 316 | 317 | \begin{samepage} 318 | \ex Preamble\\* 319 | This is an example with a preamble. 320 | \xe 321 | \end{samepage} 322 | 323 | Sub-examples with labels are entered by starting each sub-example with a 324 | small latin letter and a full stop. Empty lines between labels are 325 | allowed. Subsequent lines without labels are treated as one paragraph. 326 | Empty lines \emph{not} followed by a label with a full stop will result 327 | in errors. 
328 | 329 | \begin{verbatim} 330 | :::ex 331 | a. This is the first example. 332 | b. This is the second. 333 | a. The actual letters are not important, `pandoc-ling` will put them in order. 334 | 335 | e. Empty lines are allowed between labelled lines 336 | Subsequent lines are again treated as one sequential paragraph. 337 | ::: 338 | \end{verbatim} 339 | 340 | \begin{samepage} 341 | \pex[*=] 342 | \a This is the first example. 343 | \a This is the second. 344 | \a The actual letters are not important, \texttt{pandoc-ling} will put 345 | them in order. 346 | \a Empty lines are allowed between labelled lines Subsequent lines are 347 | again treated as one sequential paragraph. 348 | \xe 349 | \end{samepage} 350 | 351 | A labelled list can be combined with a preamble. 352 | 353 | \begin{verbatim} 354 | :::ex 355 | Any nice description here 356 | 357 | a. one example sentence. 358 | b. two 359 | c. three 360 | ::: 361 | \end{verbatim} 362 | 363 | \begin{samepage} 364 | \pex[*=] Any nice description here\\* 365 | \a one example sentence. 366 | \a two 367 | \a three 368 | \xe 369 | \end{samepage} 370 | 371 | Grammaticality judgements should be added before an example, and after 372 | an optional label, separated from both by spaces (though four spaces in 373 | a row should be avoided, that could lead to layout errors). To indicate 374 | that any sequence of symbols is a judgements, prepend the judgement with 375 | a caret \texttt{\^{}}. Alignment will be figured out by 376 | \texttt{pandoc-ling}. 377 | 378 | \begin{verbatim} 379 | :::ex 380 | Throwing in a preamble for good measure 381 | 382 | a. ^* This traditionally signals ungrammaticality. 383 | b. ^? Question-marks indicate questionable grammaticality. 384 | c. ^^whynot?^ But in principle any sequence can be used (here even in superscript). 385 | d. However, such long sequences sometimes lead to undesirable effects in the layout. 386 | ::: 387 | \end{verbatim} 388 | 389 | \begin{samepage} 390 | \pex[*=whynot?] 
Throwing in a preamble for good measure\\* 391 | \a \ljudge{*}This traditionally signals ungrammaticality. 392 | \a \ljudge{?}Question-marks indicate questionable grammaticality. 393 | \a \ljudge{\textsuperscript{whynot?}}But in principle any sequence can 394 | be used (here even in superscript). 395 | \a However, such long sequences sometimes lead to undesirable effects 396 | in the layout. 397 | \xe 398 | \end{samepage} 399 | 400 | A minor detail is the alignment of a single example with a preamble and 401 | grammaticality judgements. In this case it looks better for the preamble 402 | to be left aligned with the example and not with the judgement. 403 | 404 | \begin{verbatim} 405 | :::ex 406 | Here is a special case with a preamble 407 | 408 | ^^???^ With a singly questionably example. 409 | Note the alignment! Especially with this very long example 410 | that should go over various lines in the output. 411 | ::: 412 | \end{verbatim} 413 | 414 | \begin{samepage} 415 | \ex Here is a special case with a preamble\\* 416 | 417 | \judge{\textsuperscript{???}} With a singly questionably example. Note 418 | the alignment! Especially with this very long example that should go 419 | over various lines in the output. 420 | \xe 421 | \end{samepage} 422 | 423 | For the lazy writers among us, it is also possible to use a simple 424 | bullet list instead of a labelled list. Note that the listed elements 425 | will still be formatted as a labelled list. 426 | 427 | \begin{verbatim} 428 | :::ex 429 | - This is a lazy example. 430 | - ^# It should return letters at the start just as before. 431 | - ^% Also testing some unusual judgements. 432 | ::: 433 | \end{verbatim} 434 | 435 | \begin{samepage} 436 | \pex[*=\#] 437 | \a This is a lazy example. 438 | \a \ljudge{\#}It should return letters at the start just as before. 439 | \a \ljudge{\%}Also testing some unusual judgements. 
440 | \xe 441 | \end{samepage} 442 | 443 | Just for testing: a single example with a judgement (which resulted in 444 | an error in earlier versions). 445 | 446 | \begin{verbatim} 447 | ::: ex 448 | ^* This traditionally signals ungrammaticality. 449 | ::: 450 | \end{verbatim} 451 | 452 | \begin{samepage} 453 | \ex 454 | 455 | \judge{*} This traditionally signals ungrammaticality. 456 | \xe 457 | \end{samepage} 458 | 459 | \subsection{Interlinear examples}\label{interlinear-examples} 460 | 461 | For interlinear examples with aligned source and gloss, the structure of 462 | a \texttt{lineblock} is used, starting the lines with a vertical line 463 | \texttt{\textbar{}}. There should always be four vertical lines (for 464 | header, source, gloss and translation, respectively), although the 465 | content after the first vertical line can be empty. The source and gloss 466 | lines are separated at spaces, and all parts are right-aligned. If you 467 | want to have a space that is not separated, you will have to `protect' 468 | the space, either by putting a backslash before the space, or by 469 | inserting a non-breaking space instead of a normal space (either type 470 | \texttt{\ } or insert an actual non-breaking space, i.e.~unicode 471 | character \texttt{U+00A0}). 472 | 473 | \begin{verbatim} 474 | :::ex 475 | | Dutch (Germanic) 476 | | Deze zin is in het nederlands. 477 | | DEM sentence AUX in DET dutch. 478 | | This sentence is dutch. 479 | ::: 480 | \end{verbatim} 481 | 482 | \begin{samepage} 483 | \ex[*=] 484 | \begingl 485 | \glpreamble Dutch (Germanic)// 486 | \gla Deze zin is in het nederlands. // 487 | \glb DEM sentence AUX in DET dutch. // 488 | \glft This sentence is dutch.// 489 | \endgl 490 | \xe 491 | \end{samepage} 492 | 493 | An attempt is made to format interlinear examples when the option 494 | \texttt{formatGloss=true} is added. 
This will: 495 | 496 | \begin{itemize} 497 | \tightlist 498 | \item 499 | remove formatting from the source and set everything in italics, 500 | \item 501 | remove formatting from the gloss and set sequences (\textgreater1) of 502 | capitals and numbers into small caps (note that the positioning of 503 | small caps on web pages is 504 | \href{https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align}{highly 505 | complex}), 506 | \item 507 | a tilde \texttt{\textasciitilde{}} between spaces in the gloss is 508 | treated as a shortcut for an empty gloss (internally, the sequence 509 | \texttt{space-tilde-space} is replaced by 510 | \texttt{space-space-nonBreakingSpace-space-space}), 511 | \item 512 | consistently put translations in single quotes, possibly removing 513 | other quotes. 514 | \end{itemize} 515 | 516 | \begin{verbatim} 517 | ::: {.ex formatGloss=true} 518 | | Dutch (Germanic) 519 | | Is deze zin in het nederlands ? 520 | | AUX DEM sentence in DET dutch Q 521 | | Is this sentence dutch? 522 | ::: 523 | \end{verbatim} 524 | 525 | \begin{samepage} 526 | \ex[*=] 527 | \begingl 528 | \glpreamble Dutch (Germanic)// 529 | \gla \emph{Is} \emph{deze} \emph{zin} \emph{in} \emph{het} 530 | \emph{nederlands} \emph{?} // 531 | \glb \textsc{aux} \textsc{dem} sentence in \textsc{det} dutch 532 | \textsc{q} // 533 | \glft `Is this sentence dutch?'// 534 | \endgl 535 | \xe 536 | \end{samepage} 537 | 538 | The results of such formatting will not always work, but it seems to be 539 | quite robust in my testing. 
The next example brings everything together: 540 | 541 | \begin{itemize} 542 | \tightlist 543 | \item 544 | a preamble, 545 | \item 546 | labels, both for single lines and for interlinear examples, 547 | \item 548 | interlinear examples start on a new line immediately after the 549 | letter-label, 550 | \item 551 | grammaticality judgements with proper alignment, 552 | \item 553 | when the header of an interlinear example is left out, everything is 554 | shifted up, 555 | \item 556 | The formatting of the interlinear is harmonised. 557 | \end{itemize} 558 | 559 | \begin{verbatim} 560 | ::: {.ex formatGloss=true samePage=false} 561 | Completely superfluous preamble, but it works ... 562 | 563 | a. 564 | | Dutch (Germanic) Note the grammaticality judgement! 565 | | ^^:–)^ Deze zin is (dit\ is test) nederlands. 566 | | DEM sentence AUX ~ dutch. 567 | | This sentence is dutch. 568 | 569 | b. 570 | | 571 | | Deze tweede zin heeft geen header. 572 | | DEM second sentence have.3SG.PRES no header. 573 | | This second sentence does not have a header. 574 | 575 | a. Mixing single line examples with interlinear examples. 576 | a. This is of course highly unusal. 577 | Just for this example, let's add some extra material in this example. 578 | ::: 579 | \end{verbatim} 580 | 581 | \pex[*=:–)] Completely superfluous preamble, but it works 582 | \ldots{}\\* 583 | \a 584 | \begingl 585 | \glpreamble Dutch (Germanic) Note the grammaticality judgement!// 586 | \gla \ljudge{\textsuperscript{:--)}}\emph{Deze} \emph{zin} \emph{is} 587 | \emph{(dit~is~test)} \emph{nederlands.} // 588 | \glb \textsc{dem} sentence \textsc{aux} ~ dutch. // 589 | \glft `This sentence is dutch.'// 590 | \endgl 591 | \a 592 | \begingl 593 | \gla \emph{Deze} \emph{tweede} \emph{zin} \emph{heeft} \emph{geen} 594 | \emph{header.} // 595 | \glb \textsc{dem} second sentence have.\textsc{3sg}.\textsc{pres} no 596 | header. 
// 597 | \glft `This second sentence does not have a header.'// 598 | \endgl 599 | \a Mixing single line examples with interlinear examples. 600 | \a This is of course highly unusal. Just for this example, let's add 601 | some extra material in this example. 602 | \xe 603 | 604 | Also, as a quick workaround for showing multiple source lines without 605 | alignment with the glossing (e.g.~for phonetic or orthographic 606 | representations of the example), it is possible to use the header of 607 | interlinear example. For a line break in the header, use the double 608 | backslash \texttt{\textbackslash{}\textbackslash{}}, either inline or at 609 | the end of a line. When you type a header using multiple lines (as shown 610 | below), then subsequent lines have to start with space. For now, this 611 | only works in the header line. 612 | 613 | \begin{verbatim} 614 | ::: ex 615 | | Example with an multiline header \\ 616 | *can be used for orthographic representations*, \\ 617 | or phonetic transcription, \\ or for whatever you like 618 | | Dit is een lui voorbeeld=je 619 | | DEM COP DET lazy example=DIM 620 | | This is a lazy example. 621 | ::: 622 | \end{verbatim} 623 | 624 | \begin{samepage} 625 | \ex[*=] 626 | \begingl 627 | \glpreamble Example with an multiline header \\ 628 | \emph{can be used for orthographic representations}, \\ 629 | or phonetic transcription, \\ 630 | or for whatever you like// 631 | \gla Dit is een lui voorbeeld=je // 632 | \glb DEM COP DET lazy example=DIM // 633 | \glft This is a lazy example.// 634 | \endgl 635 | \xe 636 | \end{samepage} 637 | 638 | \subsection{Cross-referencing 639 | examples}\label{cross-referencing-examples} 640 | 641 | The examples are automatically numbered by \texttt{pandoc-ling}. 642 | Cross-references to examples inside a document can be made by using the 643 | \texttt{{[}@ID{]}} format (used by Pandoc for citations). 
When an 644 | example has an explicit identifier (like \texttt{\#test} in the next 645 | example), then a reference can be made to this example with 646 | \texttt{{[}@test{]}}, leading to (\getref{test}) when formatted (note 647 | that the formatting does not work on the github website. Please check 648 | the `docs' subdirectory). 649 | 650 | \begin{verbatim} 651 | ::: {#test .ex} 652 | This is a test 653 | ::: 654 | \end{verbatim} 655 | 656 | \begin{samepage} 657 | \ex 658 | This is a test 659 | \xe 660 | \end{samepage} 661 | 662 | Inspired by the \texttt{linguex}-approach, you can also use the keywords 663 | \texttt{next} or \texttt{last} to refer to the next or the last example, 664 | e.g.~\texttt{{[}@last{]}} will be formatted as (\getref{test}). By 665 | doubling the first letters to \texttt{nnext} or \texttt{llast} reference 666 | to the next/last-but-one can be made. Actually, the number of starting 667 | letters can be repeated at will in \texttt{pandoc-ling}, so something 668 | like \texttt{{[}@llllllllast{]}} will also work. It will be formatted as 669 | (\getref{ex7}) after the processing of \texttt{pandoc-ling}. Needless to 670 | say that in such a situation an explicit identifier would be a better 671 | choice. 672 | 673 | Referring to sub-examples can be done by manually adding a suffix into 674 | the cross reference, simply separated from the identifier by a space. 675 | For example, \texttt{{[}@lllast~c{]}} will refer to the third 676 | sub-example of the last-but-two example. Formatted this will look like 677 | this: (\getref{ex13}\,c), smile! However, note that the ``c'' has to be 678 | manually determined. It is simply a literal suffix that will be copied 679 | into the cross-reference. Something like \texttt{{[}@last\ hA1l0{]}} 680 | will work also, leading to (\getref{test}\,hA1l0) when formatted (which 681 | is of course nonsensical). 
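
As a rough illustration of how such relative references can be resolved, here is a minimal Lua sketch. It is not the code of the filter itself: the function name \texttt{resolveRelative} and the list layout (entries that are either an example with a \texttt{target} id or a cross-reference with a \texttt{ref} keyword) are assumptions made only for this illustration.

\begin{verbatim}
```lua
-- Hypothetical sketch, not the filter's own code.
-- items: document order; each entry is {target = "id"} for an example
-- or {ref = "keyword"} for a relative cross-reference.
local function resolveRelative(items, pos)
  local key = items[pos].ref
  -- count the repeated first letter: "next" -> 1, "nnext" -> 2, ...
  local n = key:match("^(n+)ext$")
  local l = key:match("^(l+)ast$")
  local dist, step
  if n then
    dist, step = #n, 1      -- walk forwards through the document
  elseif l then
    dist, step = #l, -1     -- walk backwards through the document
  else
    return nil              -- not a relative keyword
  end
  local i = pos + step
  while i >= 1 and i <= #items do
    if items[i].target then
      dist = dist - 1
      if dist == 0 then return items[i].target end
    end
    i = i + step
  end
  return nil                -- ran off the document: nothing to refer to
end

local items = {
  {target = "ex1"}, {target = "test"}, {ref = "last"},
  {ref = "llast"}, {ref = "next"}, {target = "ex3"},
}
print(resolveRelative(items, 3))  --> test
print(resolveRelative(items, 4))  --> ex1
print(resolveRelative(items, 5))  --> ex3
```
\end{verbatim}

Because the number of repeated letters is simply a count of examples to skip, such a reference silently points elsewhere when examples are inserted or moved, which is why an explicit identifier is the more robust choice for distant examples.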
 682 | 683 | For exports that include attributes (like html), the examples have an 684 | explicit id of the form \texttt{exNUMBER} in which \texttt{NUMBER} is 685 | the actual number as given in the formatted output. This means that it 686 | is possible to refer to an example on any web-page by using the 687 | hash-mechanism to refer to a part of the web-page. For example, 688 | \texttt{\#ex4.7} can be used to refer to the seventh example in the 689 | html-output of this readme (try 690 | \href{https://cysouw.github.io/pandoc-ling/readme.html\#ex4.7}{this 691 | link}). The id in this example has a chapter number `4' because in the 692 | html conversion I have set the option \texttt{addChapterNumber} to 693 | \texttt{true}. (Note: when numbers restart the count in each chapter 694 | with the option \texttt{restartAtChapter}, then the id is of the form 695 | \texttt{exCHAPTER.NUMBER}. This is necessary to resolve clashing ids, as 696 | the same number might then be used in different chapters.) 697 | 698 | I propose to use these ids also to refer to examples in citations when 699 | writing scholarly papers, e.g.~(Cysouw 2021: \#ex7), independent of 700 | whether the links actually resolve. In principle, such citations could 701 | easily be resolved when online publications are properly prepared. The 702 | same proposal could also work for other parts of research papers, for 703 | example using tags like \texttt{\#sec,\ \#fig,\ \#tab,\ \#eq} (see the 704 | Pandoc filter 705 | \href{https://github.com/cysouw/crossref-adapt}{\texttt{crossref-adapt}}). 706 | To refer to paragraphs (which should replace page numbers in a future of 707 | adaptive design), I propose to use no tag, but directly add the number 708 | to the hash (see the Pandoc filter 709 | \href{https://github.com/cysouw/count-para}{\texttt{count-para}} for a 710 | practical mechanism to add such numbering).
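
The chapter-prefixed ids discussed above depend on the metadata of the
document. The following is a minimal sketch of a YAML metadata block in
Pandoc Markdown, using only option names documented in the options section
of this readme. The values are illustrative, and it is an assumption here
that the options are given as top-level metadata fields.

\begin{verbatim}
---
addChapterNumber: true    # numbers and ids get a chapter prefix, e.g. ex4.7
restartAtChapter: false   # ignored anyway when addChapterNumber is true
---
\end{verbatim}

With such a setting, a link ending in \texttt{\#ex4.7} would be expected to
jump to the seventh example of the fourth chapter in the html output.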
 711 | 712 | \subsection{\texorpdfstring{Options of 713 | \texttt{pandoc-ling}}{Options of pandoc-ling}}\label{options-of-pandoc-ling} 714 | 715 | \subsubsection{Global options}\label{global-options} 716 | 717 | The following global options are available with \texttt{pandoc-ling}. 718 | These can be added to the 719 | \href{https://pandoc.org/MANUAL.html\#metadata-blocks}{Pandoc metadata}. 720 | An example of such metadata can be found at the bottom of this 721 | \texttt{readme} in the form of a YAML-block. Pandoc allows for various 722 | methods to provide metadata (see the link above). 723 | 724 | \begin{itemize} 725 | \tightlist 726 | \item 727 | \textbf{\texttt{formatGloss}} (boolean, default \texttt{false}): 728 | should all interlinear examples be consistently formatted? If you use 729 | this option, you can simply use capital letters for abbreviations in 730 | the gloss, and they will be changed to small caps. The source line is 731 | set to italics, and the translation is put into single quotes. 732 | \item 733 | \textbf{\texttt{samePage}} (boolean, default \texttt{true}, only for 734 | Latex): should examples be kept together on the same page? Can also be 735 | overridden for individual examples by adding 736 | \texttt{\{.ex\ samePage=false\}} at the start of an example (cf.~below 737 | on \texttt{local\ options}). 738 | \item 739 | \textbf{\texttt{xrefSuffixSep}} (string, defaults to no-break-space): 740 | When cross references have a suffix, how should the separator be 741 | formatted? The default `no-break-space' is a safe option. I 742 | personally like a `narrow no-break space' better (Unicode 743 | \texttt{U+202F}), but this symbol does not work with all fonts, and 744 | might thus lead to errors. For Latex typesetting, all space-like 745 | symbols are converted to a Latex thin space 746 | \texttt{\textbackslash{},}.
 747 | \item 748 | \textbf{\texttt{restartAtChapter}} (boolean, default \texttt{false}): 749 | should the counting restart for each chapter? 750 | 751 | \begin{itemize} 752 | \tightlist 753 | \item 754 | Actually, when \texttt{true} this setting will restart the counting 755 | at the highest heading level, which for various output formats can 756 | be set by the Pandoc option \texttt{top-level-division}. 757 | \item 758 | The id of each example will now be of the form 759 | \texttt{exCHAPTER.NUMBER} to resolve any clashes when the same 760 | number appears in different chapters. 761 | \item 762 | Depending on your Latex setup, an explicit entry 763 | \texttt{top-level-division:\ chapter} might be necessary in your 764 | metadata. 765 | \end{itemize} 766 | \item 767 | \textbf{\texttt{addChapterNumber}} (boolean, default \texttt{false}): 768 | should the chapter (= highest heading level) number be added to the 769 | number of the example? When setting this to \texttt{true} any setting 770 | of \texttt{restartAtChapter} will be ignored. In most Latex situations 771 | this only works in combination with a \texttt{documentclass:\ book}. 772 | \item 773 | \textbf{\texttt{latexPackage}} (one of: \texttt{linguex}, 774 | \texttt{gb4e}, \texttt{langsci-gb4e}, \texttt{expex}, default 775 | \texttt{linguex}): Various options for converting examples to Latex 776 | packages that typeset linguistic examples. None of the conversions 777 | works perfectly, though they should work in most normal situations 778 | (think 90\%-plus). It might be necessary to first convert to 779 | \texttt{Latex}, correct the output, and then typeset separately with a 780 | latex compiler like \texttt{xelatex}. Using the direct option inside 781 | Pandoc might also work in many situations. Export to 782 | \textbf{\texttt{beamer}} seems to work reasonably well with the 783 | \texttt{gb4e} package. All others have artefacts or errors.
 784 | \end{itemize} 785 | 786 | \subsubsection{Local options}\label{local-options} 787 | 788 | Local options are options that can be set for each individual example. 789 | The \texttt{formatGloss} option can be used to have an individual 790 | example be formatted differently from the global setting. For example, 791 | when the global setting is \texttt{formatGloss:\ true} in the metadata, 792 | then adding \texttt{formatGloss=false} in the curly brackets of a 793 | specific example will block the formatting. This is especially useful 794 | when the automatic formatting does not give the desired result. 795 | 796 | If you want to add something else (not a linguistic example) in a 797 | numbered example, then there is the local option \texttt{noFormat=true}. 798 | An attempt will be made to try and do a reasonable layout. Multiple 799 | paragraphs will simply be taken as is, and the number will be put in 800 | front. In HTML the number will be centred. It is usable for an 801 | incidental mathematical formula. 802 | 803 | \begin{verbatim} 804 | ::: {.ex noFormat=true} 805 | $$\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}$$ 806 | ::: 807 | \end{verbatim} 808 | 809 | \begin{samepage} 810 | \ex 811 | \[\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}\]\\ 812 | 813 | \xe 814 | \end{samepage} 815 | 816 | \subsection{\texorpdfstring{Issues with 817 | \texttt{pandoc-ling}}{Issues with pandoc-ling}}\label{issues-with-pandoc-ling} 818 | 819 | \begin{itemize} 820 | \tightlist 821 | \item 822 | Manually provided identifiers for examples should not be purely 823 | numerical (so do not use e.g.~\texttt{\#5789}). In some situations this 824 | interferes with the setting of the cross-references. 825 | \item 826 | Because the cross-references use the same structure as citations in 827 | Pandoc, the processing of citations (by \texttt{citeproc}) should be 828 | performed \textbf{after} the processing by \texttt{pandoc-ling}.
 829 | Another Pandoc filter, 830 | \href{https://github.com/lierdakil/pandoc-crossref}{\texttt{pandoc-crossref}}, 831 | for numbering figures and other captions, also uses the same system. 832 | There seems to be no conflict between \texttt{pandoc-ling} and 833 | \texttt{pandoc-crossref}. 834 | \item 835 | Interlinear examples will not wrap at the end of the page. There 836 | is no solution yet for examples that are longer than the size 837 | of the page. 838 | \item 839 | It is not (yet) possible to have more than one glossing line. 840 | \item 841 | When exporting to \texttt{docx} there is a problem because there are 842 | paragraphs inserted after tables, which adds space in lists with 843 | multiple interlinear examples (except when they have exactly the same 844 | number of columns). This is 845 | \href{https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262}{by 846 | design}. The official solution is to set font-size to 1 for this 847 | paragraph inside MS Word. 848 | \item 849 | Multi-column cells are crucial for \texttt{pandoc-ling} to work 850 | properly. These were only introduced in the new table format with Pandoc 851 | 2.10 (so older Pandoc versions are not supported). Also note that these 852 | structures are not yet exported to all formats, e.g.~they will not be 853 | displayed correctly in \texttt{docx}. However, this is currently an 854 | area of active development. 855 | \item 856 | \texttt{langsci-gb4e} is only available as part of the 857 | \href{https://ctan.org/pkg/langsci?lang=en}{\texttt{langsci} package}. 858 | You have to make it available to Pandoc, e.g.~by adding it into the 859 | same directory as the pandoc-ling.lua filter. I have added a recent 860 | version of \texttt{langsci-gb4e} here for convenience, but this one 861 | might be outdated at some time in the future.
 862 | \item 863 | \texttt{beamer} output seems to work best with 864 | \texttt{latexPackage:\ gb4e}. 865 | \end{itemize} 866 | 867 | \subsection{A note on Latex 868 | conversion}\label{a-note-on-latex-conversion} 869 | 870 | Originally, I decided to write this filter as a two-pronged conversion, 871 | making a markdown version myself, but using a mapping to one of the many 872 | latex libraries for linguistics examples as a quick fix. I assumed that 873 | such a mapping would be the easy part. However, it turned out that the 874 | mapping to latex was much more difficult than I anticipated. Basically, 875 | it turned out that the `common denominator' that I was aiming for was 876 | not necessarily the `common denominator' provided by the latex packages. 877 | I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and 878 | expex) with growing dismay. This approach resulted in a first version. 879 | However, after this version was (more or less) finished, I realised that 880 | it would be better to first define the `common denominator' more clearly 881 | (as done here), and then implement this purely in Pandoc. From that 882 | basis I have then made attempts to map them to the various latex 883 | packages. 884 | 885 | \subsection{A note on implementation}\label{a-note-on-implementation} 886 | 887 | The basic structure of the examples is transformed into Pandoc tables. 888 | Tables are reasonably safe for converting to other formats. Care has 889 | been taken to add \texttt{classes} to all elements of the tables 890 | (e.g.~the preamble has the class \texttt{linguistic-example-preamble}). 891 | When exported formats are aware of these classes, they can be used to 892 | fine-tune the formatting. I have applied a few such fine-tunings to the 893 | html output of this filter by adding a few CSS-style statements. The 894 | naming of the classes is quite transparent, using the form 895 | \texttt{linguistic-example-STRUCTURE}.
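
As a small illustration of such fine-tuning, the classes can be targeted from
a stylesheet. The following is only a sketch in Pandoc Markdown: the class
name \texttt{linguistic-example-preamble} is the one mentioned above, the
chosen properties are arbitrary and are not the styling shipped with the
filter, and passing the rules through the \texttt{header-includes} metadata
field is just one assumed way to get them into html output.

\begin{verbatim}
---
header-includes: |
  <style>
  /* illustrative only: set every preamble in italics */
  .linguistic-example-preamble {
    font-style: italic;
  }
  </style>
---
\end{verbatim}

Other formats ignore this raw html, so such rules should only affect the html
export.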
896 | 897 | The whole table is encapsulated in a \texttt{div} with class \texttt{ex} 898 | and an id of the form \texttt{exNUMBER}. This means that an example can 899 | be directly referred to in web-links by using the hash-mechanism. For 900 | example, adding \texttt{\#ex3} to the end of a link will immediately 901 | jump to this example in a browser. 902 | 903 | The current implementation is completely independent from the 904 | \href{https://pandoc.org/MANUAL.html\#numbered-example-lists}{Pandoc 905 | numbered examples implementation} and both can work side by side, like 906 | (2): 907 | 908 | \begin{enumerate} 909 | \def\labelenumi{(\arabic{enumi})} 910 | \item 911 | These are native Pandoc numbered examples 912 | \item 913 | They are independent of \texttt{pandoc-ling} but use the same output 914 | formatting in many default exports, like latex. 915 | \end{enumerate} 916 | 917 | However, in practice various output-formats of Pandoc (e.g.~latex) also 918 | use numbers in round brackets for these, so in practice it might be 919 | confusing to combine both. 
920 | 921 | \end{document} 922 | -------------------------------------------------------------------------------- /docs/readme_gb4e.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme_gb4e.pdf -------------------------------------------------------------------------------- /docs/readme_langsci-gb4e.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme_langsci-gb4e.pdf -------------------------------------------------------------------------------- /docs/readme_langsci-gb4e.tex: -------------------------------------------------------------------------------- 1 | % Options for packages loaded elsewhere 2 | \PassOptionsToPackage{unicode}{hyperref} 3 | \PassOptionsToPackage{hyphens}{url} 4 | \documentclass[ 5 | ]{article} 6 | \usepackage{xcolor} 7 | \usepackage{amsmath,amssymb} 8 | \setcounter{secnumdepth}{5} 9 | \usepackage{iftex} 10 | \ifPDFTeX 11 | \usepackage[T1]{fontenc} 12 | \usepackage[utf8]{inputenc} 13 | \usepackage{textcomp} % provide euro and other symbols 14 | \else % if luatex or xetex 15 | \usepackage{unicode-math} % this also loads fontspec 16 | \defaultfontfeatures{Scale=MatchLowercase} 17 | \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1} 18 | \fi 19 | \usepackage{lmodern} 20 | \ifPDFTeX\else 21 | % xetex/luatex font selection 22 | \fi 23 | % Use upquote if available, for straight quotes in verbatim environments 24 | \IfFileExists{upquote.sty}{\usepackage{upquote}}{} 25 | \IfFileExists{microtype.sty}{% use microtype if available 26 | \usepackage[]{microtype} 27 | \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts 28 | }{} 29 | \makeatletter 30 | \@ifundefined{KOMAClassName}{% if non-KOMA class 31 | \IfFileExists{parskip.sty}{% 32 | 
\usepackage{parskip} 33 | }{% else 34 | \setlength{\parindent}{0pt} 35 | \setlength{\parskip}{6pt plus 2pt minus 1pt}} 36 | }{% if KOMA class 37 | \KOMAoptions{parskip=half}} 38 | \makeatother 39 | \usepackage{graphicx} 40 | \makeatletter 41 | \newsavebox\pandoc@box 42 | \newcommand*\pandocbounded[1]{% scales image to fit in text height/width 43 | \sbox\pandoc@box{#1}% 44 | \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% 45 | \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% 46 | \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both 47 | \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% 48 | \else\usebox{\pandoc@box}% 49 | \fi% 50 | } 51 | % Set default figure placement to htbp 52 | \def\fps@figure{htbp} 53 | \makeatother 54 | \setlength{\emergencystretch}{3em} % prevent overfull lines 55 | \providecommand{\tightlist}{% 56 | \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} 57 | \usepackage{langsci-gb4e} 58 | \usepackage{chngcntr} 59 | \counterwithin{xnumi}{section} 60 | \exewidth{(9.123)} 61 | \usepackage{bookmark} 62 | \IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available 63 | \urlstyle{same} 64 | \hypersetup{ 65 | pdftitle={Using pandoc-ling}, 66 | pdfauthor={Michael Cysouw}, 67 | hidelinks, 68 | pdfcreator={LaTeX via pandoc}} 69 | 70 | \title{Using pandoc-ling} 71 | \author{Michael Cysouw} 72 | \date{} 73 | 74 | \begin{document} 75 | \maketitle 76 | 77 | { 78 | \setcounter{tocdepth}{3} 79 | \tableofcontents 80 | } 81 | \section{pandoc-ling}\label{pandoc-ling} 82 | 83 | \emph{Michael Cysouw} 84 | \textless{}\href{mailto:cysouw@mac.com}{\nolinkurl{cysouw@mac.com}}\textgreater{} 85 | 86 | A Pandoc filter for linguistic examples 87 | 88 | tl;dr 89 | 90 | \begin{itemize} 91 | \tightlist 92 | \item 93 | Easily write linguistic examples including basic interlinear glossing. 94 | \item 95 | Let numbering and cross-referencing be done for you. 
96 | \item 97 | Export to (almost) any format of your wishes for final polishing. 98 | \item 99 | As an example, check out this readme in 100 | \href{https://cysouw.github.io/pandoc-ling/readme.html}{HTML} or 101 | \href{https://cysouw.github.io/pandoc-ling/readme_gb4e.pdf}{Latex}. 102 | \end{itemize} 103 | 104 | \section{Rationale}\label{rationale} 105 | 106 | In the field of linguistics there is an outspoken tradition to format 107 | example sentences in research papers in a very specific way. In the 108 | field, it is a perennial problem to get such example sentences to look 109 | just right. Within Latex, there are numerous packages to deal with this 110 | problem (e.g.~covington, linguex, gb4e, expex, etc.). Depending on your 111 | needs, there is some Latex solution for almost everyone. However, these 112 | solutions in Latex are often cumbersome to type, and they are not 113 | portable to other formats. Specifically, transfer between latex, html, 114 | docx, odt or epub would actually be highly desirable. Such transfer is 115 | the hallmark of \href{https://pandoc.org}{Pandoc}, a tool by John 116 | MacFarlane that provides conversion between these (and many more) 117 | formats. 118 | 119 | Any such conversion between text-formats naturally never works 120 | perfectly: every text-format has specific features that are not 121 | transferable to other formats. A central goal of Pandoc (at least in my 122 | interpretation) is to define a set of shared concepts for text-structure 123 | (a `common denominator' if you will, but surely not `least'!) that can 124 | then be mapped to other formats. In many ways, Pandoc tries (again) to 125 | define a set of logical concepts for text structure (`semantic markup'), 126 | which can then be formatted by your favourite typesetter. 
As long as you 127 | stay inside the realm of this `common denominator' (in practice that 128 | means Pandoc's extended version of Markdown/CommonMark), conversion 129 | works reasonably well (think 90\%-plus). 130 | 131 | Building on John Gruber's 132 | \href{https://daringfireball.net/projects/markdown/syntax}{Markdown 133 | philosophy}, there is a strong urge here to learn to restrain oneself 134 | while writing, and try to restrict the number of layout-possibilities to 135 | a minimum. In this sense, with \texttt{pandoc-ling} I propose a 136 | Markdown-structure for linguistic examples that is simple, easy to type, 137 | easy to read, and portable through the Pandoc universe by way of an 138 | extension mechanism of Pandoc, called a `Pandoc Lua Filter'. This 139 | extension will not magically allow you to write every linguistic example 140 | imaginable, but my guess is that in practice the present proposal covers 141 | the majority of situations in linguistic publications (think 90\%-plus). 142 | As an example (and test case) I have included automatic conversions into 143 | various formats in this repository (check them out in the directory 144 | \texttt{tests} to get an idea of the strengths and weaknesses of the 145 | current implementation). 146 | 147 | \section{The basic structure of a linguistic 148 | example}\label{the-basic-structure-of-a-linguistic-example} 149 | 150 | Basically, a linguistic example consists of 6 possible building blocks, 151 | of which only the number and at least one example line are necessary. 152 | The space between the building blocks is kept as minimal as possible 153 | without becoming cramped. When (optional) building blocks are not 154 | included, then the other blocks shift left and up (only exception: a 155 | preamble without labels is not shifted left completely, but left-aligned 156 | with the example, not with the judgement).
 157 | 158 | \begin{itemize} 159 | \tightlist 160 | \item 161 | \textbf{Number}: Running tally of all examples in the work, possibly 162 | restarting at chapters or other major headings. Typically between 163 | round brackets, possibly with a chapter number added before in long 164 | works, e.g.~example (7.26). Aligned top-left, typically left-aligned 165 | to main text margin. 166 | \item 167 | \textbf{Preamble}: Optional information about the content/kind of 168 | example. Aligned top-left: to the top with the number, to the left 169 | with the (optional) label. When there is no label, then preamble is 170 | aligned with the example, not with the judgment. 171 | \item 172 | \textbf{Label}: Indices for sub-examples. Only present when there is 173 | more than one example grouped together inside one numbered entity. 174 | Typically these sub-example labels use latin letters followed by a 175 | full stop. They are left-aligned with the preamble, and each label is 176 | top-aligned with the top-line of the corresponding example (important 177 | for longer line-wrapped examples). 178 | \item 179 | \textbf{Judgment}: Examples can optionally have grammaticality 180 | judgments, typically symbols like **?!* sometimes in superscript 181 | relative to the corresponding example. Judgements are right-aligned to 182 | each other, typically with only minimal space to the left-aligned 183 | examples. 184 | \item 185 | \textbf{Line example}: A minimal linguistic example has at least one 186 | line example, i.e.~an utterance of interest. Building blocks in 187 | general shift left and up when other (optional) building blocks are 188 | not present. Minimally, this results in a number with one line 189 | example. 190 | \item 191 | \textbf{Interlinear example}: A complex structure typically used for 192 | examples from languages unknown to most readers.
Consists of three or 193 | four lines that are left-aligned: 194 | 195 | \begin{itemize} 196 | \tightlist 197 | \item 198 | \textbf{Header}: An optional header is typically used to display 199 | information about the language of the example, including literature 200 | references. When not present, then all other lines from the 201 | interlinear example shift upwards. 202 | \item 203 | \textbf{Source}: The actual language utterance, often typeset in 204 | italics. This line is internally separated at spaces, and each 205 | sub-block is left-aligned with the corresponding sub-blocks of the 206 | gloss. 207 | \item 208 | \textbf{Gloss}: Explanation of the meaning of the source, often 209 | using abbreviations in small caps. This line is internally separated 210 | at spaces, and each block is left-aligned with the block from 211 | source. 212 | \item 213 | \textbf{Translation}: Free translation of the source, typically 214 | quoted. Not separated in blocks, but freely extending to the right. 215 | Left-aligned with the other lines from the interlinear example. 216 | \end{itemize} 217 | \end{itemize} 218 | 219 | \begin{figure} 220 | \centering 221 | \pandocbounded{\includegraphics[keepaspectratio,alt={The structure of a linguistic example.}]{figure/ExampleStructure.png}} 222 | \caption{The structure of a linguistic example.} 223 | \end{figure} 224 | 225 | There are of course many more possibilities to extend the structure of a 226 | linguistic example, like third or fourth subdivisions of labels (often 227 | using small roman numerals as a third level) or multiple glossing lines 228 | in the interlinear example. Also, the content of the header is sometimes 229 | found right-aligned to the right of the interlinear example (language 230 | info at the top, reference at the bottom). All such options are 231 | currently not supported by \texttt{pandoc-ling}. 232 | 233 | Under the hood, this structure is prepared by \texttt{pandoc-ling} as a 234 | table.
Tables are reasonably well transcoded to different document 235 | formats. Specific layout considerations mostly have to be set manually. 236 | Alignment of the text should work in most exports. Some \texttt{CSS} 237 | styling is proposed by \texttt{pandoc-ling}, but can of course be 238 | overruled. For latex (and beamer) special output is prepared using 239 | various available latex packages (see options, below). 240 | 241 | \section{\texorpdfstring{Introducing 242 | \texttt{pandoc-ling}}{Introducing pandoc-ling}}\label{introducing-pandoc-ling} 243 | 244 | \subsection{Editing linguistic 245 | examples}\label{editing-linguistic-examples} 246 | 247 | To include a linguistic example in Markdown \texttt{pandoc-ling} uses 248 | the \texttt{div} structure, which is indicated in Pandoc-Markdown by 249 | typing three colons at the start and three colons at the end. To 250 | indicate the \texttt{class} of this \texttt{div} the letters `ex' (for 251 | `example') should be added after the top colons (with or without space 252 | in between). This `ex'-class is the signal for \texttt{pandoc-ling} to 253 | start processing such a \texttt{div}. The numbering of these examples 254 | will be inserted by \texttt{pandoc-ling}. 255 | 256 | Empty lines can be added inside the \texttt{div} for visual pleasure, as 257 | they mostly do not have an influence on the output. Exception: do 258 | \emph{not} use empty lines between unlabelled line examples. Multiple 259 | lines of text can be used (without empty lines in between), but they 260 | will simply be interpreted as one sequential paragraph. 261 | 262 | \begin{verbatim} 263 | ::: ex 264 | This is the most basic structure of a linguistic example. 265 | ::: 266 | \end{verbatim} 267 | 268 | \begin{samepage} 269 | \ea \judgewidth{} \label{ex4.1} 270 | This is the most basic structure of a linguistic example. 
 271 | \z 272 | \end{samepage} 273 | 274 | Alternatively, the \texttt{class} can be put in curly brackets (and 275 | then a leading full stop is necessary before \texttt{ex}). Inside these 276 | brackets more attributes can be added (separated by space), for example 277 | an id, using a hash, or any attribute=value pairs that should apply to 278 | this example. Currently there is only one real attribute implemented 279 | (\texttt{formatGloss}), but in principle it is possible to add more 280 | attributes that can be used to fine-tune the typesetting of the example 281 | (see below for a description of such \texttt{local\ options}). 282 | 283 | \begin{verbatim} 284 | ::: {#id .ex formatGloss=false} 285 | 286 | This is a multi-line example. 287 | But that does not mean anything for the result 288 | All these lines are simply treated as one paragraph. 289 | They will become one example with one number. 290 | 291 | ::: 292 | \end{verbatim} 293 | 294 | \begin{samepage} 295 | \ea \judgewidth{} \label{id} 296 | This is a multi-line example. But that does not mean anything for the 297 | result All these lines are simply treated as one paragraph. They will 298 | become one example with one number. 299 | \z 300 | \end{samepage} 301 | 302 | A preamble can be added by inserting an empty line between preamble and 303 | example. The same considerations about multiple text-lines apply. 304 | 305 | \begin{verbatim} 306 | :::ex 307 | Preamble 308 | 309 | This is an example with a preamble. 310 | ::: 311 | \end{verbatim} 312 | 313 | \begin{samepage} 314 | \ea \judgewidth{} \label{ex4.3} Preamble\\* 315 | This is an example with a preamble. 316 | \z 317 | \end{samepage} 318 | 319 | Sub-examples with labels are entered by starting each sub-example with a 320 | small latin letter and a full stop. Empty lines between labels are 321 | allowed. Subsequent lines without labels are treated as one paragraph.
 322 | Empty lines \emph{not} followed by a label with a full stop will result 323 | in errors. 324 | 325 | \begin{verbatim} 326 | :::ex 327 | a. This is the first example. 328 | b. This is the second. 329 | a. The actual letters are not important, `pandoc-ling` will put them in order. 330 | 331 | e. Empty lines are allowed between labelled lines 332 | Subsequent lines are again treated as one sequential paragraph. 333 | ::: 334 | \end{verbatim} 335 | 336 | \begin{samepage} 337 | \ea \judgewidth{} \label{ex4.4} 338 | \ea [] { This is the first example. } 339 | \ex [] { This is the second. } 340 | \ex [] { The actual letters are not important, \texttt{pandoc-ling} 341 | will put them in order. } 342 | \ex [] { Empty lines are allowed between labelled lines Subsequent 343 | lines are again treated as one sequential paragraph. } 344 | \z 345 | \z 346 | \end{samepage} 347 | 348 | A labelled list can be combined with a preamble. 349 | 350 | \begin{verbatim} 351 | :::ex 352 | Any nice description here 353 | 354 | a. one example sentence. 355 | b. two 356 | c. three 357 | ::: 358 | \end{verbatim} 359 | 360 | \begin{samepage} 361 | \ea \judgewidth{} \label{ex4.5} Any nice description here 362 | \ea [] { one example sentence. } 363 | \ex [] { two } 364 | \ex [] { three } 365 | \z 366 | \z 367 | \end{samepage} 368 | 369 | Grammaticality judgements should be added before an example, and after 370 | an optional label, separated from both by spaces (though four spaces in 371 | a row should be avoided, as that could lead to layout errors). To indicate 372 | that any sequence of symbols is a judgement, prepend the judgement with 373 | a caret \texttt{\^{}}. Alignment will be figured out by 374 | \texttt{pandoc-ling}. 375 | 376 | \begin{verbatim} 377 | :::ex 378 | Throwing in a preamble for good measure 379 | 380 | a. ^* This traditionally signals ungrammaticality. 381 | b. ^? Question-marks indicate questionable grammaticality. 382 | c.
^^whynot?^ But in principle any sequence can be used (here even in superscript). 383 | d. However, such long sequences sometimes lead to undesirable effects in the layout. 384 | ::: 385 | \end{verbatim} 386 | 387 | \begin{samepage} 388 | \ea \judgewidth{whynot?} \label{ex4.6} Throwing in a preamble for good 389 | measure 390 | \ea [*] { This traditionally signals ungrammaticality. } 391 | \ex [?] { Question-marks indicate questionable grammaticality. } 392 | \ex [\textsuperscript{whynot?}] { But in principle any sequence can be 393 | used (here even in superscript). } 394 | \ex [] { However, such long sequences sometimes lead to undesirable 395 | effects in the layout. } 396 | \z 397 | \z 398 | \end{samepage} 399 | 400 | A minor detail is the alignment of a single example with a preamble and 401 | grammaticality judgements. In this case it looks better for the preamble 402 | to be left aligned with the example and not with the judgement. 403 | 404 | \begin{verbatim} 405 | :::ex 406 | Here is a special case with a preamble 407 | 408 | ^^???^ With a singly questionably example. 409 | Note the alignment! Especially with this very long example 410 | that should go over various lines in the output. 411 | ::: 412 | \end{verbatim} 413 | 414 | \begin{samepage} 415 | \ea \judgewidth{???} \label{ex4.7} Here is a special case with a 416 | preamble\\* 417 | \textsuperscript{???}With a singly questionably example. Note the 418 | alignment! Especially with this very long example that should go over 419 | various lines in the output. 420 | \z 421 | \end{samepage} 422 | 423 | For the lazy writers among us, it is also possible to use a simple 424 | bullet list instead of a labelled list. Note that the listed elements 425 | will still be formatted as a labelled list. 426 | 427 | \begin{verbatim} 428 | :::ex 429 | - This is a lazy example. 430 | - ^# It should return letters at the start just as before. 431 | - ^% Also testing some unusual judgements. 
 432 | ::: 433 | \end{verbatim} 434 | 435 | \begin{samepage} 436 | \ea \judgewidth{\#} \label{ex4.8} 437 | \ea [] { This is a lazy example. } 438 | \ex [\#] { It should return letters at the start just as before. } 439 | \ex [\%] { Also testing some unusual judgements. } 440 | \z 441 | \z 442 | \end{samepage} 443 | 444 | Just for testing: a single example with a judgement (which resulted in 445 | an error in earlier versions). 446 | 447 | \begin{verbatim} 448 | ::: ex 449 | ^* This traditionally signals ungrammaticality. 450 | ::: 451 | \end{verbatim} 452 | 453 | \begin{samepage} 454 | \ea \judgewidth{*} \label{ex4.9} 455 | *This traditionally signals ungrammaticality. 456 | \z 457 | \end{samepage} 458 | 459 | \subsection{Interlinear examples}\label{interlinear-examples} 460 | 461 | For interlinear examples with aligned source and gloss, the structure of 462 | a \texttt{lineblock} is used, starting the lines with a vertical line 463 | \texttt{\textbar{}}. There should always be four vertical lines (for 464 | header, source, gloss and translation, respectively), although the 465 | content after the first vertical line can be empty. The source and gloss 466 | lines are separated at spaces, and all parts are left-aligned. If you 467 | want to have a space that is not separated, you will have to `protect' 468 | the space, either by putting a backslash before the space, or by 469 | inserting a non-breaking space instead of a normal space (either type 470 | \texttt{\ } or insert an actual non-breaking space, i.e.~unicode 471 | character \texttt{U+00A0}). 472 | 473 | \begin{verbatim} 474 | :::ex 475 | | Dutch (Germanic) 476 | | Deze zin is in het nederlands. 477 | | DEM sentence AUX in DET dutch. 478 | | This sentence is dutch. 479 | ::: 480 | \end{verbatim} 481 | 482 | \begin{samepage} 483 | \ea [] { \judgewidth{} \label{ex4.10} 484 | Dutch (Germanic)\\* 485 | \gll Deze zin is in het nederlands. \\ 486 | DEM sentence AUX in DET dutch.
\\ 487 | \glt This sentence is dutch. } 488 | \z 489 | \end{samepage} 490 | 491 | An attempt is made to format interlinear examples when the option 492 | \texttt{formatGloss=true} is added. This will: 493 | 494 | \begin{itemize} 495 | \tightlist 496 | \item 497 | remove formatting from the source and set everything in italics, 498 | \item 499 | remove formatting from the gloss and set sequences (\textgreater1) of 500 | capitals and numbers into small caps (note that the positioning of 501 | small caps on web pages is 502 | \href{https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align}{highly 503 | complex}), 504 | \item 505 | a tilde \texttt{\textasciitilde{}} between spaces in the gloss is 506 | treated as a shortcut for an empty gloss (internally, the sequence 507 | \texttt{space-tilde-space} is replaced by 508 | \texttt{space-space-nonBreakingSpace-space-space}), 509 | \item 510 | consistently put translations in single quotes, possibly removing 511 | other quotes. 512 | \end{itemize} 513 | 514 | \begin{verbatim} 515 | ::: {.ex formatGloss=true} 516 | | Dutch (Germanic) 517 | | Is deze zin in het nederlands ? 518 | | AUX DEM sentence in DET dutch Q 519 | | Is this sentence dutch? 520 | ::: 521 | \end{verbatim} 522 | 523 | \begin{samepage} 524 | \ea [] { \judgewidth{} \label{ex4.11} 525 | Dutch (Germanic)\\* 526 | \gll \emph{Is} \emph{deze} \emph{zin} \emph{in} \emph{het} 527 | \emph{nederlands} \emph{?} \\ 528 | \textsc{aux} \textsc{dem} sentence in \textsc{det} dutch 529 | \textsc{q} \\ 530 | \glt `Is this sentence dutch?' } 531 | \z 532 | \end{samepage} 533 | 534 | The results of such formatting will not always work, but it seems to be 535 | quite robust in my testing. 
The next example brings everything together: 536 | 537 | \begin{itemize} 538 | \tightlist 539 | \item 540 | a preamble, 541 | \item 542 | labels, both for single lines and for interlinear examples, 543 | \item 544 | interlinear examples start on a new line immediately after the 545 | letter-label, 546 | \item 547 | grammaticality judgements with proper alignment, 548 | \item 549 | when the header of an interlinear example is left out, everything is 550 | shifted up, 551 | \item 552 | The formatting of the interlinear is harmonised. 553 | \end{itemize} 554 | 555 | \begin{verbatim} 556 | ::: {.ex formatGloss=true samePage=false} 557 | Completely superfluous preamble, but it works ... 558 | 559 | a. 560 | | Dutch (Germanic) Note the grammaticality judgement! 561 | | ^^:–)^ Deze zin is (dit\ is test) nederlands. 562 | | DEM sentence AUX ~ dutch. 563 | | This sentence is dutch. 564 | 565 | b. 566 | | 567 | | Deze tweede zin heeft geen header. 568 | | DEM second sentence have.3SG.PRES no header. 569 | | This second sentence does not have a header. 570 | 571 | a. Mixing single line examples with interlinear examples. 572 | a. This is of course highly unusal. 573 | Just for this example, let's add some extra material in this example. 574 | ::: 575 | \end{verbatim} 576 | 577 | \ea \judgewidth{:–)} \label{ex4.12} Completely superfluous preamble, but 578 | it works \ldots{} 579 | \ea [\textsuperscript{:--)}] { 580 | Dutch (Germanic) Note the grammaticality judgement!\\* 581 | \gll \emph{Deze} \emph{zin} \emph{is} \emph{(dit~is~test)} 582 | \emph{nederlands.} \\ 583 | \textsc{dem} sentence \textsc{aux} ~ dutch. \\ 584 | \glt `This sentence is dutch.' } 585 | \ex [] { 586 | \gll \emph{Deze} \emph{tweede} \emph{zin} \emph{heeft} \emph{geen} 587 | \emph{header.} \\ 588 | \textsc{dem} second sentence have.\textsc{3sg}.\textsc{pres} no 589 | header. \\ 590 | \glt `This second sentence does not have a header.' } 591 | \ex [] { Mixing single line examples with interlinear examples. 
} 592 | \ex [] { This is of course highly unusal. Just for this example, let's 593 | add some extra material in this example. } 594 | \z 595 | \z 596 | 597 | Also, as a quick workaround for showing multiple source lines without 598 | alignment with the glossing (e.g.~for phonetic or orthographic 599 | representations of the example), it is possible to use the header of an 600 | interlinear example. For a line break in the header, use the double 601 | backslash \texttt{\textbackslash{}\textbackslash{}}, either inline or at 602 | the end of a line. When you type a header using multiple lines (as shown 603 | below), then subsequent lines have to start with a space. For now, this 604 | only works in the header line. 605 | 606 | \begin{verbatim} 607 | ::: ex 608 | | Example with a multiline header \\ 609 | *can be used for orthographic representations*, \\ 610 | or phonetic transcription, \\ or for whatever you like 611 | | Dit is een lui voorbeeld=je 612 | | DEM COP DET lazy example=DIM 613 | | This is a lazy example. 614 | ::: 615 | \end{verbatim} 616 | 617 | \begin{samepage} 618 | \ea [] { \judgewidth{} \label{ex4.13} 619 | Example with a multiline header \\ 620 | \emph{can be used for orthographic representations}, \\ 621 | or phonetic transcription, \\ 622 | or for whatever you like\\* 623 | \gll Dit is een lui voorbeeld=je \\ 624 | DEM COP DET lazy example=DIM \\ 625 | \glt This is a lazy example. } 626 | \z 627 | \end{samepage} 628 | 629 | \subsection{Cross-referencing 630 | examples}\label{cross-referencing-examples} 631 | 632 | The examples are automatically numbered by \texttt{pandoc-ling}. 633 | Cross-references to examples inside a document can be made by using the 634 | \texttt{{[}@ID{]}} format (used by Pandoc for citations).
When an 635 | example has an explicit identifier (like \texttt{\#test} in the next 636 | example), then a reference can be made to this example with 637 | \texttt{{[}@test{]}}, leading to (\ref{test}) when formatted (note that 638 | the formatting does not work on the github website. Please check the 639 | `docs' subdirectory). 640 | 641 | \begin{verbatim} 642 | ::: {#test .ex} 643 | This is a test 644 | ::: 645 | \end{verbatim} 646 | 647 | \begin{samepage} 648 | \ea \judgewidth{} \label{test} 649 | This is a test 650 | \z 651 | \end{samepage} 652 | 653 | Inspired by the \texttt{linguex}-approach, you can also use the keywords 654 | \texttt{next} or \texttt{last} to refer to the next or the last example, 655 | e.g.~\texttt{{[}@last{]}} will be formatted as (\ref{test}). By doubling 656 | the first letters to \texttt{nnext} or \texttt{llast} reference to the 657 | next/last-but-one can be made. Actually, the number of starting letters 658 | can be repeated at will in \texttt{pandoc-ling}, so something like 659 | \texttt{{[}@llllllllast{]}} will also work. It will be formatted as 660 | (\ref{ex4.7}) after the processing of \texttt{pandoc-ling}. Needless to 661 | say that in such a situation an explicit identifier would be a better 662 | choice. 663 | 664 | Referring to sub-examples can be done by manually adding a suffix into 665 | the cross reference, simply separated from the identifier by a space. 666 | For example, \texttt{{[}@lllast~c{]}} will refer to the third 667 | sub-example of the last-but-two example. Formatted this will look like 668 | this: (\ref{ex4.13}\,c), smile! However, note that the ``c'' has to be 669 | manually determined. It is simply a literal suffix that will be copied 670 | into the cross-reference. Something like \texttt{{[}@last\ hA1l0{]}} 671 | will work also, leading to (\ref{test}\,hA1l0) when formatted (which is 672 | of course nonsensical). 
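
As a minimal sketch of the keywords described above, the following Markdown combines an explicit identifier, the relative keywords and a literal suffix. The identifier \texttt{\#greeting} and the example sentences are made up for illustration only.

\begin{verbatim}
::: {#greeting .ex}
a. Hello world.
b. ^* World hello.
:::

::: ex
A second example without an explicit identifier.
:::

See [@greeting] for the basic pattern and [@greeting b] for the
ungrammatical variant. From here, [@llast] refers to the same
example, [@last] refers to the second example, and [@next]
refers to the example that follows this paragraph.

::: ex
A third example.
:::
\end{verbatim}

Note that the suffix \texttt{b} is copied literally, so it has to be kept in sync by hand when sub-examples are reordered.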
673 | 674 | For exports that include attributes (like html), the examples have an 675 | explicit id of the form \texttt{exNUMBER} in which \texttt{NUMBER} is 676 | the actual number as given in the formatted output. This means that it 677 | is possible to refer to an example on any web-page by using the 678 | hash-mechanism to refer to a part of the web-page. For example, 679 | \texttt{\#ex4.7} can be used to refer to the seventh example in the 680 | html-output of this readme (try 681 | \href{https://cysouw.github.io/pandoc-ling/readme.html\#ex4.7}{this 682 | link}). The id in this example has a chapter number `4' because in the 683 | html conversion I have set the option \texttt{addChapterNumber} to 684 | \texttt{true}. (Note: when the numbering restarts in each chapter 685 | with the option \texttt{restartAtChapter}, then the id is of the form 686 | \texttt{exCHAPTER.NUMBER}. This is necessary to resolve clashing ids, as 687 | the same number might then be used in different chapters.) 688 | 689 | I propose to use these ids also to refer to examples in citations when 690 | writing scholarly papers, e.g.~(Cysouw 2021: \#ex7), independent of 691 | whether the links actually resolve. In principle, such citations could 692 | easily be resolved when online publications are properly prepared. The 693 | same proposal could also work for other parts of research papers, for 694 | example using tags like \texttt{\#sec,\ \#fig,\ \#tab,\ \#eq} (see the 695 | Pandoc filter 696 | \href{https://github.com/cysouw/crossref-adapt}{\texttt{crossref-adapt}}). 697 | To refer to paragraphs (which should replace page numbers in a future of 698 | adaptive design), I propose to use no tag, but directly add the number 699 | to the hash (see the Pandoc filter 700 | \href{https://github.com/cysouw/count-para}{\texttt{count-para}} for a 701 | practical mechanism to add such numbering).
702 | 703 | \subsection{\texorpdfstring{Options of 704 | \texttt{pandoc-ling}}{Options of pandoc-ling}}\label{options-of-pandoc-ling} 705 | 706 | \subsubsection{Global options}\label{global-options} 707 | 708 | The following global options are available with \texttt{pandoc-ling}. 709 | These can be added to the 710 | \href{https://pandoc.org/MANUAL.html\#metadata-blocks}{Pandoc metadata}. 711 | An example of such metadata can be found at the bottom of this 712 | \texttt{readme} in the form of a YAML-block. Pandoc allows for various 713 | methods to provide metadata (see the link above). 714 | 715 | \begin{itemize} 716 | \tightlist 717 | \item 718 | \textbf{\texttt{formatGloss}} (boolean, default \texttt{false}): 719 | should all interlinear examples be consistently formatted? If you use 720 | this option, you can simply use capital letters for abbreviations in 721 | the gloss, and they will be changed to small caps. The source line is 722 | set to italics, and the translation is put into single quotes. 723 | \item 724 | \textbf{\texttt{samePage}} (boolean, default \texttt{true}, only for 725 | Latex): should examples be kept together on the same page? Can also be 726 | overridden for individual examples by adding 727 | \texttt{\{.ex\ samePage=false\}} at the start of an example (cf.~below 728 | on \texttt{local\ options}). 729 | \item 730 | \textbf{\texttt{xrefSuffixSep}} (string, defaults to no-break-space): 731 | When cross references have a suffix, how should the separator be 732 | formatted? The default `no-break-space' is a safe option. I 733 | personally like a `narrow no-break space' better (Unicode 734 | \texttt{U+202F}), but this symbol does not work with all fonts, and 735 | might thus lead to errors. For Latex typesetting, all space-like 736 | symbols are converted to a Latex thin space 737 | \texttt{\textbackslash{},}.
738 | \item 739 | \textbf{\texttt{restartAtChapter}} (boolean, default \texttt{false}): 740 | should the counting restart for each chapter? 741 | 742 | \begin{itemize} 743 | \tightlist 744 | \item 745 | Actually, when \texttt{true} this setting will restart the counting 746 | at the highest heading level, which for various output formats can 747 | be set by the Pandoc option \texttt{top-level-division}. 748 | \item 749 | The id of each example will now be of the form 750 | \texttt{exCHAPTER.NUMBER} to resolve any clashes when the same 751 | number appears in different chapters. 752 | \item 753 | Depending on your Latex setup, an explicit entry 754 | \texttt{top-level-division:\ chapter} might be necessary in your 755 | metadata. 756 | \end{itemize} 757 | \item 758 | \textbf{\texttt{addChapterNumber}} (boolean, default \texttt{false}): 759 | should the chapter (= highest heading level) number be added to the 760 | number of the example? When setting this to \texttt{true} any setting 761 | of \texttt{restartAtChapter} will be ignored. In most Latex situations 762 | this only works in combination with a \texttt{documentclass:\ book}. 763 | \item 764 | \textbf{\texttt{latexPackage}} (one of: \texttt{linguex}, 765 | \texttt{gb4e}, \texttt{langsci-gb4e}, \texttt{expex}, default 766 | \texttt{linguex}): Various options for converting examples to Latex 767 | packages that typeset linguistic examples. None of the conversions 768 | works perfectly, though they should work in most normal situations 769 | (think 90\%-plus). It might be necessary to first convert to 770 | \texttt{Latex}, correct the output, and then typeset separately with a 771 | latex compiler like \texttt{xelatex}. Using the direct option inside 772 | Pandoc might also work in many situations. Export to 773 | \textbf{\texttt{beamer}} seems to work reasonably well with the 774 | \texttt{gb4e} package. All others have artefacts or errors.
775 | \end{itemize} 776 | 777 | \subsubsection{Local options}\label{local-options} 778 | 779 | Local options are options that can be set for each individual example. 780 | The \texttt{formatGloss} option can be used to have an individual 781 | example be formatted differently from the global setting. For example, 782 | when the global setting is \texttt{formatGloss:\ true} in the metadata, 783 | then adding \texttt{formatGloss=false} in the curly brackets of a 784 | specific example will block the formatting. This is especially useful 785 | when the automatic formatting does not give the desired result. 786 | 787 | If you want to add something else (not a linguistic example) in a 788 | numbered example, then there is the local option \texttt{noFormat=true}. 789 | An attempt will be made to produce a reasonable layout. Multiple 790 | paragraphs will simply be taken as is, and the number will be put in 791 | front. In HTML the number will be centred. It is usable for an 792 | incidental mathematical formula. 793 | 794 | \begin{verbatim} 795 | ::: {.ex noFormat=true} 796 | $$\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}$$ 797 | ::: 798 | \end{verbatim} 799 | 800 | \begin{samepage} 801 | \ea \judgewidth{} \label{ex4.15} 802 | \[\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}\]\\ 803 | 804 | \z 805 | \end{samepage} 806 | 807 | \subsection{\texorpdfstring{Issues with 808 | \texttt{pandoc-ling}}{Issues with pandoc-ling}}\label{issues-with-pandoc-ling} 809 | 810 | \begin{itemize} 811 | \tightlist 812 | \item 813 | Manually provided identifiers for examples should not be purely 814 | numerical (so do not use e.g.~\texttt{\#5789}). In some situations this 815 | interferes with the setting of the cross-references. 816 | \item 817 | Because the cross-references use the same structure as citations in 818 | Pandoc, the processing of citations (by \texttt{citeproc}) should be 819 | performed \textbf{after} the processing by \texttt{pandoc-ling}.
820 | Another Pandoc filter, 821 | \href{https://github.com/lierdakil/pandoc-crossref}{\texttt{pandoc-crossref}}, 822 | for numbering figures and other captions, also uses the same system. 823 | There seems to be no conflict between \texttt{pandoc-ling} and 824 | \texttt{pandoc-crossref}. 825 | \item 826 | Interlinear examples will not wrap at the end of the page. There 827 | is no solution yet for examples that are longer than the size 828 | of the page. 829 | \item 830 | It is not (yet) possible to have more than one glossing line. 831 | \item 832 | When exporting to \texttt{docx} there is a problem because there are 833 | paragraphs inserted after tables, which adds space in lists with 834 | multiple interlinear examples (except when they have exactly the same 835 | number of columns). This is 836 | \href{https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262}{by 837 | design}. The official solution is to set font-size to 1 for this 838 | paragraph inside MS Word. 839 | \item 840 | Multi-column cells are crucial for \texttt{pandoc-ling} to work 841 | properly. These were only introduced in the new table format with Pandoc 842 | 2.10 (so older Pandoc versions are not supported). Also note that these 843 | structures are not yet exported to all formats, e.g.~they will not be 844 | displayed correctly in \texttt{docx}. However, this is currently an 845 | area of active development. 846 | \item 847 | \texttt{langsci-gb4e} is only available as part of the 848 | \href{https://ctan.org/pkg/langsci?lang=en}{\texttt{langsci} package}. 849 | You have to make it available to Pandoc, e.g.~by adding it into the 850 | same directory as the pandoc-ling.lua filter. I have added a recent 851 | version of \texttt{langsci-gb4e} here for convenience, but this one 852 | might be outdated at some time in the future.
853 | \item 854 | \texttt{beamer} output seems to work best with 855 | \texttt{latexPackage:\ gb4e}. 856 | \end{itemize} 857 | 858 | \subsection{A note on Latex 859 | conversion}\label{a-note-on-latex-conversion} 860 | 861 | Originally, I decided to write this filter as a two-pronged conversion, 862 | making a markdown version myself, but using a mapping to one of the many 863 | latex libraries for linguistic examples as a quick fix. I assumed that 864 | such a mapping would be the easy part. However, it turned out that the 865 | mapping to latex was much more difficult than I anticipated. Basically, 866 | it turned out that the `common denominator' that I was aiming for was 867 | not necessarily the `common denominator' provided by the latex packages. 868 | I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and 869 | expex) with growing dismay. This approach resulted in a first version. 870 | However, after this version was (more or less) finished, I realised that 871 | it would be better to first define the `common denominator' more clearly 872 | (as done here), and then implement this purely in Pandoc. From that 873 | basis I have then made attempts to map them to the various latex 874 | packages. 875 | 876 | \subsection{A note on implementation}\label{a-note-on-implementation} 877 | 878 | The basic structure of the examples is transformed into Pandoc tables. 879 | Tables are reasonably safe for converting to other formats. Care has 880 | been taken to add \texttt{classes} to all elements of the tables 881 | (e.g.~the preamble has the class \texttt{linguistic-example-preamble}). 882 | When exported formats are aware of these classes, they can be used to 883 | fine-tune the formatting. I have applied a few such fine-tunings to the 884 | html output of this filter by adding a few CSS-style statements. The 885 | naming of the classes is quite transparent, using the form 886 | \texttt{linguistic-example-STRUCTURE}.
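
As a hedged sketch of how such classes could be used for HTML output, a style rule can be injected through the Pandoc metadata. Only the class name \texttt{linguistic-example-preamble} is taken from this readme. The property value is an arbitrary choice for illustration and is not one of the styles shipped with the filter.

\begin{verbatim}
---
header-includes:
- |
  ```{=html}
  <style>
  /* class name from this readme; the value is arbitrary */
  .linguistic-example-preamble {
    font-style: italic;
  }
  </style>
  ```
---
\end{verbatim}

The same rule could instead be kept in a separate stylesheet and passed to Pandoc with the \texttt{-\/-css} option.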
887 | 888 | The whole table is encapsulated in a \texttt{div} with class \texttt{ex} 889 | and an id of the form \texttt{exNUMBER}. This means that an example can 890 | be directly referred to in web-links by using the hash-mechanism. For 891 | example, adding \texttt{\#ex3} to the end of a link will immediately 892 | jump to this example in a browser. 893 | 894 | The current implementation is completely independent from the 895 | \href{https://pandoc.org/MANUAL.html\#numbered-example-lists}{Pandoc 896 | numbered examples implementation} and both can work side by side, like 897 | (2): 898 | 899 | \begin{enumerate} 900 | \def\labelenumi{(\arabic{enumi})} 901 | \item 902 | These are native Pandoc numbered examples 903 | \item 904 | They are independent of \texttt{pandoc-ling} but use the same output 905 | formatting in many default exports, like latex. 906 | \end{enumerate} 907 | 908 | However, in practice various output-formats of Pandoc (e.g.~latex) also 909 | use numbers in round brackets for these, so in practice it might be 910 | confusing to combine both. 
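
For comparison, the two mechanisms look as follows in the Markdown source. The \texttt{(@)} marker is Pandoc's own syntax for numbered example lists, and the sentences are placeholders.

\begin{verbatim}
(@) A native Pandoc numbered example.
(@) Another native example.

::: ex
An example processed by pandoc-ling.
:::
\end{verbatim}

Since the two implementations are independent, each is numbered separately, which is the source of the potential confusion mentioned above.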
911 | 912 | \end{document} 913 | -------------------------------------------------------------------------------- /docs/readme_linguex.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme_linguex.pdf -------------------------------------------------------------------------------- /docs/readme_linguex.tex: -------------------------------------------------------------------------------- 1 | % Options for packages loaded elsewhere 2 | \PassOptionsToPackage{unicode}{hyperref} 3 | \PassOptionsToPackage{hyphens}{url} 4 | \documentclass[ 5 | ]{article} 6 | \usepackage{xcolor} 7 | \usepackage{amsmath,amssymb} 8 | \setcounter{secnumdepth}{5} 9 | \usepackage{iftex} 10 | \ifPDFTeX 11 | \usepackage[T1]{fontenc} 12 | \usepackage[utf8]{inputenc} 13 | \usepackage{textcomp} % provide euro and other symbols 14 | \else % if luatex or xetex 15 | \usepackage{unicode-math} % this also loads fontspec 16 | \defaultfontfeatures{Scale=MatchLowercase} 17 | \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1} 18 | \fi 19 | \usepackage{lmodern} 20 | \ifPDFTeX\else 21 | % xetex/luatex font selection 22 | \fi 23 | % Use upquote if available, for straight quotes in verbatim environments 24 | \IfFileExists{upquote.sty}{\usepackage{upquote}}{} 25 | \IfFileExists{microtype.sty}{% use microtype if available 26 | \usepackage[]{microtype} 27 | \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts 28 | }{} 29 | \makeatletter 30 | \@ifundefined{KOMAClassName}{% if non-KOMA class 31 | \IfFileExists{parskip.sty}{% 32 | \usepackage{parskip} 33 | }{% else 34 | \setlength{\parindent}{0pt} 35 | \setlength{\parskip}{6pt plus 2pt minus 1pt}} 36 | }{% if KOMA class 37 | \KOMAoptions{parskip=half}} 38 | \makeatother 39 | \usepackage{graphicx} 40 | \makeatletter 41 | \newsavebox\pandoc@box 42 | \newcommand*\pandocbounded[1]{% scales image to fit in text 
height/width 43 | \sbox\pandoc@box{#1}% 44 | \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}% 45 | \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}% 46 | \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both 47 | \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}% 48 | \else\usebox{\pandoc@box}% 49 | \fi% 50 | } 51 | % Set default figure placement to htbp 52 | \def\fps@figure{htbp} 53 | \makeatother 54 | \setlength{\emergencystretch}{3em} % prevent overfull lines 55 | \providecommand{\tightlist}{% 56 | \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}} 57 | \usepackage{linguex} 58 | \renewcommand{\theExLBr}{} 59 | \renewcommand{\theExRBr}{} 60 | \newcommand{\jdg}[1]{\makebox[0.4em][r]{\normalfont#1\ignorespaces}} 61 | \usepackage{chngcntr} 62 | \counterwithin{ExNo}{section} 63 | \renewcommand{\Exarabic}{\thesection.\arabic} 64 | \usepackage{bookmark} 65 | \IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available 66 | \urlstyle{same} 67 | \hypersetup{ 68 | pdftitle={Using pandoc-ling}, 69 | pdfauthor={Michael Cysouw}, 70 | hidelinks, 71 | pdfcreator={LaTeX via pandoc}} 72 | 73 | \title{Using pandoc-ling} 74 | \author{Michael Cysouw} 75 | \date{} 76 | 77 | \begin{document} 78 | \maketitle 79 | 80 | { 81 | \setcounter{tocdepth}{3} 82 | \tableofcontents 83 | } 84 | \section{pandoc-ling}\label{pandoc-ling} 85 | 86 | \emph{Michael Cysouw} 87 | \textless{}\href{mailto:cysouw@mac.com}{\nolinkurl{cysouw@mac.com}}\textgreater{} 88 | 89 | A Pandoc filter for linguistic examples 90 | 91 | tl;dr 92 | 93 | \begin{itemize} 94 | \tightlist 95 | \item 96 | Easily write linguistic examples including basic interlinear glossing. 97 | \item 98 | Let numbering and cross-referencing be done for you. 99 | \item 100 | Export to (almost) any format of your wishes for final polishing. 
101 | \item 102 | As an example, check out this readme in 103 | \href{https://cysouw.github.io/pandoc-ling/readme.html}{HTML} or 104 | \href{https://cysouw.github.io/pandoc-ling/readme_gb4e.pdf}{Latex}. 105 | \end{itemize} 106 | 107 | \section{Rationale}\label{rationale} 108 | 109 | In the field of linguistics there is an outspoken tradition to format 110 | example sentences in research papers in a very specific way. In the 111 | field, it is a perennial problem to get such example sentences to look 112 | just right. Within Latex, there are numerous packages to deal with this 113 | problem (e.g.~covington, linguex, gb4e, expex, etc.). Depending on your 114 | needs, there is some Latex solution for almost everyone. However, these 115 | solutions in Latex are often cumbersome to type, and they are not 116 | portable to other formats. Specifically, transfer between latex, html, 117 | docx, odt or epub would actually be highly desirable. Such transfer is 118 | the hallmark of \href{https://pandoc.org}{Pandoc}, a tool by John 119 | MacFarlane that provides conversion between these (and many more) 120 | formats. 121 | 122 | Any such conversion between text-formats naturally never works 123 | perfectly: every text-format has specific features that are not 124 | transferable to other formats. A central goal of Pandoc (at least in my 125 | interpretation) is to define a set of shared concepts for text-structure 126 | (a `common denominator' if you will, but surely not `least'!) that can 127 | then be mapped to other formats. In many ways, Pandoc tries (again) to 128 | define a set of logical concepts for text structure (`semantic markup'), 129 | which can then be formatted by your favourite typesetter. As long as you 130 | stay inside the realm of this `common denominator' (in practice that 131 | means Pandoc's extended version of Markdown/CommonMark), conversion 132 | works reasonably well (think 90\%-plus). 
133 | 134 | Building on John Gruber's 135 | \href{https://daringfireball.net/projects/markdown/syntax}{Markdown 136 | philosophy}, there is a strong urge here to learn to restrain oneself 137 | while writing, and try to restrict the number of layout-possibilities to 138 | a minimum. In this sense, with \texttt{pandoc-ling} I propose a 139 | Markdown-structure for linguistic examples that is simple, easy to type, 140 | easy to read, and portable through the Pandoc universe by way of an 141 | extension mechanism of Pandoc, called a `Pandoc Lua Filter'. This 142 | extension will not magically allow you to write every linguistic example 143 | imaginable, but my guess is that in practice the present proposal covers 144 | the majority of situations in linguistic publications (think 90\%-plus). 145 | As an example (and test case) I have included automatic conversions into 146 | various formats in this repository (check them out in the directory 147 | \texttt{docs} to get an idea of the strengths and weaknesses of the 148 | current implementation). 149 | 150 | \section{The basic structure of a linguistic 151 | example}\label{the-basic-structure-of-a-linguistic-example} 152 | 153 | Basically, a linguistic example consists of 6 possible building blocks, 154 | of which only the number and at least one example line are necessary. 155 | The space between the building blocks is kept as minimal as possible 156 | without becoming cramped. When (optional) building blocks are not 157 | included, then the other blocks shift left and up (only exception: a 158 | preamble without labels is not shifted left completely, but left-aligned 159 | with the example, not with the judgement). 160 | 161 | \begin{itemize} 162 | \tightlist 163 | \item 164 | \textbf{Number}: Running tally of all examples in the work, possibly 165 | restarting at chapters or other major headings.
Typically between 166 | round brackets, possibly with a chapter number added before in long 167 | works, e.g.~example (7.26). Aligned top-left, typically left-aligned 168 | to main text margin. 169 | \item 170 | \textbf{Preamble}: Optional information about the content/kind of 171 | example. Aligned top-left: to the top with the number, to the left 172 | with the (optional) label. When there is no label, then preamble is 173 | aligned with the example, not with the judgment. 174 | \item 175 | \textbf{Label}: Indices for sub-examples. Only present when there is 176 | more than one example grouped together inside one numbered entity. 177 | Typically these sub-example labels use latin letters followed by a 178 | full stop. They are left-aligned with the preamble, and each label is 179 | top-aligned with the top-line of the corresponding example (important 180 | for longer line-wrapped examples). 181 | \item 182 | \textbf{Judgment}: Examples can optionally have grammaticality 183 | judgments, typically symbols like *, ? or !, sometimes in superscript 184 | relative to the corresponding example. Judgements are right-aligned to 185 | each other, typically with only minimal space to the left-aligned 186 | examples. 187 | \item 188 | \textbf{Line example}: A minimal linguistic example has at least one 189 | line example, i.e.~an utterance of interest. Building blocks in 190 | general shift left and up when other (optional) building blocks are 191 | not present. Minimally, this results in a number with one line 192 | example. 193 | \item 194 | \textbf{Interlinear example}: A complex structure typically used for 195 | examples from languages unknown to most readers. Consists of three or 196 | four lines that are left-aligned: 197 | 198 | \begin{itemize} 199 | \tightlist 200 | \item 201 | \textbf{Header}: An optional header is typically used to display 202 | information about the language of the example, including literature 203 | references.
When not present, then all other lines from the 204 | interlinear example shift upwards. 205 | \item 206 | \textbf{Source}: The actual language utterance, often typeset in 207 | italics. This line is internally separated at spaces, and each 208 | sub-block is left-aligned with the corresponding sub-blocks of the 209 | gloss. 210 | \item 211 | \textbf{Gloss}: Explanation of the meaning of the source, often 212 | using abbreviations in small caps. This line is internally separated 213 | at spaces, and each block is left-aligned with the block from 214 | source. 215 | \item 216 | \textbf{Translation}: Free translation of the source, typically 217 | quoted. Not separated in blocks, but freely extending to the right. 218 | Left-aligned with the other lines from the interlinear example. 219 | \end{itemize} 220 | \end{itemize} 221 | 222 | \begin{figure} 223 | \centering 224 | \pandocbounded{\includegraphics[keepaspectratio,alt={The structure of a linguistic example.}]{figure/ExampleStructure.png}} 225 | \caption{The structure of a linguistic example.} 226 | \end{figure} 227 | 228 | There are of course many more possibilities to extend the structure of a 229 | linguistic example, like third or fourth subdivisions of labels (often 230 | using small roman numerals as a third level) or multiple glossing lines 231 | in the interlinear example. Also, the content of the header is sometimes 232 | found right-aligned to the right of the interlinear example (language 233 | info at the top, reference at the bottom). All such options are 234 | currently not supported by \texttt{pandoc-ling}. 235 | 236 | Under the hood, this structure is prepared by \texttt{pandoc-ling} as a 237 | table. Tables are reasonably well transcoded to different document 238 | formats. Specific layout considerations mostly have to be set manually. 239 | Alignment of the text should work in most exports. Some \texttt{CSS} 240 | styling is proposed by \texttt{pandoc-ling}, but can of course be 241 | overruled.
For latex (and beamer) special output is prepared using 242 | various available latex packages (see options, below). 243 | 244 | \section{\texorpdfstring{Introducing 245 | \texttt{pandoc-ling}}{Introducing pandoc-ling}}\label{introducing-pandoc-ling} 246 | 247 | \subsection{Editing linguistic 248 | examples}\label{editing-linguistic-examples} 249 | 250 | To include a linguistic example in Markdown \texttt{pandoc-ling} uses 251 | the \texttt{div} structure, which is indicated in Pandoc-Markdown by 252 | typing three colons at the start and three colons at the end. To 253 | indicate the \texttt{class} of this \texttt{div} the letters `ex' (for 254 | `example') should be added after the top colons (with or without space 255 | in between). This `ex'-class is the signal for \texttt{pandoc-ling} to 256 | start processing such a \texttt{div}. The numbering of these examples 257 | will be inserted by \texttt{pandoc-ling}. 258 | 259 | Empty lines can be added inside the \texttt{div} for visual pleasure, as 260 | they mostly do not have an influence on the output. Exception: do 261 | \emph{not} use empty lines between unlabelled line examples. Multiple 262 | lines of text can be used (without empty lines in between), but they 263 | will simply be interpreted as one sequential paragraph. 264 | 265 | \begin{verbatim} 266 | ::: ex 267 | This is the most basic structure of a linguistic example. 268 | ::: 269 | \end{verbatim} 270 | 271 | \begin{samepage} 272 | 273 | \ex. \label{ex4.1} 274 | This is the most basic structure of a linguistic example. 275 | 276 | \end{samepage} 277 | 278 | Alternatively, the \texttt{class} can be put in curled brackets (and 279 | then a leading full stop is necessary before \texttt{ex}). Inside these 280 | brackets more attributes can be added (separated by space), for example 281 | an id, using a hash, or any attribute=value pairs that should apply to 282 | this example. 
Currently there is only one real attribute implemented 283 | (\texttt{formatGloss}), but in principle it is possible to add more 284 | attributes that can be used to fine-tune the typesetting of the example 285 | (see below for a description of such \texttt{local\ options}). 286 | 287 | \begin{verbatim} 288 | ::: {#id .ex formatGloss=false} 289 | 290 | This is a multi-line example. 291 | But that does not mean anything for the result 292 | All these lines are simply treated as one paragraph. 293 | They will become one example with one number. 294 | 295 | ::: 296 | \end{verbatim} 297 | 298 | \begin{samepage} 299 | 300 | \ex. \label{id} 301 | This is a multi-line example. But that does not mean anything for the 302 | result All these lines are simply treated as one paragraph. They will 303 | become one example with one number. 304 | 305 | \end{samepage} 306 | 307 | A preamble can be added by inserting an empty line between preamble and 308 | example. The same considerations about multiple text-lines apply. 309 | 310 | \begin{verbatim} 311 | :::ex 312 | Preamble 313 | 314 | This is an example with a preamble. 315 | ::: 316 | \end{verbatim} 317 | 318 | \begin{samepage} 319 | 320 | \ex. \label{ex4.3} Preamble\\* 321 | This is an example with a preamble. 322 | 323 | \end{samepage} 324 | 325 | Sub-examples with labels are entered by starting each sub-example with a 326 | small latin letter and a full stop. Empty lines between labels are 327 | allowed. Subsequent lines without labels are treated as one paragraph. 328 | Empty lines \emph{not} followed by a label with a full stop will result 329 | in errors. 330 | 331 | \begin{verbatim} 332 | :::ex 333 | a. This is the first example. 334 | b. This is the second. 335 | a. The actual letters are not important, `pandoc-ling` will put them in order. 336 | 337 | e. Empty lines are allowed between labelled lines 338 | Subsequent lines are again treated as one sequential paragraph. 
339 | ::: 340 | \end{verbatim} 341 | 342 | \begin{samepage} 343 | 344 | \ex. \label{ex4.4} 345 | \a. This is the first example. 346 | \b. This is the second. 347 | \b. The actual letters are not important, \texttt{pandoc-ling} will 348 | put them in order. 349 | \b. Empty lines are allowed between labelled lines Subsequent lines 350 | are again treated as one sequential paragraph. 351 | 352 | \end{samepage} 353 | 354 | A labelled list can be combined with a preamble. 355 | 356 | \begin{verbatim} 357 | :::ex 358 | Any nice description here 359 | 360 | a. one example sentence. 361 | b. two 362 | c. three 363 | ::: 364 | \end{verbatim} 365 | 366 | \begin{samepage} 367 | 368 | \ex. \label{ex4.5} Any nice description here 369 | \a. one example sentence. 370 | \b. two 371 | \b. three 372 | 373 | \end{samepage} 374 | 375 | Grammaticality judgements should be added before an example, and after 376 | an optional label, separated from both by spaces (though four spaces in 377 | a row should be avoided, that could lead to layout errors). To indicate 378 | that any sequence of symbols is a judgements, prepend the judgement with 379 | a caret \texttt{\^{}}. Alignment will be figured out by 380 | \texttt{pandoc-ling}. 381 | 382 | \begin{verbatim} 383 | :::ex 384 | Throwing in a preamble for good measure 385 | 386 | a. ^* This traditionally signals ungrammaticality. 387 | b. ^? Question-marks indicate questionable grammaticality. 388 | c. ^^whynot?^ But in principle any sequence can be used (here even in superscript). 389 | d. However, such long sequences sometimes lead to undesirable effects in the layout. 390 | ::: 391 | \end{verbatim} 392 | 393 | \begin{samepage} 394 | 395 | \ex. \label{ex4.6} Throwing in a preamble for good measure 396 | \a. *This traditionally signals ungrammaticality. 397 | \b. ?Question-marks indicate questionable grammaticality. 398 | \b. \textsuperscript{whynot?}But in principle any sequence can be used 399 | (here even in superscript). 400 | \b. 
However, such long sequences sometimes lead to undesirable effects 401 | in the layout. 402 | 403 | \end{samepage} 404 | 405 | A minor detail is the alignment of a single example with a preamble and 406 | grammaticality judgements. In this case it looks better for the preamble 407 | to be left aligned with the example and not with the judgement. 408 | 409 | \begin{verbatim} 410 | :::ex 411 | Here is a special case with a preamble 412 | 413 | ^^???^ With a singly questionably example. 414 | Note the alignment! Especially with this very long example 415 | that should go over various lines in the output. 416 | ::: 417 | \end{verbatim} 418 | 419 | \begin{samepage} 420 | 421 | \ex. \label{ex4.7} Here is a special case with a preamble\\* 422 | \textsuperscript{???}With a singly questionably example. Note the 423 | alignment! Especially with this very long example that should go over 424 | various lines in the output. 425 | 426 | \end{samepage} 427 | 428 | For the lazy writers among us, it is also possible to use a simple 429 | bullet list instead of a labelled list. Note that the listed elements 430 | will still be formatted as a labelled list. 431 | 432 | \begin{verbatim} 433 | :::ex 434 | - This is a lazy example. 435 | - ^# It should return letters at the start just as before. 436 | - ^% Also testing some unusual judgements. 437 | ::: 438 | \end{verbatim} 439 | 440 | \begin{samepage} 441 | 442 | \ex. \label{ex4.8} 443 | \a. This is a lazy example. 444 | \b. \#It should return letters at the start just as before. 445 | \b. \%Also testing some unusual judgements. 446 | 447 | \end{samepage} 448 | 449 | Just for testing: a single example with a judgement (which resulted in 450 | an error in earlier versions). 451 | 452 | \begin{verbatim} 453 | ::: ex 454 | ^* This traditionally signals ungrammaticality. 455 | ::: 456 | \end{verbatim} 457 | 458 | \begin{samepage} 459 | 460 | \ex. \label{ex4.9} 461 | *This traditionally signals ungrammaticality. 
462 | 463 | \end{samepage} 464 | 465 | \subsection{Interlinear examples}\label{interlinear-examples} 466 | 467 | For interlinear examples with aligned source and gloss, the structure of 468 | a \texttt{lineblock} is used, starting the lines with a vertical line 469 | \texttt{\textbar{}}. There should always be four vertical lines (for 470 | header, source, gloss and translation, respectively), although the 471 | content after the first vertical line can be empty. The source and gloss 472 | lines are separated at spaces, and all parts are right-aligned. If you 473 | want to have a space that is not separated, you will have to `protect' 474 | the space, either by putting a backslash before the space, or by 475 | inserting a non-breaking space instead of a normal space (either type 476 | \texttt{\ } or insert an actual non-breaking space, i.e.~unicode 477 | character \texttt{U+00A0}). 478 | 479 | \begin{verbatim} 480 | :::ex 481 | | Dutch (Germanic) 482 | | Deze zin is in het nederlands. 483 | | DEM sentence AUX in DET dutch. 484 | | This sentence is dutch. 485 | ::: 486 | \end{verbatim} 487 | 488 | \begin{samepage} 489 | 490 | \ex. \label{ex4.10} Dutch (Germanic) 491 | \gll Deze zin is in het nederlands. \\ 492 | DEM sentence AUX in DET dutch. \\ 493 | \glt This sentence is dutch. 494 | 495 | \end{samepage} 496 | 497 | An attempt is made to format interlinear examples when the option 498 | \texttt{formatGloss=true} is added. 
This will: 499 | 500 | \begin{itemize} 501 | \tightlist 502 | \item 503 | remove formatting from the source and set everything in italics, 504 | \item 505 | remove formatting from the gloss and set sequences (\textgreater1) of 506 | capitals and numbers into small caps (note that the positioning of 507 | small caps on web pages is 508 | \href{https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align}{highly 509 | complex}), 510 | \item 511 | a tilde \texttt{\textasciitilde{}} between spaces in the gloss is 512 | treated as a shortcut for an empty gloss (internally, the sequence 513 | \texttt{space-tilde-space} is replaced by 514 | \texttt{space-space-nonBreakingSpace-space-space}), 515 | \item 516 | consistently put translations in single quotes, possibly removing 517 | other quotes. 518 | \end{itemize} 519 | 520 | \begin{verbatim} 521 | ::: {.ex formatGloss=true} 522 | | Dutch (Germanic) 523 | | Is deze zin in het nederlands ? 524 | | AUX DEM sentence in DET dutch Q 525 | | Is this sentence dutch? 526 | ::: 527 | \end{verbatim} 528 | 529 | \begin{samepage} 530 | 531 | \ex. \label{ex4.11} Dutch (Germanic) 532 | \gll \emph{Is} \emph{deze} \emph{zin} \emph{in} \emph{het} 533 | \emph{nederlands} \emph{?} \\ 534 | \textsc{aux} \textsc{dem} sentence in \textsc{det} dutch 535 | \textsc{q} \\ 536 | \glt `Is this sentence dutch?' 537 | 538 | \end{samepage} 539 | 540 | The results of such formatting will not always work, but it seems to be 541 | quite robust in my testing. 
The next example brings everything together: 542 | 543 | \begin{itemize} 544 | \tightlist 545 | \item 546 | a preamble, 547 | \item 548 | labels, both for single lines and for interlinear examples, 549 | \item 550 | interlinear examples start on a new line immediately after the 551 | letter-label, 552 | \item 553 | grammaticality judgements with proper alignment, 554 | \item 555 | when the header of an interlinear example is left out, everything is 556 | shifted up, 557 | \item 558 | The formatting of the interlinear is harmonised. 559 | \end{itemize} 560 | 561 | \begin{verbatim} 562 | ::: {.ex formatGloss=true samePage=false} 563 | Completely superfluous preamble, but it works ... 564 | 565 | a. 566 | | Dutch (Germanic) Note the grammaticality judgement! 567 | | ^^:–)^ Deze zin is (dit\ is test) nederlands. 568 | | DEM sentence AUX ~ dutch. 569 | | This sentence is dutch. 570 | 571 | b. 572 | | 573 | | Deze tweede zin heeft geen header. 574 | | DEM second sentence have.3SG.PRES no header. 575 | | This second sentence does not have a header. 576 | 577 | a. Mixing single line examples with interlinear examples. 578 | a. This is of course highly unusal. 579 | Just for this example, let's add some extra material in this example. 580 | ::: 581 | \end{verbatim} 582 | 583 | \ex. \label{ex4.12} Completely superfluous preamble, but it works 584 | \ldots{} 585 | \a. Dutch (Germanic) Note the grammaticality judgement! 586 | \gll \textsuperscript{:--)}\emph{Deze} \emph{zin} \emph{is} 587 | \emph{(dit~is~test)} \emph{nederlands.} \\ 588 | \textsc{dem} sentence \textsc{aux} ~ dutch. \\ 589 | \glt `This sentence is dutch.' 590 | \b. 591 | \gll \emph{Deze} \emph{tweede} \emph{zin} \emph{heeft} \emph{geen} 592 | \emph{header.} \\ 593 | \textsc{dem} second sentence have.\textsc{3sg}.\textsc{pres} no 594 | header. \\ 595 | \glt `This second sentence does not have a header.' 596 | \b. Mixing single line examples with interlinear examples. 597 | \b. This is of course highly unusal. 
Just for this example, let's add 598 | some extra material in this example. 599 | 600 | Also, as a quick workaround for showing multiple source lines without 601 | alignment with the glossing (e.g.~for phonetic or orthographic 602 | representations of the example), it is possible to use the header of 603 | interlinear example. For a line break in the header, use the double 604 | backslash \texttt{\textbackslash{}\textbackslash{}}, either inline or at 605 | the end of a line. When you type a header using multiple lines (as shown 606 | below), then subsequent lines have to start with space. For now, this 607 | only works in the header line. 608 | 609 | \begin{verbatim} 610 | ::: ex 611 | | Example with an multiline header \\ 612 | *can be used for orthographic representations*, \\ 613 | or phonetic transcription, \\ or for whatever you like 614 | | Dit is een lui voorbeeld=je 615 | | DEM COP DET lazy example=DIM 616 | | This is a lazy example. 617 | ::: 618 | \end{verbatim} 619 | 620 | \begin{samepage} 621 | 622 | \ex. \label{ex4.13} Example with an multiline header \\ 623 | \emph{can be used for orthographic representations}, \\ 624 | or phonetic transcription, \\ 625 | or for whatever you like 626 | \gll Dit is een lui voorbeeld=je \\ 627 | DEM COP DET lazy example=DIM \\ 628 | \glt This is a lazy example. 629 | 630 | \end{samepage} 631 | 632 | \subsection{Cross-referencing 633 | examples}\label{cross-referencing-examples} 634 | 635 | The examples are automatically numbered by \texttt{pandoc-ling}. 636 | Cross-references to examples inside a document can be made by using the 637 | \texttt{{[}@ID{]}} format (used by Pandoc for citations). When an 638 | example has an explicit identifier (like \texttt{\#test} in the next 639 | example), then a reference can be made to this example with 640 | \texttt{{[}@test{]}}, leading to (\ref{test}) when formatted (note that 641 | the formatting does not work on the github website. Please check the 642 | `docs' subdirectory). 
643 | 644 | \begin{verbatim} 645 | ::: {#test .ex} 646 | This is a test 647 | ::: 648 | \end{verbatim} 649 | 650 | \begin{samepage} 651 | 652 | \ex. \label{test} 653 | This is a test 654 | 655 | \end{samepage} 656 | 657 | Inspired by the \texttt{linguex}-approach, you can also use the keywords 658 | \texttt{next} or \texttt{last} to refer to the next or the last example, 659 | e.g.~\texttt{{[}@last{]}} will be formatted as (\ref{test}). By doubling 660 | the first letters to \texttt{nnext} or \texttt{llast} reference to the 661 | next/last-but-one can be made. Actually, the number of starting letters 662 | can be repeated at will in \texttt{pandoc-ling}, so something like 663 | \texttt{{[}@llllllllast{]}} will also work. It will be formatted as 664 | (\ref{ex4.7}) after the processing of \texttt{pandoc-ling}. Needless to 665 | say that in such a situation an explicit identifier would be a better 666 | choice. 667 | 668 | Referring to sub-examples can be done by manually adding a suffix into 669 | the cross reference, simply separated from the identifier by a space. 670 | For example, \texttt{{[}@lllast~c{]}} will refer to the third 671 | sub-example of the last-but-two example. Formatted this will look like 672 | this: (\ref{ex4.13}\,c), smile! However, note that the ``c'' has to be 673 | manually determined. It is simply a literal suffix that will be copied 674 | into the cross-reference. Something like \texttt{{[}@last\ hA1l0{]}} 675 | will work also, leading to (\ref{test}\,hA1l0) when formatted (which is 676 | of course nonsensical). 677 | 678 | For exports that include attributes (like html), the examples have an 679 | explicit id of the form \texttt{exNUMBER} in which \texttt{NUMBER} is 680 | the actual number as given in the formatted output. This means that it 681 | is possible to refer to an example on any web-page by using the 682 | hash-mechanism to refer to a part of the web-page. 
For example, 683 | \texttt{\#ex4.7} can be used to refer to the seventh example in the 684 | html-output of this readme (try 685 | \href{https://cysouw.github.io/pandoc-ling/readme.html\#ex4.7}{this 686 | link}). The id in this example has a chapter number `4' because in the 687 | html conversion I have set the option \texttt{addChapterNumber} to 688 | \texttt{true}. (Note: when the count restarts in each chapter 689 | with the option \texttt{restartAtChapter}, then the id is of the form 690 | \texttt{exCHAPTER.NUMBER}. This is necessary to resolve clashing ids, as 691 | the same number might then be used in different chapters.) 692 | 693 | I propose to use these ids also to refer to examples in citations when 694 | writing scholarly papers, e.g.~(Cysouw 2021: \#ex7), independent of 695 | whether the links actually resolve. In principle, such citations could 696 | easily be resolved when online publications are properly prepared. The 697 | same proposal could also work for other parts of research papers, for 698 | example using tags like \texttt{\#sec,\ \#fig,\ \#tab,\ \#eq} (see the 699 | Pandoc filter 700 | \href{https://github.com/cysouw/crossref-adapt}{\texttt{crossref-adapt}}). 701 | To refer to paragraphs (which should replace page numbers in a future of 702 | adaptive design), I propose to use no tag, but directly add the number 703 | to the hash (see the Pandoc filter 704 | \href{https://github.com/cysouw/count-para}{\texttt{count-para}} for a 705 | practical mechanism to add such numbering). 706 | 707 | \subsection{\texorpdfstring{Options of 708 | \texttt{pandoc-ling}}{Options of pandoc-ling}}\label{options-of-pandoc-ling} 709 | 710 | \subsubsection{Global options}\label{global-options} 711 | 712 | The following global options are available with \texttt{pandoc-ling}. 713 | These can be added to the 714 | \href{https://pandoc.org/MANUAL.html\#metadata-blocks}{Pandoc metadata}. 
715 | An example of such metadata can be found at the bottom of this 716 | \texttt{readme} in the form of a YAML-block. Pandoc allows for various 717 | methods to provide metadata (see the link above). 718 | 719 | \begin{itemize} 720 | \tightlist 721 | \item 722 | \textbf{\texttt{formatGloss}} (boolean, default \texttt{false}): 723 | should all interlinear examples be consistently formatted? If you use 724 | this option, you can simply use capital letters for abbreviations in 725 | the gloss, and they will be changed to small caps. The source line is 726 | set to italics, and the translation is put into single quotes. 727 | \item 728 | \textbf{\texttt{samePage}} (boolean, default \texttt{true}, only for 729 | Latex): should examples be kept together on the same page? Can also be 730 | overridden for individual examples by adding 731 | \texttt{\{.ex\ samePage=false\}} at the start of an example (cf.~below 732 | on \texttt{local\ options}). 733 | \item 734 | \textbf{\texttt{xrefSuffixSep}} (string, defaults to no-break-space): 735 | When cross references have a suffix, how should the separator be 736 | formatted? The default `no-break-space' is a safe option. I 737 | personally like a `narrow no-break space' better (Unicode 738 | \texttt{U+202F}), but this symbol does not work with all fonts, and 739 | might thus lead to errors. For Latex typesetting, all space-like 740 | symbols are converted to a Latex thin space 741 | \texttt{\textbackslash{},}. 742 | \item 743 | \textbf{\texttt{restartAtChapter}} (boolean, default \texttt{false}): 744 | should the counting restart for each chapter? 745 | 746 | \begin{itemize} 747 | \tightlist 748 | \item 749 | Actually, when \texttt{true} this setting will restart the counting 750 | at the highest heading level, which for various output formats can 751 | be set by the Pandoc option \texttt{top-level-division}. 
752 | \item 753 | The id of each example will now be of the form 754 | \texttt{exCHAPTER.NUMBER} to resolve any clashes when the same 755 | number appears in different chapters. 756 | \item 757 | Depending on your Latex setup, an explicit entry 758 | \texttt{top-level-division:\ chapter} might be necessary in your 759 | metadata. 760 | \end{itemize} 761 | \item 762 | \textbf{\texttt{addChapterNumber}} (boolean, default \texttt{false}): 763 | should the chapter (= highest heading level) number be added to the 764 | number of the example? When setting this to \texttt{true} any setting 765 | of \texttt{restartAtChapter} will be ignored. In most Latex situations 766 | this only works in combination with a \texttt{documentclass:\ book}. 767 | \item 768 | \textbf{\texttt{latexPackage}} (one of: \texttt{linguex}, 769 | \texttt{gb4e}, \texttt{langsci-gb4e}, \texttt{expex}, default 770 | \texttt{linguex}): Various options for converting examples to Latex 771 | packages that typeset linguistic examples. None of the conversions 772 | works perfectly, though they should work in most normal situations 773 | (think 90\%-plus). It might be necessary to first convert to 774 | \texttt{Latex}, correct the output, and then typeset separately with a 775 | latex compiler like \texttt{xelatex}. Using the direct option inside 776 | Pandoc might also work in many situations. Export to 777 | \textbf{\texttt{beamer}} seems to work reasonably well with the 778 | \texttt{gb4e} package. All others have artefacts or errors. 779 | \end{itemize} 780 | 781 | \subsubsection{Local options}\label{local-options} 782 | 783 | Local options are options that can be set for each individual example. 784 | The \texttt{formatGloss} option can be used to have an individual 785 | example be formatted differently from the global setting. 
For example, 786 | when the global setting is \texttt{formatGloss:\ true} in the metadata, 787 | then adding \texttt{formatGloss=false} in the curly brackets of a 788 | specific example will block the formatting. This is especially useful 789 | when the automatic formatting does not give the desired result. 790 | 791 | If you want to add something else (not a linguistic example) in a 792 | numbered example, then there is the local option \texttt{noFormat=true}. 793 | An attempt will be made to produce a reasonable layout. Multiple 794 | paragraphs will simply be taken as is, and the number will be put in 795 | front. In HTML the number will be centred. It is usable for an 796 | incidental mathematical formula. 797 | 798 | \begin{verbatim} 799 | ::: {.ex noFormat=true} 800 | $$\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}$$ 801 | ::: 802 | \end{verbatim} 803 | 804 | \begin{samepage} 805 | 806 | \ex. \label{ex4.15} 807 | \[\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}\]\\ 808 | 809 | 810 | \end{samepage} 811 | 812 | \subsection{\texorpdfstring{Issues with 813 | \texttt{pandoc-ling}}{Issues with pandoc-ling}}\label{issues-with-pandoc-ling} 814 | 815 | \begin{itemize} 816 | \tightlist 817 | \item 818 | Manually provided identifiers for examples should not be purely 819 | numerical (so do not use e.g.~\texttt{\#5789}). In some situations this 820 | interferes with the setting of the cross-references. 821 | \item 822 | Because the cross-references use the same structure as citations in 823 | Pandoc, the processing of citations (by \texttt{citeproc}) should be 824 | performed \textbf{after} the processing by \texttt{pandoc-ling}. 825 | Another Pandoc filter, 826 | \href{https://github.com/lierdakil/pandoc-crossref}{\texttt{pandoc-crossref}}, 827 | for numbering figures and other captions, also uses the same system. 828 | There seems to be no conflict between \texttt{pandoc-ling} and 829 | \texttt{pandoc-crossref}. 
830 | \item 831 | Interlinear examples will not wrap at the end of the page. There 832 | is no solution yet for examples that are longer than the size 833 | of the page. 834 | \item 835 | It is not (yet) possible to have more than one glossing line. 836 | \item 837 | When exporting to \texttt{docx} there is a problem because there are 838 | paragraphs inserted after tables, which adds space in lists with 839 | multiple interlinear examples (except when they have exactly the same 840 | number of columns). This is 841 | \href{https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262}{by 842 | design}. The official solution is to set font-size to 1 for this 843 | paragraph inside MS Word. 844 | \item 845 | Multi-column cells are crucial for \texttt{pandoc-ling} to work 846 | properly. These were only introduced with the new table format in Pandoc 847 | 2.10 (so older Pandoc versions are not supported). Also note that these 848 | structures are not yet exported to all formats, e.g.~they will not be 849 | displayed correctly in \texttt{docx}. However, this is currently an 850 | area of active development. 851 | \item 852 | \texttt{langsci-gb4e} is only available as part of the 853 | \href{https://ctan.org/pkg/langsci?lang=en}{\texttt{langsci} package}. 854 | You have to make it available to Pandoc, e.g.~by adding it into the 855 | same directory as the pandoc-ling.lua filter. I have added a recent 856 | version of \texttt{langsci-gb4e} here for convenience, but this one 857 | might be outdated at some time in the future. 858 | \item 859 | \texttt{beamer} output seems to work best with 860 | \texttt{latexPackage:\ gb4e}. 
861 | \end{itemize} 862 | 863 | \subsection{A note on Latex 864 | conversion}\label{a-note-on-latex-conversion} 865 | 866 | Originally, I decided to write this filter as a two-pronged conversion, 867 | making a markdown version myself, but using a mapping to one of the many 868 | latex libraries for linguistic examples as a quick fix. I assumed that 869 | such a mapping would be the easy part. However, it turned out that the 870 | mapping to latex was much more difficult than I anticipated. Basically, 871 | it turned out that the `common denominator' that I was aiming for was 872 | not necessarily the `common denominator' provided by the latex packages. 873 | I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and 874 | expex) with growing dismay. This approach resulted in a first version. 875 | However, after this version was (more or less) finished, I realised that 876 | it would be better to first define the `common denominator' more clearly 877 | (as done here), and then implement this purely in Pandoc. From that 878 | basis I have then made attempts to map it to the various latex 879 | packages. 880 | 881 | \subsection{A note on implementation}\label{a-note-on-implementation} 882 | 883 | The basic structure of the examples is transformed into Pandoc tables. 884 | Tables are reasonably safe for converting to other formats. Care has 885 | been taken to add \texttt{classes} to all elements of the tables 886 | (e.g.~the preamble has the class \texttt{linguistic-example-preamble}). 887 | When exported formats are aware of these classes, they can be used to 888 | fine-tune the formatting. I have applied a few such fine-tunings to the 889 | html output of this filter by adding a few CSS-style statements. The 890 | naming of the classes is quite transparent, using the form 891 | \texttt{linguistic-example-STRUCTURE}. 892 | 893 | The whole table is encapsulated in a \texttt{div} with class \texttt{ex} 894 | and an id of the form \texttt{exNUMBER}. 
This means that an example can 895 | be directly referred to in web-links by using the hash-mechanism. For 896 | example, adding \texttt{\#ex3} to the end of a link will immediately 897 | jump to this example in a browser. 898 | 899 | The current implementation is completely independent from the 900 | \href{https://pandoc.org/MANUAL.html\#numbered-example-lists}{Pandoc 901 | numbered examples implementation} and both can work side by side, like 902 | (2): 903 | 904 | \begin{enumerate} 905 | \def\labelenumi{(\arabic{enumi})} 906 | \item 907 | These are native Pandoc numbered examples 908 | \item 909 | They are independent of \texttt{pandoc-ling} but use the same output 910 | formatting in many default exports, like latex. 911 | \end{enumerate} 912 | 913 | However, in practice various output-formats of Pandoc (e.g.~latex) also 914 | use numbers in round brackets for these, so in practice it might be 915 | confusing to combine both. 916 | 917 | \end{document} 918 | -------------------------------------------------------------------------------- /docs/test.sh: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env bash 2 | 3 | # produces the readme in various formats 4 | # the filter processVerbatim.lua add the verbatim examples as real markdown 5 | 6 | # assumes Pandoc and a full Latex install 7 | # langsci-gb4e.sty is made available here 8 | 9 | # note that there are various errors in the output 10 | # they show current limitations 11 | 12 | # basic formats 13 | 14 | for format in html docx epub 15 | do 16 | pandoc ../readme.md -t markdown -L processVerbatim.lua -s --wrap=preserve | \ 17 | pandoc -t $format -o readme.$format -L ../pandoc-ling.lua -s -N --toc --mathml -F pandoc-crossref --wrap=preserve 18 | done 19 | 20 | # various latex variants, both tex and pdf 21 | 22 | for package in linguex gb4e langsci-gb4e 23 | do 24 | pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \ 25 | pandoc -t latex -o 
readme_$package.tex -L ../pandoc-ling.lua -s -N --toc \ 26 | --metadata latexPackage="$package" 27 | 28 | pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \ 29 | pandoc -o readme_$package.pdf -L ../pandoc-ling.lua -N --toc \ 30 | --metadata latexPackage="$package" --pdf-engine=xelatex 31 | done 32 | 33 | # special settings for expex, errors with chapternumbers 34 | 35 | pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \ 36 | pandoc -t latex -o readme_expex.tex -L ../pandoc-ling.lua -s -N --toc \ 37 | --metadata latexPackage="expex" --metadata addChapterNumber="false" 38 | 39 | pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \ 40 | pandoc -o readme_expex.pdf -L ../pandoc-ling.lua -N --toc \ 41 | --metadata latexPackage="expex" --metadata addChapterNumber="false" 42 | -------------------------------------------------------------------------------- /figure/ExampleStructure.pages: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/figure/ExampleStructure.pages -------------------------------------------------------------------------------- /figure/ExampleStructure.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/figure/ExampleStructure.pdf -------------------------------------------------------------------------------- /figure/ExampleStructure.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/figure/ExampleStructure.png -------------------------------------------------------------------------------- /readme.md: -------------------------------------------------------------------------------- 1 | # pandoc-ling 2 | 3 | *Michael Cysouw* <> 4 | 5 | A Pandoc 
filter for linguistic examples 6 | 7 | tl;dr 8 | 9 | - Easily write linguistic examples including basic interlinear glossing. 10 | - Let numbering and cross-referencing be done for you. 11 | - Export to (almost) any format of your wishes for final polishing. 12 | - As an example, check out this readme in [HTML](https://cysouw.github.io/pandoc-ling/readme.html) or [Latex](https://cysouw.github.io/pandoc-ling/readme_gb4e.pdf). 13 | 14 | # Rationale 15 | 16 | In the field of linguistics there is an outspoken tradition to format example sentences in research papers in a very specific way. In the field, it is a perennial problem to get such example sentences to look just right. Within Latex, there are numerous packages to deal with this problem (e.g. covington, linguex, gb4e, expex, etc.). Depending on your needs, there is some Latex solution for almost everyone. However, these solutions in Latex are often cumbersome to type, and they are not portable to other formats. Specifically, transfer between latex, html, docx, odt or epub would actually be highly desirable. Such transfer is the hallmark of [Pandoc](https://pandoc.org), a tool by John MacFarlane that provides conversion between these (and many more) formats. 17 | 18 | Any such conversion between text-formats naturally never works perfectly: every text-format has specific features that are not transferable to other formats. A central goal of Pandoc (at least in my interpretation) is to define a set of shared concepts for text-structure (a 'common denominator' if you will, but surely not 'least'!) that can then be mapped to other formats. In many ways, Pandoc tries (again) to define a set of logical concepts for text structure ('semantic markup'), which can then be formatted by your favourite typesetter. As long as you stay inside the realm of this 'common denominator' (in practice that means Pandoc's extended version of Markdown/CommonMark), conversion works reasonably well (think 90%-plus). 
19 | 20 | Building on John Gruber's [Markdown philosophy](https://daringfireball.net/projects/markdown/syntax), there is a strong urge here to learn to restrain oneself while writing, and try to restrict the number of layout-possibilities to a minimum. In this sense, with `pandoc-ling` I propose a Markdown-structure for linguistic examples that is simple, easy to type, easy to read, and portable through the Pandoc universe by way of an extension mechanism of Pandoc, called a 'Pandoc Lua Filter'. This extension will not magically allow you to write every linguistic example thinkable, but my guess is that in practice the present proposal covers the majority of situations in linguistic publications (think 90%-plus). As an example (and test case) I have included automatic conversions into various formats in this repository (check them out in the directory `docs` to get an idea of the strengths and weaknesses of the current implementation). 21 | 22 | # The basic structure of a linguistic example 23 | 24 | Basically, a linguistic example consists of 6 possible building blocks, of which only the number and at least one example line are necessary. The space between the building blocks is kept as minimal as possible without becoming cramped. When (optional) building blocks are not included, then the other blocks shift left and up (only exception: a preamble without labels is not shifted left completely, but left-aligned with the example, not with the judgement). 25 | 26 | - **Number**: Running tally of all examples in the work, possibly restarting at chapters or other major headings. Typically between round brackets, possibly with a chapter number added before in long works, e.g. example (7.26). Aligned top-left, typically left-aligned to main text margin. 27 | - **Preamble**: Optional information about the content/kind of example. Aligned top-left: to the top with the number, to the left with the (optional) label.
When there is no label, then the preamble is aligned with the example, not with the judgment. 28 | - **Label**: Indices for sub-examples. Only present when there is more than one example grouped together inside one numbered entity. Typically these sub-example labels use latin letters followed by a full stop. They are left-aligned with the preamble, and each label is top-aligned with the top-line of the corresponding example (important for longer line-wrapped examples). 29 | - **Judgment**: Examples can optionally have grammaticality judgments, typically symbols like `*`, `?` or `!`, sometimes in superscript relative to the corresponding example. Judgements are right-aligned to each other, typically with only minimal space to the left-aligned examples. 30 | - **Line example**: A minimal linguistic example has at least one line example, i.e. an utterance of interest. Building blocks in general shift left and up when other (optional) building blocks are not present. Minimally, this results in a number with one line example. 31 | - **Interlinear example**: A complex structure typically used for examples from languages unknown to most readers. Consists of three or four lines that are left-aligned: 32 | * **Header**: An optional header is typically used to display information about the language of the example, including literature references. When not present, then all other lines from the interlinear example shift upwards. 33 | * **Source**: The actual language utterance, often typeset in italics. This line is internally separated at spaces, and each sub-block is left-aligned with the corresponding sub-blocks of the gloss. 34 | * **Gloss**: Explanation of the meaning of the source, often using abbreviations in small caps. This line is internally separated at spaces, and each block is left-aligned with the block from source. 35 | * **Translation**: Free translation of the source, typically quoted. Not separated in blocks, but freely extending to the right.
Left-aligned with the other lines from the interlinear example. 36 | 37 | ![The structure of a linguistic example.](figure/ExampleStructure.png) 38 | 39 | There are of course many more possibilities to extend the structure of a linguistic example, like third or fourth subdivisions of labels (often using small roman numerals as a third level) or multiple glossing lines in the interlinear example. Also, the content of the header is sometimes found right-aligned to the right of the interlinear example (language info at the top, reference at the bottom). All such options are currently not supported by `pandoc-ling`. 40 | 41 | Under the hood, this structure is prepared by `pandoc-ling` as a table. Tables are reasonably well transcoded to different document formats. Specific layout considerations mostly have to be set manually. Alignment of the text should work in most exports. Some `CSS` styling is proposed by `pandoc-ling`, but can of course be overruled. For latex (and beamer) special output is prepared using various available latex packages (see options, below). 42 | 43 | # Introducing `pandoc-ling` 44 | 45 | ## Editing linguistic examples 46 | 47 | To include a linguistic example in Markdown `pandoc-ling` uses the `div` structure, which is indicated in Pandoc-Markdown by typing three colons at the start and three colons at the end. To indicate the `class` of this `div` the letters 'ex' (for 'example') should be added after the top colons (with or without space in between). This 'ex'-class is the signal for `pandoc-ling` to start processing such a `div`. The numbering of these examples will be inserted by `pandoc-ling`. 48 | 49 | Empty lines can be added inside the `div` for visual pleasure, as they mostly do not have an influence on the output. Exception: do *not* use empty lines between unlabelled line examples. Multiple lines of text can be used (without empty lines in between), but they will simply be interpreted as one sequential paragraph.
50 | 51 | ``` 52 | ::: ex 53 | This is the most basic structure of a linguistic example. 54 | ::: 55 | ``` 56 | 57 | Alternatively, the `class` can be put in curly brackets (and then a leading full stop is necessary before `ex`). Inside these brackets more attributes can be added (separated by space), for example an id, using a hash, or any attribute=value pairs that should apply to this example. Currently there is only one real attribute implemented (`formatGloss`), but in principle it is possible to add more attributes that can be used to fine-tune the typesetting of the example (see below for a description of such `local options`). 58 | 59 | ``` 60 | ::: {#id .ex formatGloss=false} 61 | 62 | This is a multi-line example. 63 | But that does not mean anything for the result 64 | All these lines are simply treated as one paragraph. 65 | They will become one example with one number. 66 | 67 | ::: 68 | ``` 69 | 70 | A preamble can be added by inserting an empty line between preamble and example. The same considerations about multiple text-lines apply. 71 | 72 | ``` 73 | :::ex 74 | Preamble 75 | 76 | This is an example with a preamble. 77 | ::: 78 | ``` 79 | 80 | Sub-examples with labels are entered by starting each sub-example with a small latin letter and a full stop. Empty lines between labels are allowed. Subsequent lines without labels are treated as one paragraph. Empty lines *not* followed by a label with a full stop will result in errors. 81 | 82 | ``` 83 | :::ex 84 | a. This is the first example. 85 | b. This is the second. 86 | a. The actual letters are not important, `pandoc-ling` will put them in order. 87 | 88 | e. Empty lines are allowed between labelled lines 89 | Subsequent lines are again treated as one sequential paragraph. 90 | ::: 91 | ``` 92 | 93 | A labelled list can be combined with a preamble. 94 | 95 | ``` 96 | :::ex 97 | Any nice description here 98 | 99 | a. one example sentence. 100 | b. two 101 | c.
three 102 | ::: 103 | ``` 104 | 105 | Grammaticality judgements should be added before an example, and after an optional label, separated from both by spaces (though four spaces in a row should be avoided, that could lead to layout errors). To indicate that any sequence of symbols is a judgement, prepend the judgement with a caret `^`. Alignment will be figured out by `pandoc-ling`. 106 | 107 | ``` 108 | :::ex 109 | Throwing in a preamble for good measure 110 | 111 | a. ^* This traditionally signals ungrammaticality. 112 | b. ^? Question-marks indicate questionable grammaticality. 113 | c. ^^whynot?^ But in principle any sequence can be used (here even in superscript). 114 | d. However, such long sequences sometimes lead to undesirable effects in the layout. 115 | ::: 116 | ``` 117 | 118 | A minor detail is the alignment of a single example with a preamble and grammaticality judgements. In this case it looks better for the preamble to be left aligned with the example and not with the judgement. 119 | 120 | ``` 121 | :::ex 122 | Here is a special case with a preamble 123 | 124 | ^^???^ With a singly questionably example. 125 | Note the alignment! Especially with this very long example 126 | that should go over various lines in the output. 127 | ::: 128 | ``` 129 | 130 | For the lazy writers among us, it is also possible to use a simple bullet list instead of a labelled list. Note that the listed elements will still be formatted as a labelled list. 131 | 132 | ``` 133 | :::ex 134 | - This is a lazy example. 135 | - ^# It should return letters at the start just as before. 136 | - ^% Also testing some unusual judgements. 137 | ::: 138 | ``` 139 | 140 | Just for testing: a single example with a judgement (which resulted in an error in earlier versions). 141 | 142 | ``` 143 | ::: ex 144 | ^* This traditionally signals ungrammaticality.
145 | ::: 146 | ``` 147 | 148 | ## Interlinear examples 149 | 150 | For interlinear examples with aligned source and gloss, the structure of a `lineblock` is used, starting the lines with a vertical line `|`. There should always be four vertical lines (for header, source, gloss and translation, respectively), although the content after the first vertical line can be empty. The source and gloss lines are separated at spaces, and all parts are left-aligned. If you want to have a space that is not separated, you will have to 'protect' the space, either by putting a backslash before the space, or by inserting a non-breaking space instead of a normal space (either type `&nbsp;` or insert an actual non-breaking space, i.e. unicode character `U+00A0`). 151 | 152 | ``` 153 | :::ex 154 | | Dutch (Germanic) 155 | | Deze zin is in het nederlands. 156 | | DEM sentence AUX in DET dutch. 157 | | This sentence is dutch. 158 | ::: 159 | ``` 160 | 161 | An attempt is made to format interlinear examples when the option `formatGloss=true` is added. This will: 162 | 163 | - remove formatting from the source and set everything in italics, 164 | - remove formatting from the gloss and set sequences (>1) of capitals and numbers into small caps (note that the positioning of small caps on web pages is [highly complex](https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align)), 165 | - a tilde `~` between spaces in the gloss is treated as a shortcut for an empty gloss (internally, the sequence `space-tilde-space` is replaced by `space-space-nonBreakingSpace-space-space`), 166 | - consistently put translations in single quotes, possibly removing other quotes. 167 | 168 | ``` 169 | ::: {.ex formatGloss=true} 170 | | Dutch (Germanic) 171 | | Is deze zin in het nederlands ? 172 | | AUX DEM sentence in DET dutch Q 173 | | Is this sentence dutch? 174 | ::: 175 | ``` 176 | 177 | The results of such formatting will not always work, but it seems to be quite robust in my testing.
The next example brings everything together: 178 | 179 | - a preamble, 180 | - labels, both for single lines and for interlinear examples, 181 | - interlinear examples start on a new line immediately after the letter-label, 182 | - grammaticality judgements with proper alignment, 183 | - when the header of an interlinear example is left out, everything is shifted up, 184 | - The formatting of the interlinear is harmonised. 185 | 186 | ``` 187 | ::: {.ex formatGloss=true samePage=false} 188 | Completely superfluous preamble, but it works ... 189 | 190 | a. 191 | | Dutch (Germanic) Note the grammaticality judgement! 192 | | ^^:–)^ Deze zin is (dit\ is test) nederlands. 193 | | DEM sentence AUX ~ dutch. 194 | | This sentence is dutch. 195 | 196 | b. 197 | | 198 | | Deze tweede zin heeft geen header. 199 | | DEM second sentence have.3SG.PRES no header. 200 | | This second sentence does not have a header. 201 | 202 | a. Mixing single line examples with interlinear examples. 203 | a. This is of course highly unusal. 204 | Just for this example, let's add some extra material in this example. 205 | ::: 206 | ``` 207 | 208 | Also, as a quick workaround for showing multiple source lines without alignment with the glossing (e.g. for phonetic or orthographic representations of the example), it is possible to use the header of interlinear example. For a line break in the header, use the double backslash `\\`, either inline or at the end of a line. When you type a header using multiple lines (as shown below), then subsequent lines have to start with space. For now, this only works in the header line. 209 | 210 | ``` 211 | ::: ex 212 | | Example with an multiline header \\ 213 | *can be used for orthographic representations*, \\ 214 | or phonetic transcription, \\ or for whatever you like 215 | | Dit is een lui voorbeeld=je 216 | | DEM COP DET lazy example=DIM 217 | | This is a lazy example. 
218 | ::: 219 | ``` 220 | 221 | ## Cross-referencing examples 222 | 223 | The examples are automatically numbered by `pandoc-ling`. Cross-references to examples inside a document can be made by using the `[@ID]` format (used by Pandoc for citations). When an example has an explicit identifier (like `#test` in the next example), then a reference can be made to this example with `[@test]`, leading to [@test] when formatted (note that the formatting does not work on the github website. Please check the 'docs' subdirectory). 224 | 225 | ``` 226 | ::: {#test .ex} 227 | This is a test 228 | ::: 229 | ``` 230 | 231 | Inspired by the `linguex`-approach, you can also use the keywords `next` or `last` to refer to the next or the last example, e.g. `[@last]` will be formatted as [@last]. By doubling the first letters to `nnext` or `llast` reference to the next/last-but-one can be made. Actually, the number of starting letters can be repeated at will in `pandoc-ling`, so something like `[@llllllllast]` will also work. It will be formatted as [@llllllllast] after the processing of `pandoc-ling`. Needless to say that in such a situation an explicit identifier would be a better choice. 232 | 233 | Referring to sub-examples can be done by manually adding a suffix into the cross reference, simply separated from the identifier by a space. For example, `[@lllast c]` will refer to the third sub-example of the last-but-two example. Formatted this will look like this: [@llast c], smile! However, note that the "c" has to be manually determined. It is simply a literal suffix that will be copied into the cross-reference. Something like `[@last hA1l0]` will work also, leading to [@last hA1l0] when formatted (which is of course nonsensical). 234 | 235 | For exports that include attributes (like html), the examples have an explicit id of the form `exNUMBER` in which `NUMBER` is the actual number as given in the formatted output. 
This means that it is possible to refer to an example on any web-page by using the hash-mechanism to refer to a part of the web-page. For example, `#ex4.7` can be used to refer to the seventh example in the html-output of this readme (try [this link](https://cysouw.github.io/pandoc-ling/readme.html#ex4.7)). The id in this example has a chapter number '4' because in the html conversion I have set the option `addChapterNumber` to `true`. (Note: when numbers restart the count in each chapter with the option `restartAtChapter`, then the id is of the form `exCHAPTER.NUMBER`. This is necessary to resolve clashing ids, as the same number might then be used in different chapters.) 236 | 237 | I propose to use these ids also to refer to examples in citations when writing scholarly papers, e.g. (Cysouw 2021: #ex7), independent of whether the links actually resolve. In principle, such citations could easily be resolved when online publications are properly prepared. The same proposal could also work for other parts of research papers, for example using tags like `#sec, #fig, #tab, #eq` (see the Pandoc filter [`crossref-adapt`](https://github.com/cysouw/crossref-adapt)). To refer to paragraphs (which should replace page numbers in a future of adaptive design), I propose to use no tag, but directly add the number to the hash (see the Pandoc filter [`count-para`](https://github.com/cysouw/count-para) for a practical mechanism to add such numbering).
If you use this option, you can simply use capital letters for abbreviations in the gloss, and they will be changed to small caps. The source line is set to italics, and the translation is put into single quotes. 246 | - **`samePage`** (boolean, default `true`, only for Latex): should examples be kept together on the same page? Can also be overridden for individual examples by adding `{.ex samePage=false}` at the start of an example (cf. below on `local options`). 247 | - **`xrefSuffixSep`** (string, defaults to no-break-space): When cross references have a suffix, how should the separator be formatted? The default 'no-break-space' is a safe option. I personally like a 'narrow no-break space' better (Unicode `U+202F`), but this symbol does not work with all fonts, and might thus lead to errors. For Latex typesetting, all space-like symbols are converted to a Latex thin space `\,`. 248 | - **`restartAtChapter`** (boolean, default `false`): should the counting restart for each chapter? 249 | * Actually, when `true` this setting will restart the counting at the highest heading level, which for various output formats can be set by the Pandoc option `top-level-division`. 250 | * The id of each example will now be of the form `exCHAPTER.NUMBER` to resolve any clashes when the same number appears in different chapters. 251 | * Depending on your Latex setup, an explicit entry `top-level-division: chapter` might be necessary in your metadata. 252 | - **`addChapterNumber`** (boolean, default `false`): should the chapter (= highest heading level) number be added to the number of the example? When setting this to `true` any setting of `restartAtChapter` will be ignored. In most Latex situations this only works in combination with a `documentclass: book`. 253 | - **`latexPackage`** (one of: `linguex`, `gb4e`, `langsci-gb4e`, `expex`, default `linguex`): Various options for converting examples to Latex packages that typeset linguistic examples.
None of the conversions works perfectly, though it should work in most normal situations (think 90%-plus). It might be necessary to first convert to `Latex`, correct the output, and then typeset separately with a latex compiler like `xelatex`. Using the direct option inside Pandoc might also work in many situations. Export to **`beamer`** seems to work reasonably well with the `gb4e` package. All others have artefacts or errors. 254 | 255 | ### Local options 256 | 257 | Local options are options that can be set for each individual example. The `formatGloss` option can be used to have an individual example be formatted differently from the global setting. For example, when the global setting is `formatGloss: true` in the metadata, then adding `formatGloss=false` in the curly brackets of a specific example will block the formatting. This is especially useful when the automatic formatting does not give the desired result. 258 | 259 | If you want to add something else (not a linguistic example) in a numbered example, then there is the local option `noFormat=true`. An attempt will be made to try and do a reasonable layout. Multiple paragraphs will simply be taken as is, and the number will be put in front. In HTML the number will be centred. It is usable for an incidental mathematical formula. 260 | 261 | ``` 262 | ::: {.ex noFormat=true} 263 | $$\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}$$ 264 | ::: 265 | ``` 266 | 267 | ## Issues with `pandoc-ling` 268 | 269 | - Manually provided identifiers for examples should not be purely numerical (so do not use e.g. `#5789`). In some situations this interferes with the setting of the cross-references. 270 | - Because the cross-references use the same structure as citations in Pandoc, the processing of citations (by `citeproc`) should be performed **after** the processing by `pandoc-ling`.
Another Pandoc filter, [`pandoc-crossref`](https://github.com/lierdakil/pandoc-crossref), for numbering figures and other captions, also uses the same system. There seems to be no conflict between `pandoc-ling` and `pandoc-crossref`. 271 | - Interlinear examples will not wrap at the end of the page. There is no solution yet for examples that are longer than the size of the page. 272 | - It is not (yet) possible to have more than one glossing line. 273 | - When exporting to `docx` there is a problem because there are paragraphs inserted after tables, which adds space in lists with multiple interlinear examples (except when they have exactly the same number of columns). This is [by design](https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262). The official solution is to set font-size to 1 for this paragraph inside MS Word. 274 | - Multi-column cells are crucial for `pandoc-ling` to work properly. These are only introduced in the new table format with Pandoc 2.10 (so older Pandoc versions are not supported). Also note that these structures are not yet exported to all formats, e.g. it will not be displayed correctly in `docx`. However, this is currently an area of active development. 275 | - `langsci-gb4e` is only available as part of the [`langsci` package](https://ctan.org/pkg/langsci?lang=en). You have to make it available to Pandoc, e.g. by adding it into the same directory as the pandoc-ling.lua filter. I have added a recent version of `langsci-gb4e` here for convenience, but this one might be outdated at some time in the future. 276 | - `beamer` output seems to work best with `latexPackage: gb4e`.
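
The ordering constraint for citations mentioned in the list above can be sketched on the command line. Pandoc applies filters and citation processing in the order in which they are given, so listing the Lua filter before `--citeproc` should resolve the example cross-references first. This is only a sketch: it assumes a Pandoc version with the built-in `--citeproc` option, and the file names `paper.md`, `refs.bib` and `paper.html` are placeholders that are not part of this repository.

```shell
#!/bin/sh
# Placeholder file names (assumptions, not files from this repository).
input=paper.md
output=paper.html

# The Lua filter is listed first and --citeproc second:
# Pandoc processes them in command-line order.
set -- "$input" -L pandoc-ling.lua --citeproc --bibliography refs.bib -o "$output"

# Show the command that would be run.
echo "pandoc $*"

# Only run it when Pandoc and all input files are actually available.
if command -v pandoc >/dev/null 2>&1 && [ -f "$input" ] \
   && [ -f pandoc-ling.lua ] && [ -f refs.bib ]; then
  pandoc "$@"
fi
```

Swapping the two options would let `citeproc` see references like `[@last]` as unresolved citations before `pandoc-ling` has had a chance to replace them.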
277 | 278 | ## A note on Latex conversion 279 | 280 | Originally, I decided to write this filter as a two-pronged conversion, making a markdown version myself, but using a mapping to one of the many latex libraries for linguistics examples as a quick fix. I assumed that such a mapping would be the easy part. However, it turned out that the mapping to latex was much more difficult than I anticipated. Basically, it turned out that the 'common denominator' that I was aiming for was not necessarily the 'common denominator' provided by the latex packages. I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and expex) with growing dismay. This approach resulted in a first version. However, after this version was (more or less) finished, I realised that it would be better to first define the 'common denominator' more clearly (as done here), and then implement this purely in Pandoc. From that basis I have then made attempts to map them to the various latex packages. 281 | 282 | ## A note on implementation 283 | 284 | The basic structure of the examples is transformed into Pandoc tables. Tables are reasonably safe for converting into other formats. Care has been taken to add `classes` to all elements of the tables (e.g. the preamble has the class `linguistic-example-preamble`). When exported formats are aware of these classes, they can be used to fine-tune the formatting. I have used a few such fine-tunings in the html output of this filter by adding a few CSS-style statements. The naming of the classes is quite transparent, using the form `linguistic-example-STRUCTURE`. 285 | 286 | The whole table is encapsulated in a `div` with class `ex` and an id of the form `exNUMBER`. This means that an example can be directly referred to in web-links by using the hash-mechanism. For example, adding `#ex3` to the end of a link will immediately jump to this example in a browser.
287 | 288 | The current implementation is completely independent from the [Pandoc numbered examples implementation](https://pandoc.org/MANUAL.html#numbered-example-lists) and both can work side by side, like (@second): 289 | 290 | (@) These are native Pandoc numbered examples 291 | 292 | (@second) They are independent of `pandoc-ling` but use the same output formatting in many default exports, like latex. 293 | 294 | However, in practice various output-formats of Pandoc (e.g. latex) also use numbers in round brackets for these, so in practice it might be confusing to combine both. 295 | 296 | --- 297 | author: Michael Cysouw 298 | title: Using pandoc-ling 299 | addChapterNumber: true 300 | ... 301 | --------------------------------------------------------------------------------