├── .gitignore
├── .vscode
    └── settings.json
├── LICENSE
├── NEWS.md
├── docs
    ├── figure
    │   └── ExampleStructure.png
    ├── langsci-gb4e.sty
    ├── pandoc-ling-old.lua
    ├── processVerbatim.lua
    ├── readme.docx
    ├── readme.epub
    ├── readme.html
    ├── readme_expex.pdf
    ├── readme_expex.tex
    ├── readme_gb4e.pdf
    ├── readme_gb4e.tex
    ├── readme_langsci-gb4e.pdf
    ├── readme_langsci-gb4e.tex
    ├── readme_linguex.pdf
    ├── readme_linguex.tex
    └── test.sh
├── figure
    ├── ExampleStructure.pages
    ├── ExampleStructure.pdf
    └── ExampleStructure.png
├── pandoc-ling.lua
└── readme.md


/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 | 


--------------------------------------------------------------------------------
/.vscode/settings.json:
--------------------------------------------------------------------------------
1 | {
2 |   "spellright.language": [],
3 |   "spellright.documentTypes": [
4 |     "plaintext"
5 |   ]
6 | }


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 | Creative Commons Legal Code
  2 | 
  3 | CC0 1.0 Universal
  4 | 
  5 |     CREATIVE COMMONS CORPORATION IS NOT A LAW FIRM AND DOES NOT PROVIDE
  6 |     LEGAL SERVICES. DISTRIBUTION OF THIS DOCUMENT DOES NOT CREATE AN
  7 |     ATTORNEY-CLIENT RELATIONSHIP. CREATIVE COMMONS PROVIDES THIS
  8 |     INFORMATION ON AN "AS-IS" BASIS. CREATIVE COMMONS MAKES NO WARRANTIES
  9 |     REGARDING THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS
 10 |     PROVIDED HEREUNDER, AND DISCLAIMS LIABILITY FOR DAMAGES RESULTING FROM
 11 |     THE USE OF THIS DOCUMENT OR THE INFORMATION OR WORKS PROVIDED
 12 |     HEREUNDER.
 13 | 
 14 | Statement of Purpose
 15 | 
 16 | The laws of most jurisdictions throughout the world automatically confer
 17 | exclusive Copyright and Related Rights (defined below) upon the creator
 18 | and subsequent owner(s) (each and all, an "owner") of an original work of
 19 | authorship and/or a database (each, a "Work").
 20 | 
 21 | Certain owners wish to permanently relinquish those rights to a Work for
 22 | the purpose of contributing to a commons of creative, cultural and
 23 | scientific works ("Commons") that the public can reliably and without fear
 24 | of later claims of infringement build upon, modify, incorporate in other
 25 | works, reuse and redistribute as freely as possible in any form whatsoever
 26 | and for any purposes, including without limitation commercial purposes.
 27 | These owners may contribute to the Commons to promote the ideal of a free
 28 | culture and the further production of creative, cultural and scientific
 29 | works, or to gain reputation or greater distribution for their Work in
 30 | part through the use and efforts of others.
 31 | 
 32 | For these and/or other purposes and motivations, and without any
 33 | expectation of additional consideration or compensation, the person
 34 | associating CC0 with a Work (the "Affirmer"), to the extent that he or she
 35 | is an owner of Copyright and Related Rights in the Work, voluntarily
 36 | elects to apply CC0 to the Work and publicly distribute the Work under its
 37 | terms, with knowledge of his or her Copyright and Related Rights in the
 38 | Work and the meaning and intended legal effect of CC0 on those rights.
 39 | 
 40 | 1. Copyright and Related Rights. A Work made available under CC0 may be
 41 | protected by copyright and related or neighboring rights ("Copyright and
 42 | Related Rights"). Copyright and Related Rights include, but are not
 43 | limited to, the following:
 44 | 
 45 |   i. the right to reproduce, adapt, distribute, perform, display,
 46 |      communicate, and translate a Work;
 47 |  ii. moral rights retained by the original author(s) and/or performer(s);
 48 | iii. publicity and privacy rights pertaining to a person's image or
 49 |      likeness depicted in a Work;
 50 |  iv. rights protecting against unfair competition in regards to a Work,
 51 |      subject to the limitations in paragraph 4(a), below;
 52 |   v. rights protecting the extraction, dissemination, use and reuse of data
 53 |      in a Work;
 54 |  vi. database rights (such as those arising under Directive 96/9/EC of the
 55 |      European Parliament and of the Council of 11 March 1996 on the legal
 56 |      protection of databases, and under any national implementation
 57 |      thereof, including any amended or successor version of such
 58 |      directive); and
 59 | vii. other similar, equivalent or corresponding rights throughout the
 60 |      world based on applicable law or treaty, and any national
 61 |      implementations thereof.
 62 | 
 63 | 2. Waiver. To the greatest extent permitted by, but not in contravention
 64 | of, applicable law, Affirmer hereby overtly, fully, permanently,
 65 | irrevocably and unconditionally waives, abandons, and surrenders all of
 66 | Affirmer's Copyright and Related Rights and associated claims and causes
 67 | of action, whether now known or unknown (including existing as well as
 68 | future claims and causes of action), in the Work (i) in all territories
 69 | worldwide, (ii) for the maximum duration provided by applicable law or
 70 | treaty (including future time extensions), (iii) in any current or future
 71 | medium and for any number of copies, and (iv) for any purpose whatsoever,
 72 | including without limitation commercial, advertising or promotional
 73 | purposes (the "Waiver"). Affirmer makes the Waiver for the benefit of each
 74 | member of the public at large and to the detriment of Affirmer's heirs and
 75 | successors, fully intending that such Waiver shall not be subject to
 76 | revocation, rescission, cancellation, termination, or any other legal or
 77 | equitable action to disrupt the quiet enjoyment of the Work by the public
 78 | as contemplated by Affirmer's express Statement of Purpose.
 79 | 
 80 | 3. Public License Fallback. Should any part of the Waiver for any reason
 81 | be judged legally invalid or ineffective under applicable law, then the
 82 | Waiver shall be preserved to the maximum extent permitted taking into
 83 | account Affirmer's express Statement of Purpose. In addition, to the
 84 | extent the Waiver is so judged Affirmer hereby grants to each affected
 85 | person a royalty-free, non transferable, non sublicensable, non exclusive,
 86 | irrevocable and unconditional license to exercise Affirmer's Copyright and
 87 | Related Rights in the Work (i) in all territories worldwide, (ii) for the
 88 | maximum duration provided by applicable law or treaty (including future
 89 | time extensions), (iii) in any current or future medium and for any number
 90 | of copies, and (iv) for any purpose whatsoever, including without
 91 | limitation commercial, advertising or promotional purposes (the
 92 | "License"). The License shall be deemed effective as of the date CC0 was
 93 | applied by Affirmer to the Work. Should any part of the License for any
 94 | reason be judged legally invalid or ineffective under applicable law, such
 95 | partial invalidity or ineffectiveness shall not invalidate the remainder
 96 | of the License, and in such case Affirmer hereby affirms that he or she
 97 | will not (i) exercise any of his or her remaining Copyright and Related
 98 | Rights in the Work or (ii) assert any associated claims and causes of
 99 | action with respect to the Work, in either case contrary to Affirmer's
100 | express Statement of Purpose.
101 | 
102 | 4. Limitations and Disclaimers.
103 | 
104 |  a. No trademark or patent rights held by Affirmer are waived, abandoned,
105 |     surrendered, licensed or otherwise affected by this document.
106 |  b. Affirmer offers the Work as-is and makes no representations or
107 |     warranties of any kind concerning the Work, express, implied,
108 |     statutory or otherwise, including without limitation warranties of
109 |     title, merchantability, fitness for a particular purpose, non
110 |     infringement, or the absence of latent or other defects, accuracy, or
111 |     the present or absence of errors, whether or not discoverable, all to
112 |     the greatest extent permissible under applicable law.
113 |  c. Affirmer disclaims responsibility for clearing rights of other persons
114 |     that may apply to the Work or any use thereof, including without
115 |     limitation any person's Copyright and Related Rights in the Work.
116 |     Further, Affirmer disclaims responsibility for obtaining any necessary
117 |     consents, permissions or other rights required for any use of the
118 |     Work.
119 |  d. Affirmer understands and acknowledges that Creative Commons is not a
120 |     party to this document and has no duty or obligation with respect to
121 |     this CC0 or use of the Work.
122 | 


--------------------------------------------------------------------------------
/NEWS.md:
--------------------------------------------------------------------------------
 1 | # pandoc-ling 1.9 (upcoming)
 2 | 
 3 | # pandoc-ling 1.8
 4 | 
 5 | - allow for multiline headers (using `\\` as linebreak)
 6 | - bugfix for preambles with interlinear examples (thx to fmatter #21)
 7 | - bugfix for capitalisation of glossing (thx to bulbulistan #22)
 8 | - bugfix to keep header and preamble always with rest, also without samePage (#23)
 9 | - bugfix for other separators in glossing (thx to bulbulistan and 7o7omootsqwn #27)
10 | 
11 | # pandoc-ling 1.7
12 | 
13 | ## changes
14 | 
15 | - allow for no space between number and suffix in latex export
16 | - bugfix latex-linguex (thx speechchemistry)
17 | 
18 | # pandoc-ling 1.6
19 | 
20 | ## changes
21 | 
22 | - adding option `samePage` to determine whether examples are kept together on a page in Latex typesetting (#pr14, thanks to CLRafaelR)
23 | 
24 | ## bugs
25 | 
26 | - allow for one-letter elements in glossing (#pr13, thanks to CLRafaelR)
27 | 
28 | # pandoc-ling 1.5
29 | 
30 | ## changes
31 | 
32 | - removing the colon from the internal ID to harmonize the cross-document referencing
33 | 
34 | ## bugs
35 | 
36 | - handling of `header-includes` improved: additional user-provided statements are just passed through.
37 | - various internal changes to match the updated functioning of lua inside pandoc 2.12 and newer.
38 | 
39 | # pandoc-ling 1.4
40 | 
41 | ## changes
42 | 
43 | - adding option to use bullet lists for example entry. Will still be transformed into labelled list.
44 | 
45 | ## bugs
46 | 
47 | - fixed gb4e error by adding `\noautomath` to preamble
48 | - fixed gb4e error with cross-referencing because of wrong placement of `\label` statement
49 | - fixed latex error when using special symbols in judgements (closes #4, thx @CLRafaelR)
50 | 
51 | # pandoc-ling 1.3.1
52 | 
53 | ## bugs
54 | 
55 | - fixed bug with not-appearing header of interlinear in HTML output
56 | - fixed bug with judgements in single-line examples (closes #3, thx @CLRafaelR)
57 | 
58 | # pandoc-ling 1.3
59 | 
60 | ## changes
61 | 
62 | - changed the ID system to "#ex:" for easier cross-document linking to examples
63 | - change @next and @last to lowercase for easier typing
64 | - added 'samepage' enclosures in latex so that examples do not break across pages
65 | 
66 | ## bugs
67 | 
68 | - resolved clash with pandoc-crossref
69 | - corrected wrong counting with unnumbered headings
70 | - fixed counter reset and \exewidth with gb4e
71 | - fixed bugs with linguex export
72 | 
73 | # pandoc-ling 1.2
74 | 
75 | ## changes
76 | 
77 | - adding beamer as an experimental export format. It uses the same routines as the basic LaTeX export. Needs more testing.
78 | 
79 | ## bugs
80 | 
81 | - fixed problem with parsing local options
82 | 
83 | # pandoc-ling 1.1
84 | 
85 | ## changes
86 | 
87 | - adding experimental `noFormat` option: the raw content of the example is placed as a complete div in a single table cell, with the number vertically centred. For LaTeX export, all lines are simply squashed together. Needs more testing.
88 | 
89 | # pandoc-ling 1.0
90 | 
91 | First complete working version
92 | 


--------------------------------------------------------------------------------
/docs/figure/ExampleStructure.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/figure/ExampleStructure.png


--------------------------------------------------------------------------------
/docs/langsci-gb4e.sty:
--------------------------------------------------------------------------------
  1 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
  2 | %%      File: langsci-gb4e.sty
  3 | %%    Author: Language Science Press (http://langsci-press.org)
  4 | %%      Date: 2020-03-17 13:12 UTC
  5 | %%   Purpose: This file contains an adapted version of the gb4e package
  6 | %%            for typesetting linguistic examples. It also includes
  7 | %%            adapted versions of the cgloss and jambox packages
  8 | %%  Language: LaTeX
  9 | %%   Licence:
 10 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 11 | 
 12 | \ProvidesPackage{langsci-gb4e}[2020/01/01]
 13 | 
 14 | \usepackage{etoolbox}
 15 | 
 16 | \newtoggle{cgloss}
 17 | \toggletrue{cgloss}
 18 | \newtoggle{jambox}
 19 | \toggletrue{jambox}
 20 | \DeclareOption{nocgloss}{\togglefalse{cgloss}}
 21 | \DeclareOption{nojambox}{\togglefalse{jambox}}
 22 | \DeclareOption*{\PackageWarning{langsci-gb4e}{Unknown option ‘\CurrentOption’}}
 23 | \ProcessOptions\relax
 24 | 
 25 | % \def\gbVersion{4e}
 26 | 
 27 | %%%%%%%%%%%%%%%%%%%%%%%%
 28 | %  Format of examples: %
 29 | %%%%%%%%%%%%%%%%%%%%%%%%
 30 | % \begin{exe} or \exbegin
 31 | % <examples>                           (arab.)
 32 | % \begin{xlist} or \xlist
 33 | % <subexamples>                        (1st embedding, alph.)
 34 | % \begin{xlisti} or \xlisti
 35 | % <subsubexamples>                     (2nd embedding, rom.)
 36 | % \end{xlisti}  or \endxlisti
 37 | % <more examples>
 38 | % \end{xlist} or \endxlist
 39 | % <still more examples>
 40 | % \end{exe} or \exend
 41 | %
 42 | % Other sublist-styles: xlistA (Alph.), xlistI (Rom.), xlistn (arab)
 43 | %
 44 | % \ex                               (produces Number)
 45 | % \ex <sentence>                    (numbered example)
 46 | % \ex[jdgmt]{sentence}              (numbered example with judgement)
 47 | %
 48 | % \exi{ident}                      (produces identifier)
 49 | % \exi{ident} <sentence>           (example numbered with identifier)
 50 | % \exi{ident}[jdgmt]{sentence}     (ditto with judgement)
 51 | %                      (\exr, \exp and \sn are defined in terms of \exi)
 52 | %
 53 | % \exr{label}                       (produces cross-referenced Num.)
 54 | % \exr{label} <sentence>            (cross-referenced example)
 55 | % \exr{label}[jdgmt]{sentence}      (cross-referenced example with judgement)
 56 | %
 57 | % \exp{label}                       (same as
 58 | % \exp{label} <sentence>                     \exr but
 59 | % \exp{label}[jdgmt]{sentence}                        with prime)
 60 | %
 61 | % \sn <sentence>                    (unnumbered example)
 62 | % \sn[jdgmt]{sentence}              (unnumbered example with judgement)
 63 | %
 64 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 65 | % For my own laziness (HANDLE WITH CARE---this works only
 66 | %                                 in boringly normal cases.... ):
 67 | %
 68 | % \ea                works like \begin{exe}\ex or \begin{xlist}\ex,
 69 | %                            depending on context
 70 | % \z                 works like \end{exe} or \end{xlist}, dep on context
 71 | %
 72 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
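    | %
    | % Usage sketch (an illustrative addition, assuming only the macros
    | % defined in this file; all sentences and judgements are placeholders):
    | %
    | % \begin{exe}
    | %   \ex A plain numbered example.
    | %   \ex[*]{A numbered example with a judgement.}
    | %   \ex
    | %   \begin{xlist}
    | %     \ex First subexample.
    | %     \ex Second subexample.
    | %   \end{xlist}
    | % \end{exe}
    | %
    | % A single glossed example using the \ea ... \z shortcuts (this needs
    | % the cgloss part of the package, which is enabled by default):
    | %
    | % \ea
    | %   \gll Das ist ein Beispiel.\\
    | %        this is an example\\
    | %   \glt `This is an example.'
    | % \z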
 73 | 
 74 | %CGLOSS META
 75 | % Modified version of cgloss4e.sty.  Hacked and renamed cgloss.sty
 76 | % by Alexis Dimitriadis (alexis@babel.ling.upenn.edu). Integrated into
 77 | % langsci-gb4e.sty by Sebastian Nordhoff
 78 | % EnD CGLOSS META
 79 | 
 80 | 
 81 | 
 82 | \@ifundefined{new@fontshape}{\def\reset@font{}\let\mathrm\rm\let\mathit\mit}{}
 83 | 
 84 | 
 85 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 86 | %                                                                      %%
 87 | %        Font Specifications                                           %%
 88 | %                                                                      %%
 89 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 90 | 
 91 | % Define commands for fonts to be used:
 92 | %
 93 | % 1) regular
 94 | % a. example line
 95 | \newcommand{\exfont}{\normalsize\upshape}
 96 | % b. glossing line
 97 | \newcommand{\glossfont}{\normalsize\upshape}
 98 | % c. translation font
 99 | \newcommand{\transfont}{\normalsize\upshape}
100 | % d. example number
101 | \newcommand{\exnrfont}{\exfont\upshape}
102 | %
103 | % 2) in footnote
104 | % a. example line
105 | \newcommand{\fnexfont}{\footnotesize\upshape}
106 | % b. glossing line
107 | \newcommand{\fnglossfont}{\footnotesize\upshape}
108 | % c. translation font
109 | \newcommand{\fntransfont}{\footnotesize\upshape}
110 | % d. example number
111 | \newcommand{\fnexnrfont}{\fnexfont\upshape}
112 | 
113 | \newcommand{\examplesroman}{
114 |   \let\eachwordone=\upshape
115 |   \exfont{\upshape}
116 | }
117 | \newcommand{\examplesitalics}{
118 |   \let\eachwordone=\itshape
119 |   \exfont{\itshape}
120 | }
121 | 
122 | 
123 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
124 | %%                                                                     %%
125 | %%  Macros for examples, roughly following Linguistic Inquiry style.   %%
126 | %%                                                                     %%
127 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
128 | 
129 | \def\qlist{\begin{list}{\Alph{xnum}.}{\usecounter{xnum}%
130 | \setlength{\rightmargin}{\leftmargin}}}
131 | \def\endqlist{\end{list}}
132 | 
133 | \newif\if@noftnote\@noftnotetrue
134 | \newif\if@xrec\@xrecfalse
135 | \@definecounter{fnx}
136 | 
137 | % set a flag that we are in footnotes now and change the size of example fonts
138 | \let\oldFootnotetext\@footnotetext
139 | 
140 | \renewcommand\@footnotetext[1]{%
141 |    \@noftnotefalse\setcounter{fnx}{0}%
142 | \begingroup%
143 | \let\exfont\fnexfont%
144 | \let\glossfont\fnglossfont%
145 | \let\transfont\fntransfont%
146 | \let\exnrfont\fnexnrfont%
147 |  	\oldFootnotetext{#1}%
148 | \endgroup%
149 | \@noftnotetrue}
150 | 
151 | 
152 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
153 | %%                                                                   %%
154 | %% 			counters				     %%
155 | %%                                                                   %%
156 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
157 | 
158 | % current nesting depth of example lists (0 = outside any example)
159 | \newcount\@xnumdepth \@xnumdepth = 0
160 | 
161 | % define four levels of indentation
162 | \@definecounter{xnumi}
163 | \@definecounter{xnumii}
164 | \@definecounter{xnumiii}
165 | \@definecounter{xnumiv}
166 | 
167 | 
168 | % use (1) on page, but (i) in footnotes
169 | \def\thexnumi
170 | {\if@noftnote%
171 | \@arabic\@xsi{xnumi}%
172 | \else%
173 | \@roman\@xsi{xnumi}%
174 | \fi%
175 | }
176 | \def\thexnumii{\@xsii{xnumii}}
177 | \def\thexnumiii{\@xsiii{xnumiii}}
178 | \def\thexnumiv{\@xsiv{xnumiv}}
179 | \def\p@xnumii{\thexnumi%
180 | \if@noftnote%
181 | \else%
182 | .%
183 | \fi}
184 | \def\p@xnumiii{\thexnumi\thexnumii-}
185 | \def\p@xnumiv{\thexnumi\thexnumii-\thexnumiii-}
186 | 
187 | \def\xs@default#1{\csname @@xs#1\endcsname}
188 | \def\@@xsi{\let\@xsi\arabic}
189 | \def\@@xsii{\let\@xsii\alph}
190 | \def\@@xsiii{\let\@xsiii\roman}
191 | \def\@@xsiv{\let\@xsiv\arabic}
192 | 
193 | \@definecounter{rxnumi}
194 | \@definecounter{rxnumii}
195 | \@definecounter{rxnumiii}
196 | \@definecounter{rxnumiv}
197 | 
198 | \def\save@counters{%
199 | \setcounter{rxnumi}{\value{xnumi}}%
200 | \setcounter{rxnumii}{\value{xnumii}}%
201 | \setcounter{rxnumiii}{\value{xnumiii}}%
202 | \setcounter{rxnumiv}{\value{xnumiv}}}%
203 | 
204 | \def\reset@counters{%
205 | \setcounter{xnumi}{\value{rxnumi}}%
206 | \setcounter{xnumii}{\value{rxnumii}}%
207 | \setcounter{xnumiii}{\value{rxnumiii}}%
208 | \setcounter{xnumiv}{\value{rxnumiv}}}%
209 | 
210 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
211 | %%                                                                   %%
212 | %% 			widths			                     %%
213 | %%                                                                   %%
214 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
215 | 
216 | % Control the width of example identifiers
217 | \def\exewidth#1{\def\@exwidth{#1}}
218 | 
219 | \newcommand{\twodigitexamples}{\exewidth{(23)}}
220 | \newcommand{\threedigitexamples}{\exewidth{(234)}}
221 | \newcommand{\fourdigitexamples}{\exewidth{(2345)}}
222 | 
223 | \def\gblabelsep#1{\def\@gblabelsep{#1}}
224 | \gblabelsep{1em}
225 | 
226 | \def\subexsep#1{\def\@subexsep{#1}}
227 | \subexsep{1.5ex}
228 | 
229 | % set initial sizes of example number and judgement sizes
230 | \exewidth{\exnrfont (35)}
231 | 
232 | % how much should examples in footnotes be indented?
233 | \newlength{\footexindent}
234 | \setlength{\footexindent}{0pt}
235 | 
236 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
237 | %%                                                                   %%
238 | %% 			example lists				     %%
239 | %%                                                                   %%
240 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
241 | 
242 | \def\exe{%
243 |     %\ifnum\value{equation}>9 \exewidth{(23)}\else\fi%
244 |     %inserted by LangSci, for large example numbers
245 |     \ifnum\value{equation}>98 \exewidth{(235)}\else\fi%
246 |     \@ifnextchar [{\@exe}{\@exe[\@exwidth]}}
247 | 
248 | \def\@exe[#1]{\ifnum \@xnumdepth >0%
249 |                  \if@xrec\@exrecwarn\fi%
250 |                  \if@noftnote\@exrecwarn\fi%
251 |                  \@xnumdepth0\@listdepth0\@xrectrue%
252 |                  \save@counters%
253 |               \fi%
254 |                  \advance\@xnumdepth \@ne \@@xsi%
255 |                  \if@noftnote%
256 |                         \begin{list}{(\thexnumi)}%
257 |                         {\usecounter{xnumi}\@subex{#1}{\@gblabelsep}{0em}%
258 |                         \setcounter{xnumi}{\value{equation}}
259 |                         \nopagebreak}%
260 |                  \else%
261 |                         \begin{list}{(\roman{xnumi})}%
262 |                         {\usecounter{xnumi}\@subex{(iiv)}{\@gblabelsep}{\footexindent}%
263 |                         \setcounter{xnumi}{\value{fnx}}}%
264 |                  \fi}
265 | 
266 | 
267 | \def\endexe{\if@noftnote\setcounter{equation}{\value{xnumi}}%
268 |                    \else\setcounter{fnx}{\value{xnumi}}%
269 |                         \reset@counters\@xrecfalse\fi\end{list}}
270 | 
271 | \def\@exrecwarn{\typeout{*** Recursion on "exe"---your
272 |                 example numbering will probably be screwed up!}}
273 | 
274 | \def\xlist{\@ifnextchar [{\@xlist{}}{\@xlist{}[iv.]}}
275 | \def\xlista{\@ifnextchar [{\@xlist{\alph}}{\@xlist{\alph}[m.]}}
276 | \def\xlistabr{\@ifnextchar [{\@xlist{(\alph)}}{\@xlist{(\alph)}[m.]}}
277 | \def\xlisti{\@ifnextchar [{\@xlist{\roman}}{\@xlist{\roman}[iv.]}}
278 | \def\xlistn{\@ifnextchar [{\@xlist{\arabic}}{\@xlist{\arabic}[9.]}}
279 | \def\xlistA{\@ifnextchar [{\@xlist{\Alph}}{\@xlist{\Alph}[M.]}}
280 | \def\xlistI{\@ifnextchar [{\@xlist{\Roman}}{\@xlist{\Roman}[IV.]}}
281 | 
282 | \def\endxlist{\end{list}}
283 | \def\endxlista{\end{list}}
284 | \def\endxlistabr{\end{list}}
285 | \def\endxlistn{\end{list}}
286 | \def\endxlistA{\end{list}}
287 | \def\endxlistI{\end{list}}
288 | \def\endxlisti{\end{list}}
289 | 
290 | 
291 | 
292 | 
293 | %%% a generic sublist-styler
294 | \def\@xlist#1[#2]{\ifnum \@xnumdepth >3 \@toodeep\else%
295 |     \advance\@xnumdepth \@ne%
296 |     \edef\@xnumctr{xnum\romannumeral\the\@xnumdepth}%
297 |     \def\@bla{#1}
298 |     \ifx\@bla\empty\xs@default{\romannumeral\the\@xnumdepth}\else%
299 |       \expandafter\let\csname @xs\romannumeral\the\@xnumdepth\endcsname#1\fi
300 |     \begin{list}{\csname the\@xnumctr\endcsname.}%
301 |                 {\usecounter{\@xnumctr}\@subex{#2}{\@subexsep}{0em}}\fi}
302 | 
303 | %% Added third argument to be able to add some more space to leftmargin
304 | %% for footnotes that have bigger indentation.
305 | %% St. Mü. 07.01.2007
306 | \def\@subex#1#2#3{\settowidth{\labelwidth}{#1}\itemindent\z@\labelsep#2%
307 |          \ifnum\the\@xnumdepth=1%
308 |            \topsep 7\p@ plus2\p@ minus3\p@\itemsep3\p@ plus2\p@\else%
309 |            \topsep1.5\p@ plus\p@\itemsep1.5\p@ plus\p@\fi%
310 |          \parsep\p@ plus.5\p@ minus.5\p@%
311 |          \leftmargin\labelwidth\advance\leftmargin#2\advance\leftmargin#3\relax}
312 | 
313 | %%% the example-items
314 | \def\ex{\@ifnextchar [{\@ex}{\item}}
315 | \def\@ex[#1]#2{\item\@exj[#1]{#2}}
316 | \def\@exj[#1]#2{\@exjbg{#1} #2 \end{list}\nopagebreak}
317 | \def\exi#1{\item[#1]\@ifnextchar [{\@exj}{}}
318 | \def\judgewidth#1{\def\@jwidth{#1}}
319 | \judgewidth{??}
320 | \judgewidth{*} % if wider judgements are needed, enlarge within papers
321 | \def\@exjbg#1{\begin{list}{#1}{\@subex{\@jwidth}{.5ex}{0em}}\item}
322 | \def\exr#1{\exi{{(\ref{#1})}}}
323 | \def\exp#1{\exi{{(\ref{#1}$'$)}}}
324 | \def\sn{\exi{}}
325 | 
326 | 
327 | \def\ex{\@ifnextchar [{\exnrfont\@ex}{\exnrfont\item\exfont}}
328 | \def\@ex[#1]#2{\item\@exj[#1]{\exfont#2}}
329 | 
330 | \def\@exjbg#1{\begin{list}{{\exnrfont#1}}{\@subex{\@jwidth}{.5ex}{0em}}\item}
331 | \def\exi#1{\item[{\exnrfont#1}]\@ifnextchar [{\exnrfont\@exj}{}}
332 | 
333 | \def\ea{\ifnum\@xnumdepth=0\begin{exe}\else\begin{xlist}[iv.]\fi\raggedright\ex}
334 | \def\eal{\begin{exe}\exnrfont\ex\begin{xlist}[iv.]\raggedright}
335 | \def\eas{\ifnum\@xnumdepth=0\begin{exe}[(34)]\else\begin{xlist}[iv.]\fi\ex\begin{tabular}[t]{@{}p{\linewidth}@{}}}
336 | 
337 | % allow hyphenation and justification
338 | \def\eanoraggedright{\ifnum\@xnumdepth=0\begin{exe}\else\begin{xlist}[iv.]\fi\ex}
339 | \def\ealnoraggedright{\begin{exe}\exnrfont\ex\begin{xlist}[iv.]}
340 | 
341 | 
342 | 
343 | \def\z{\ifnum\@xnumdepth=1\end{exe}\else\end{xlist}\fi}
344 | \def\zl{\end{xlist}\end{exe}}
345 | \def\zs{\end{tabular}\ifnum\@xnumdepth=1\end{exe}\else\end{xlist}\fi}
346 | \def\zllast{\end{xlist}\end{exe}\removelastskip}
347 | 
348 | % Control vertical space for examples in footnotes
349 | \def\zlast{\z\vspace{-\baselineskip}}
350 | \def\eafirst{\vspace{-1.5\baselineskip}\ea}
351 | 
352 | %%%%%% control the alignment of exampleno. and (picture-)example
353 | %%%%%%         (by Lex Holt <lex@cogsci.ed.ac.uk>).
354 | \def\attop#1{\leavevmode\vtop{\strut\vskip-\baselineskip\vbox{#1}}}
355 | \def\atcenter#1{$\vcenter{#1}$}
356 | %%%%%%
357 | 
358 | 
359 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
360 | %%                                                                   %%
361 | %%      several examples in one line                                 %%
362 | %%                                                                   %%
363 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
364 | 
365 | \newcommand{\xbox}[2]{\noindent\parbox[t]{#1}{#2}\noindent}
366 | \newcommand{\nobreakbox}[1]{\xbox{\linewidth}{#1}}
367 | \newcommand{\xref}[1]{(\ref{#1})}
368 | \newcommand{\xxref}[2]{(\ref{#1}--\ref{#2})}
369 | 
370 | 
371 | \iftoggle{cgloss}{
372 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
373 | %%                                                                   %%
374 | %%     CGLOSS starts here                                            %%
375 | %%                                                                   %%
376 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
377 | 
378 | 
379 | \let\@gsingle=1
380 | \def\singlegloss{\let\@gsingle=1}
381 | \def\nosinglegloss{\let\@gsingle=0}
382 | \@ifundefined{new@fontshape}%
383 |    {\def\@selfnt{\ifx\@currsize\normalsize\@normalsize\else\@currsize\fi}}
384 |    {\def\@selfnt{\selectfont}}
385 | 
386 | \def\gll%                  % Introduces 2-line text-and-gloss.
387 |    {\raggedright%
388 |      \bgroup %\begin{flushleft}
389 |      \ifx\@gsingle1%
390 | 	 \def\baselinestretch{1}\@selfnt\fi
391 |     \bgroup
392 |     \twosent
393 | }
394 | 
395 | \def\glll%                  % Introduces 3-line text-and-gloss.
396 |    {\bgroup %\begin{flushleft}
397 |      \ifx\@gsingle1%
398 | 	\def\baselinestretch{1}\@selfnt\fi
399 |     \bgroup
400 |     \threesent
401 | }
402 | 
403 | 
404 | \def\gllll%                  % Introduces 4-line text-and-gloss.
405 |    {\bgroup %\begin{flushleft}
406 |      \ifx\@gsingle1%
407 | 	\def\baselinestretch{1}\@selfnt\fi
408 |     \bgroup
409 |     \foursent
410 | }
411 | 
412 | 
413 | \def\glllll%                  % Introduces 5-line text-and-gloss.
414 |    {\bgroup %\begin{flushleft}
415 |      \ifx\@gsingle1%
416 | 	\def\baselinestretch{1}\@selfnt\fi
417 |     \bgroup
418 |     \fivesent
419 | }
420 | 
421 | 
422 | \def\gllllll%                  % Introduces 6-line text-and-gloss.
423 |    {\bgroup %\begin{flushleft}
424 |      \ifx\@gsingle1%
425 | 	\def\baselinestretch{1}\@selfnt\fi
426 |     \bgroup
427 |     \sixsent
428 | }
429 | 
430 | 
431 | \def\glllllll%                  % Introduces 7-line text-and-gloss.
432 |    {\bgroup %\begin{flushleft}
433 |      \ifx\@gsingle1%
434 | 	\def\baselinestretch{1}\@selfnt\fi
435 |     \bgroup
436 |     \sevensent
437 | }
438 | 
439 | 
440 | \def\gllllllll%                  % Introduces 8-line text-and-gloss.
441 |    {\bgroup %\begin{flushleft}
442 |      \ifx\@gsingle1%
443 | 	\def\baselinestretch{1}\@selfnt\fi
444 |     \bgroup
445 |     \eightsent
446 | }
447 | 
448 | 
449 | \newlength{\gltoffset}
450 | \setlength{\gltoffset}{.17\baselineskip}
451 | \newcommand{\nogltOffset}{\setlength{\gltoffset}{0pt}}
452 | \newcommand{\resetgltOffset}{\setlength{\gltoffset}{.17\baselineskip}}
453 | \def\glt{\ifhmode\\*[\gltoffset]\else\nobreak\vskip\gltoffset\nobreak\fi\transfont}
454 | 
455 | 
456 | % Introduces a translation
457 | \let\trans\glt
458 | 
459 | % \def\gln{\relax}
460 | %       % Ends the gloss environment.
461 | 
462 | % The following TeX code is adapted, with permission, from:
463 | % gloss.tex: Macros for vertically aligning words in consecutive sentences.
464 | % Version: 1.0  release: 26 November 1990
465 | % Copyright (c) 1991 Marcel R. van der Goot (marcel@cs.caltech.edu).
466 | 
467 | \newbox\lineone % boxes with words from first line
468 | \newbox\linetwo
469 | \newbox\linethree
470 | \newbox\linefour
471 | \newbox\linefive
472 | \newbox\linesix
473 | \newbox\lineseven
474 | \newbox\lineeight
475 | \newbox\wordone % a word from the first line (hbox)
476 | \newbox\wordtwo
477 | \newbox\wordthree
478 | \newbox\wordfour
479 | \newbox\wordfive
480 | \newbox\wordsix
481 | \newbox\wordseven
482 | \newbox\wordeight
483 | \newbox\gline % the constructed double line (hbox)
484 | \newskip\glossglue % extra glue between glossed pairs or tuples
485 | \glossglue = 0pt plus 2pt minus 1pt % allow stretch/shrink between words
486 | %\glossglue = 5pt plus 2pt minus 1pt % allow stretch/shrink between words
487 | \newif\ifnotdone
488 | 
489 | \@ifundefined{eachwordone}{\let\eachwordone=\upshape}{\relax}
490 | \@ifundefined{eachwordtwo}{\let\eachwordtwo=\upshape}{\relax}
491 | \@ifundefined{eachwordthree}{\let\eachwordthree=\upshape}{\relax}
492 | \@ifundefined{eachwordfour}{\let\eachwordfour=\upshape}{\relax}
493 | \@ifundefined{eachwordfive}{\let\eachwordfive=\upshape}{\relax}
494 | \@ifundefined{eachwordsix}{\let\eachwordsix=\upshape}{\relax}
495 | \@ifundefined{eachwordseven}{\let\eachwordseven=\upshape}{\relax}
496 | \@ifundefined{eachwordeight}{\let\eachwordeight=\upshape}{\relax}
497 | 
498 | \def\lastword#1#2#3% #1 = \each, #2 = line box, #3 = word box
499 |    {\setbox#2=\vbox{\unvbox#2%
500 |                     \global\setbox#3=\lastbox
501 |                    }%
502 |     \ifvoid#3\global\setbox#3=\hbox{#1\strut{} }\fi
503 |         % extra space following \strut in case #1 needs a space
504 |    }
505 | 
506 | \def\testdone
507 |    {\ifdim\ht\lineone=0pt
508 |          \ifdim\ht\linetwo=0pt \notdonefalse % tricky space after pt
509 |          \else\notdonetrue
510 |          \fi
511 |     \else\notdonetrue
512 |     \fi
513 |    }
514 | 
515 | \gdef\getwords(#1,#2)#3 #4\\% #1=linebox, #2=\each, #3=1st word, #4=remainder
516 |    {\setbox#1=\vbox{\hbox{#2\strut#3{} }% adds space, the {} is needed for CJK otherwise the space
517 |                                         % would be ignored
518 |                     \unvbox#1%
519 |                    }%
520 |     \def\more{#4}%
521 |     \ifx\more\empty\let\more=\donewords
522 |     \else\let\more=\getwords
523 |     \fi
524 |     \more(#1,#2)#4\\%
525 |    }
526 | 
527 | \gdef\donewords(#1,#2)\\{}%
528 | 
529 | \gdef\twosent#1\\ #2\\{% #1 = first line, #2 = second line
530 |     \getwords(\lineone,\eachwordone)#1 \\%
531 |     \getwords(\linetwo,\eachwordtwo)#2 \\%
532 |     \loop\lastword{\eachwordone}{\lineone}{\wordone}%
533 |          \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
534 |          \global\setbox\gline=\hbox{\unhbox\gline
535 |                                     \hskip\glossglue
536 |                                     \vtop{\box\wordone   % vtop was vbox
537 |                                           \nointerlineskip
538 |                                           \box\wordtwo
539 |                                          }%
540 |                                    }%
541 |          \testdone
542 |          \ifnotdone
543 |     \repeat
544 |     \egroup % matches \bgroup in \gloss
545 |    \gl@stop}
546 | 
547 | \gdef\threesent#1\\ #2\\ #3\\{% #1 = first line, #2 = second line, #3 = third
548 |     \getwords(\lineone,\eachwordone)#1 \\%
549 |     \getwords(\linetwo,\eachwordtwo)#2 \\%
550 |     \getwords(\linethree,\eachwordthree)#3 \\%
551 |     \loop\lastword{\eachwordone}{\lineone}{\wordone}%
552 |          \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
553 |          \lastword{\eachwordthree}{\linethree}{\wordthree}%
554 |          \global\setbox\gline=\hbox{\unhbox\gline
555 |                                     \hskip\glossglue
556 |                                     \vtop{\box\wordone   % vtop was vbox
557 |                                           \nointerlineskip
558 |                                           \box\wordtwo
559 |                                           \nointerlineskip
560 |                                           \box\wordthree
561 |                                          }%
562 |                                    }%
563 |          \testdone
564 |          \ifnotdone
565 |     \repeat
566 |     \egroup % matches \bgroup in \gloss
567 |    \gl@stop}
568 | 
569 | 
570 | 
571 | \gdef\foursent#1\\ #2\\ #3\\ #4\\{% #1 = first line, #2 = second line, #3 = third etc
572 |     \getwords(\lineone,\eachwordone)#1 \\%
573 |     \getwords(\linetwo,\eachwordtwo)#2 \\%
574 |     \getwords(\linethree,\eachwordthree)#3 \\%
575 |     \getwords(\linefour,\eachwordfour)#4 \\%
576 |     \loop\lastword{\eachwordone}{\lineone}{\wordone}%
577 |          \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
578 |          \lastword{\eachwordthree}{\linethree}{\wordthree}%
579 |          \lastword{\eachwordfour}{\linefour}{\wordfour}%
580 |          \global\setbox\gline=\hbox{\unhbox\gline
581 |                                     \hskip\glossglue
582 |                                     \vtop{\box\wordone   % vtop was vbox
583 |                                           \nointerlineskip
584 |                                           \box\wordtwo
585 |                                           \nointerlineskip
586 |                                           \box\wordthree
587 |                                           \nointerlineskip
588 |                                           \box\wordfour
589 |                                          }%
590 |                                    }%
591 |          \testdone
592 |          \ifnotdone
593 |     \repeat
594 |     \egroup % matches \bgroup in \gloss
595 |    \gl@stop}
596 | 
597 | 
598 | 
599 | \gdef\fivesent#1\\ #2\\ #3\\ #4\\ #5\\{% #1 = first line, #2 = second line, #3 = third etc
600 |     \getwords(\lineone,\eachwordone)#1 \\%
601 |     \getwords(\linetwo,\eachwordtwo)#2 \\%
602 |     \getwords(\linethree,\eachwordthree)#3 \\%
603 |     \getwords(\linefour,\eachwordfour)#4 \\%
604 |     \getwords(\linefive,\eachwordfive)#5 \\%
605 |     \loop\lastword{\eachwordone}{\lineone}{\wordone}%
606 |          \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
607 |          \lastword{\eachwordthree}{\linethree}{\wordthree}%
608 |          \lastword{\eachwordfour}{\linefour}{\wordfour}%
609 |          \lastword{\eachwordfive}{\linefive}{\wordfive}%
610 |          \global\setbox\gline=\hbox{\unhbox\gline
611 |                                     \hskip\glossglue
612 |                                     \vtop{\box\wordone   % vtop was vbox
613 |                                           \nointerlineskip
614 |                                           \box\wordtwo
615 |                                           \nointerlineskip
616 |                                           \box\wordthree
617 |                                           \nointerlineskip
618 |                                           \box\wordfour
619 |                                           \nointerlineskip
620 |                                           \box\wordfive
621 |                                          }%
622 |                                    }%
623 |          \testdone
624 |          \ifnotdone
625 |     \repeat
626 |     \egroup % matches \bgroup in \gloss
627 |    \gl@stop}
628 | 
629 | 
630 | 
631 | \gdef\sixsent#1\\ #2\\ #3\\ #4\\ #5\\ #6\\{% #1 = first line, #2 = second line, #3 = third etc
632 |     \getwords(\lineone,\eachwordone)#1 \\%
633 |     \getwords(\linetwo,\eachwordtwo)#2 \\%
634 |     \getwords(\linethree,\eachwordthree)#3 \\%
635 |     \getwords(\linefour,\eachwordfour)#4 \\%
636 |     \getwords(\linefive,\eachwordfive)#5 \\%
637 |     \getwords(\linesix,\eachwordsix)#6 \\%
638 |     \loop\lastword{\eachwordone}{\lineone}{\wordone}%
639 |          \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
640 |          \lastword{\eachwordthree}{\linethree}{\wordthree}%
641 |          \lastword{\eachwordfour}{\linefour}{\wordfour}%
642 |          \lastword{\eachwordfive}{\linefive}{\wordfive}%
643 |          \lastword{\eachwordsix}{\linesix}{\wordsix}%
644 |          \global\setbox\gline=\hbox{\unhbox\gline
645 |                                     \hskip\glossglue
646 |                                     \vtop{\box\wordone   % vtop was vbox
647 |                                           \nointerlineskip
648 |                                           \box\wordtwo
649 |                                           \nointerlineskip
650 |                                           \box\wordthree
651 |                                           \nointerlineskip
652 |                                           \box\wordfour
653 |                                           \nointerlineskip
654 |                                           \box\wordfive
655 |                                           \nointerlineskip
656 |                                           \box\wordsix
657 |                                          }%
658 |                                    }%
659 |          \testdone
660 |          \ifnotdone
661 |     \repeat
662 |     \egroup % matches \bgroup in \gloss
663 |    \gl@stop}
664 | 
665 | 
666 | 
667 | \gdef\sevensent#1\\ #2\\ #3\\ #4\\ #5\\ #6\\ #7\\{% #1 = first line, #2 = second line, #3 = third etc
668 |     \getwords(\lineone,\eachwordone)#1 \\%
669 |     \getwords(\linetwo,\eachwordtwo)#2 \\%
670 |     \getwords(\linethree,\eachwordthree)#3 \\%
671 |     \getwords(\linefour,\eachwordfour)#4 \\%
672 |     \getwords(\linefive,\eachwordfive)#5 \\%
673 |     \getwords(\linesix,\eachwordsix)#6 \\%
674 |     \getwords(\lineseven,\eachwordseven)#7 \\%
675 |     \loop\lastword{\eachwordone}{\lineone}{\wordone}%
676 |          \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
677 |          \lastword{\eachwordthree}{\linethree}{\wordthree}%
678 |          \lastword{\eachwordfour}{\linefour}{\wordfour}%
679 |          \lastword{\eachwordfive}{\linefive}{\wordfive}%
680 |          \lastword{\eachwordsix}{\linesix}{\wordsix}%
681 |          \lastword{\eachwordseven}{\lineseven}{\wordseven}%
682 |          \global\setbox\gline=\hbox{\unhbox\gline
683 |                                     \hskip\glossglue
684 |                                     \vtop{\box\wordone   % vtop was vbox
685 |                                           \nointerlineskip
686 |                                           \box\wordtwo
687 |                                           \nointerlineskip
688 |                                           \box\wordthree
689 |                                           \nointerlineskip
690 |                                           \box\wordfour
691 |                                           \nointerlineskip
692 |                                           \box\wordfive
693 |                                           \nointerlineskip
694 |                                           \box\wordsix
695 |                                           \nointerlineskip
696 |                                           \box\wordseven
697 |                                          }%
698 |                                    }%
699 |          \testdone
700 |          \ifnotdone
701 |     \repeat
702 |     \egroup % matches \bgroup in \gloss
703 |    \gl@stop}
704 | 
705 | 
706 | 
707 | \gdef\eightsent#1\\ #2\\ #3\\ #4\\ #5\\ #6\\ #7\\ #8\\{% #1 = first line, #2 = second line, #3 = third etc
708 |     \getwords(\lineone,\eachwordone)#1 \\%
709 |     \getwords(\linetwo,\eachwordtwo)#2 \\%
710 |     \getwords(\linethree,\eachwordthree)#3 \\%
711 |     \getwords(\linefour,\eachwordfour)#4 \\%
712 |     \getwords(\linefive,\eachwordfive)#5 \\%
713 |     \getwords(\linesix,\eachwordsix)#6 \\%
714 |     \getwords(\lineseven,\eachwordseven)#7 \\%
715 |     \getwords(\lineeight,\eachwordeight)#8 \\%
716 |     \loop\lastword{\eachwordone}{\lineone}{\wordone}%
717 |          \lastword{\eachwordtwo}{\linetwo}{\wordtwo}%
718 |          \lastword{\eachwordthree}{\linethree}{\wordthree}%
719 |          \lastword{\eachwordfour}{\linefour}{\wordfour}%
720 |          \lastword{\eachwordfive}{\linefive}{\wordfive}%
721 |          \lastword{\eachwordsix}{\linesix}{\wordsix}%
722 |          \lastword{\eachwordseven}{\lineseven}{\wordseven}%
723 |          \lastword{\eachwordeight}{\lineeight}{\wordeight}%
724 |          \global\setbox\gline=\hbox{\unhbox\gline
725 |                                     \hskip\glossglue
726 |                                     \vtop{\box\wordone   % vtop was vbox
727 |                                           \nointerlineskip
728 |                                           \box\wordtwo
729 |                                           \nointerlineskip
730 |                                           \box\wordthree
731 |                                           \nointerlineskip
732 |                                           \box\wordfour
733 |                                           \nointerlineskip
734 |                                           \box\wordfive
735 |                                           \nointerlineskip
736 |                                           \box\wordsix
737 |                                           \nointerlineskip
738 |                                           \box\wordseven
739 |                                           \nointerlineskip
740 |                                           \box\wordeight
741 |                                          }%
742 |                                    }%
743 |          \testdone
744 |          \ifnotdone
745 |     \repeat
746 |     \egroup % matches \bgroup in \gloss
747 |    \gl@stop}
748 | 
749 | %\def\gl@stop{{\hskip -\glossglue}\unhbox\gline\end{flushleft}}
750 | 
751 | % \leavevmode puts us back in horizontal mode, so that a \\ will work
752 | \def\gl@stop{{\hskip -\glossglue}\unhbox\gline\leavevmode \egroup}
753 | }{} %end toggle cgloss
754 | 
755 | \iftoggle{jambox}{
756 | %BEGIN Jambox
757 | %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
758 | %
759 | % Alexis Dimitriadis
760 | %
761 | % This is version 0.3 (informal release, Nov. 2003).
762 | %
763 | % Line up material a fixed distance from the right margin.  For annotating
764 | % example sentences, usually with a short note in parentheses.
765 | % May overflow to the left or right, or line up on the next line as necessary.
766 | %
767 | % \jambox[width]{text}	Align 'text' starting 'width' distance from the
768 | %			right margin (default \the\jamwidth).
769 | % \jam(something)	Align a note delimited by parentheses (which are
770 | %			retained).  No optional argument.
771 | % \jambox*{text}        Set \jamwidth to the width of 'text', then align it.
772 | %			(\jamwidth stays set for the rest of the environment).
773 | %
774 | % Notes:
775 | %
776 | % Distance from the right margin can be set to an explicit amount, or to the
777 | % width of some piece of text, as follows:
778 | %
779 | % \jamwidth=2in\relax      Or
780 | % \settowidth\jamwidth {(``annotation'')}
781 | %
782 | % \jamwidth is locally scoped, so it can be set globally or inside an example
783 | % environment.
784 | %
785 | % BUG: Not compatible with ragged-right mode.
786 | %
787 | % Incompatibilities: Not useful with the vanilla cgloss4e.sty, which ends
788 | % glossed lines prematurely.
789 | % I do have a suitably modified file, cgloss.sty. With it you can do the
790 | % following:
791 | % \gll To kimeno. \\
792 | %      the text \\ \jambox{(Greek)}
793 | % \trans `The text.'
794 | 
795 | 
796 | \newdimen\jamwidth \jamwidth=2in
797 | \def\jambox{\@ifnextchar[{\@jambox}
798 | 	       {\@ifnextchar*{\@jamsetbox}{\@jambox[\the\jamwidth]}}}
799 | 
800 | % Set width AND display the argument.
801 | % The star is read and ignored; the argument #1 is boxed, used to set
802 | % \jamwidth, then passed to \@jambox (which also puts it in \@tempboxa!)
803 | %
804 | \def\@jamsetbox*#1{\setbox\@tempboxa\hbox{#1}\jamwidth=\wd\@tempboxa
805 |   \@jambox[\the\jamwidth]{\box\@tempboxa}}
806 | 
807 | \def\@jambox[#1]#2{{\setbox\@tempboxa\hbox {#2}%
808 |   \ifdim \wd\@tempboxa<#1\relax % if label fits in the allotted space:
809 |     \@tempdima=#1\relax \advance\@tempdima by-\wd\@tempboxa % remaining \hspace
810 |     \unskip\nobreak\hfill\penalty250 % break line here if necessary
811 |     \hskip 1.2em minus 1.2em 	  % used when the line extends past the margin
812 |     \hbox{}\nobreak\hfill\box\@tempboxa\nobreak
813 |     \hskip\@tempdima minus \@tempdima\hbox{}%
814 |   \else  % the label is too wide: just right-align it
815 |     \hfill\penalty50\hbox{}\nobreak\hfill\box\@tempboxa
816 |   \fi
817 |   % suppress closing glue:
818 |   \parfillskip=0pt \finalhyphendemerits=0 \par}}
819 | % The penalty enables a break, taken only if the line cannot fit.
820 | % The \hbox{} ensures the next line does not begin with \hfill, which would
821 | % be discarded if initial.
822 | % (\vadjust inserts an empty element at the beginning of the next line, so
823 | % that COULD be used instead of \hbox{}).
824 | % Algorithm adapted from The TeXbook.
825 | %
826 | % The closing \par could be a problem if there is a \parskip...
827 | }{}
828 | \endinput
829 | 


--------------------------------------------------------------------------------
/docs/pandoc-ling-old.lua:
--------------------------------------------------------------------------------
  1 | --[[
  2 | pandoc-linguex: make interlinear glossing with pandoc
  3 | 
  4 | Copyright © 2021 Michael Cysouw <cysouw@mac.com>
  5 | 
  6 | Permission to use, copy, modify, and/or distribute this software for any
  7 | purpose with or without fee is hereby granted, provided that the above
  8 | copyright notice and this permission notice appear in all copies.
  9 | 
 10 | THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
 11 | WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
 12 | MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
 13 | ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 14 | WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
 15 | ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
 16 | OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
 17 | ]]
 18 | 
 19 | PANDOC_VERSION:must_be_at_least '2.10'
 20 | 
 21 | ---------------------
 22 | -- 'global' variables
 23 | ---------------------
 24 | 
 25 | local counter = 0 -- actual numbering of examples
 26 | local chapter = 1 -- numbering of chapters (for unknown reasons this starts at 1, not 0)
 27 | local counterInChapter = 0 -- counter reset for each chapter
 28 | local indexEx = {} -- global lookup for example IDs
 29 | local orderInText = 0 -- order of references for resolving "Next"-style references
 30 | local indexRef = {} -- key/value: order in text = refID/exID
 31 | local rev_indexRef = {} -- "reversed" indexRef, i.e. key/value: refID/exID = order-number in text
 32 | 
 33 | ------------------------------------
 34 | -- User Settings with default values
 35 | ------------------------------------
 36 | 
 37 | local formatGloss = false -- format interlinear examples
 38 | local xrefSuffixSep = " " -- &nbsp; separator to be inserted after number in example references
 39 | local restartAtChapter = false -- restart numbering at highest header without adding local chapternumbers
 40 | local addChapterNumber = false -- add chapternumbers to counting and restart at highest header
 41 | local latexPackage = "linguex"
 42 | local topDivision = "section"
 43 | 
 44 | function getUserSettings (meta)
 45 |   if meta.formatGloss ~= nil then
 46 |     formatGloss = meta.formatGloss
 47 |   end
 48 |   if meta.xrefSuffixSep ~= nil then
 49 |     xrefSuffixSep = pandoc.utils.stringify(meta.xrefSuffixSep)
 50 |   end
 51 |   if meta.restartAtChapter ~= nil then
 52 |     restartAtChapter = meta.restartAtChapter
 53 |   end
 54 |   if meta.addChapterNumber ~= nil then
 55 |     addChapterNumber = meta.addChapterNumber
 56 |   end
 57 |   if meta.latexPackage ~= nil then
 58 |     latexPackage = pandoc.utils.stringify(meta.latexPackage)
 59 |   end
 60 |   if meta["top-level-division"] ~= nil then
 61 |     topDivision = pandoc.utils.stringify(meta["top-level-division"])
 62 |   end
 63 | end
 64 | 
 65 | ------------------------------------------
 66 | -- add latex dependencies: langsci-gb4e is not on CTAN!
 67 | -- restarting of counters is not working right for gb4e
 68 | ------------------------------------------
 69 | 
 70 | function addFormatting (meta)
 71 |   local tmp = meta['header-includes'] or pandoc.MetaList{}
 72 |   
 73 |   if FORMAT:match "html" then
 74 |     -- add specific CSS for layout of examples
 75 |     -- building on classes set in this filter
 76 |     -- local f = io.open("pandoc-ling.css")
 77 |     -- local css = f:read("*a")
 78 |     -- f:close()
 79 |     local css = [[ 
 80 |       <style>
 81 |       .linguistic-example { 
 82 |         margin: 0; 
 83 |       }
 84 |       .linguistic-example caption { 
 85 |         margin-bottom: 0; 
 86 |       }
 87 |       .linguistic-example tbody { 
 88 |         border-top: none; 
 89 |         border-bottom: none; 
 90 |         vertical-align: top; 
 91 |       }
 92 |       .linguistic-example td { 
 93 |         padding-left: 2px;
 94 |         padding-right: 4px; 
 95 |       }
 96 |       .linguistic-judgement { 
 97 |         padding-right: 0; 
 98 |       }
 99 |       </style>
100 |       ]]
101 |     tmp[#tmp+1] = pandoc.MetaBlocks(pandoc.RawBlock("html", css))
102 |     
103 |     meta['header-includes'] = tmp
104 |   end
105 |   
106 |   if FORMAT:match "latex" then
107 |     
108 |     local function add (s)
109 |       tmp[#tmp+1] = pandoc.MetaBlocks(pandoc.RawBlock("tex", s))
110 |     end
111 |   
112 |     if latexPackage == "linguex" then
113 |       add("\\usepackage{linguex}")
114 |       -- no brackets
115 |       add("\\renewcommand{\\theExLBr}{}")
116 |       add("\\renewcommand{\\theExRBr}{}")
117 |       --add("\\renewcommand{\\firstrefdash}{}")
118 |       add("\\usepackage{chngcntr}")
119 |       if addChapterNumber then
120 |         add("\\counterwithin{ExNo}{"..topDivision.."}")
121 |         add("\\renewcommand{\\Exarabic}{\\the"..topDivision..".\\arabic}")
122 |       elseif restartAtChapter then
123 |         add("\\counterwithin*{ExNo}{"..topDivision.."}")
124 |       end
125 | 
126 |     elseif latexPackage:match "gb4e" then
127 |       add("\\usepackage{"..latexPackage.."}")
128 |       -- nnext package does not work with added top level number
129 |       add("\\usepackage[noparens]{nnext}")
130 |       add("\\usepackage{chngcntr}")
131 |       if addChapterNumber then
132 |         add("\\counterwithin{xnumi}{"..topDivision.."}")
133 |       elseif restartAtChapter then
134 |         add("\\counterwithin*{xnumi}{"..topDivision.."}")
135 |       end
136 | 
137 |     elseif latexPackage == "expex" then
138 |       add("\\usepackage{expex}")
139 |       add("\\lingset{belowglpreambleskip=-1.5ex, aboveglftskip=-1.5ex, exskip=0ex, interpartskip=-0.5ex, belowpreambleskip=-1ex}")
140 |       if addChapterNumber then
141 |         add("\\lingset{exnotype=chapter.arabic}")
142 |       end
143 |       if restartAtChapter then
144 |         --add("\\usepackage{epltxchapno}")
145 |         add("\\usepackage{etoolbox}")
146 |         add("\\pretocmd{\\"..topDivision.."}{\\excnt=1}{}{}")
147 |       end
148 | 
149 |     end
150 |     meta['header-includes'] = tmp
151 |   end
152 |   return meta
153 | end
154 | 
155 | ------------------------------------------
156 | -- add invisible numbering to section
157 | ------------------------------------------
158 | 
159 | function addSectionNumbering (doc)
160 |   local sections = pandoc.utils.make_sections(true, nil, doc.blocks)
161 |   return pandoc.Pandoc(sections, doc.meta)
162 | end
163 | 
164 | ---------------------------
165 | -- help function for format
166 | ---------------------------
167 | 
168 | function splitPara (p)
169 |    -- remove quotes, they interfere with the layout
170 |   if p[1].tag == "Quoted" then
171 |     p = p[1].content
172 |   end
173 |   -- split paragraph in subtables at Space 
174 |   -- to insert paragraph into pandoc.Table
175 |   -- Is there a better way to do this in Pandoc-Lua?
176 | 	local start = 1
177 | 	local result = {}
178 | 	for i=1,#p do
179 | 		if p[i].tag == "Space" then
180 | 			local chunk = table.move(p, start, i-1, 1, {})
181 | 			table.insert(result, {pandoc.Plain(chunk)} )
182 | 			start = i + 1
183 | 		end
184 | 	end
185 | 	if start <= #p then
186 | 		local chunk = table.move(p, start, #p, 1, {})
187 | 		table.insert(result, {pandoc.Plain(chunk)} )
188 | 	end
189 | 	return result
190 | end
191 | 
192 | function turnIntoTable (rowContent, nCols, extraCols)
193 |   -- turn examples into Tables for alignment
194 |   -- use simpleTable for construction
195 |   local caption = {}
196 |   local headers = {}
197 |   local aligns = {}
198 |     for i=1,nCols do aligns[i] = "AlignLeft" end
199 |     aligns[extraCols + 1] = "AlignRight" -- Column for grammaticality judgements
200 |   local widths = {}
201 |     for i=1,nCols do widths[i] = 0 end
202 |   local rows = rowContent
203 | 
204 |   local result = pandoc.SimpleTable(
205 |       caption,
206 |       aligns,
207 |       widths,
208 |       headers,
209 |       rows
210 |   )
211 |   -- turn into fancy new tables
212 |   result = pandoc.utils.from_simple_table(result)
213 | 
214 |   -- set class of table to "linguistic-example" for styling via CSS
215 |   result.attr = {class = "linguistic-example"}
216 |   -- set class of judgement cells to "linguistic-judgement" for styling via CSS
217 |   for i=1,#result.bodies[1].body do
218 |     result.bodies[1].body[i][2][extraCols+1].attr = pandoc.Attr(nil, {"linguistic-judgement"})
219 |   end
220 | 
221 |   return result
222 | end
223 | 
224 | function splitForSmallCaps (s)
225 | 	-- turn uppercase in gloss into small caps
226 | 	local split = {}
227 | 	for lower,upper in string.gmatch(s, "(.-)([%u%d][%u%d]+)") do
228 | 		if lower ~= "" then
229 | 			lower = pandoc.Str(lower)
230 | 			table.insert(split, lower)
231 |     end
232 | 		upper = pandoc.SmallCaps(pandoc.text.lower(upper))
233 |     table.insert(split, upper)
234 |   end
235 |   for leftover in string.gmatch(s, "[%u%d][%u%d]+(.-[^%u%s])$") do
236 |     leftover = pandoc.Str(leftover)
237 |     table.insert(split, leftover)
238 |   end
239 |   if #split == 0 then
240 |     if s == "~" then s = "   " end -- sequence "space-nobreakspace-space"
241 |     table.insert(split, pandoc.Str(s))
242 |   end
243 | 
244 | 	return split
245 | end
246 | 
247 | function splitJudgement (line)
248 |   local judgement = ""
249 |   local first = pandoc.utils.stringify(line[1])
250 |   if first == "^" then
251 |     judgement = line[2]
252 |     table.remove(line, 1)
253 |     table.remove(line, 1)
254 |     table.remove(line, 1)
255 |   elseif string.sub(first, 1, 1) == "^" then
256 |     judgement = pandoc.Str(string.sub(first, 2))
257 |     table.remove(line, 1)
258 |     table.remove(line, 1)
259 |   end
260 |   return judgement, line
261 | end
262 | 
263 | ------------------------
264 | -- make markup in Pandoc
265 | ------------------------
266 | 
267 | function pandocMakeSingle (single, extraCols)
268 |   -- Make just a single-line example
269 |   local judge, data = splitJudgement(single)
270 |   local line = { {pandoc.Plain(judge)}, {pandoc.Plain(data)} }
271 | 
272 |   -- add extra columns before
273 |   -- either one (number) or two (number, letter)
274 |   if extraCols > 0 then
275 |   	for i=1,extraCols do
276 | 	  	table.insert(line, 1, {} )
277 |     end
278 |   end
279 | 
280 |   -- turn into Table
281 |   local nCols = #line
282 |   local rowContent = { line }
283 |   local exampleSingle = turnIntoTable(rowContent, nCols, extraCols)
284 |   return exampleSingle
285 | end
286 | 
287 | function pandocMakeInterlinear (block, extraCols, formatOverride)
288 |   -- Make interlinear gloss 4-liner from LineBlock input
289 |   -- override format per example
290 |   local globalFormatGloss = formatGloss
291 |   if formatOverride ~= nil then
292 |     formatGloss = (formatOverride == "true")
293 |   end
294 | 
295 |   -- the four lines are: header, source, gloss and trans(lation)
296 |   local header = { { pandoc.Plain(block[1]) } }
297 |   table.insert(header, 1, {} )
298 |   
299 |   local judgeSource, source = splitJudgement(block[2])
300 | 	source = splitPara(source)
301 |     if formatGloss then
302 |       -- remove format and make emph throughout
303 |       for i=1,#source do
304 |         local str = pandoc.utils.stringify(source[i]) -- avoid shadowing the `string` library
305 |         source[i] = { pandoc.Plain(pandoc.Emph(str)) }
306 |       end
307 |     end  
308 |     table.insert(source, 1, { pandoc.Plain(judgeSource) } )
309 | 
310 | 	local gloss = splitPara(block[3])
311 |     if formatGloss then 
312 |       -- remove format and turn capital-sequences into smallcaps
313 |       for i=1,#gloss do 
314 |         local str = pandoc.utils.stringify(gloss[i]) -- avoid shadowing the `string` library
315 |         gloss[i] = { pandoc.Plain(splitForSmallCaps(str)) }
316 |       end 
317 |     end 
318 |     table.insert(gloss,  1, {} )
319 | 
320 |   local trans = block[#block]
321 |     if formatGloss then
322 |       -- remove existing quotes and add single quotes throughout
323 |       if trans[1].tag == "Quoted" then
324 |         trans = trans[1].content
325 |       end
326 |       trans = {{ pandoc.Plain(pandoc.Quoted("SingleQuote", trans)) }}
327 |     else 
328 |       trans = {{ pandoc.Plain(trans) }} 
329 |     end
330 |     table.insert(trans,  1, {} )
331 | 
332 |   -- return to global setting
333 |   if formatOverride ~= nil then
334 |     formatGloss = globalFormatGloss
335 |   end
336 | 
337 | 	-- add extra columns before, either one or two
338 | 	for i=1,extraCols do
339 | 		table.insert(header, 1, {} )
340 | 		table.insert(source, 1, {} )
341 | 		table.insert(gloss,  1, {} )
342 | 		table.insert(trans,  1, {} )
343 |   end
344 |   
345 |   -- turn into Table
346 |   local nCols = math.max(#source, #gloss)
347 |   local rowContent = {header, source, gloss, trans}
348 |   local interlinear = turnIntoTable(rowContent, nCols, extraCols)
349 | 
350 |   -- make header and trans long cells
351 | 	interlinear.bodies[1].body[1][2][extraCols+2].col_span = nCols - extraCols - 1
352 |   interlinear.bodies[1].body[#block][2][extraCols+2].col_span = nCols - extraCols - 1
353 |   
354 | 	-- shift upwards when header is empty
355 | 	if next(block[1]) == nil then
356 | 		table.remove(interlinear.bodies[1].body, 1)
357 | 	end
358 | 
359 | 	return interlinear
360 | end
361 | 
362 | -- When multiple interlinears are combined, separate Tables are needed
363 | -- also make separate Tables when single examples are mixed with interlinears
364 | 
365 | function pandocMakeList(data, number, formatOverride)
366 |   -- make a list of tables
367 |   local example = {}
368 |   -- go through all items of the list
369 |   for i=1,#data do
370 |     
371 |     if data[i][1].tag ~= "LineBlock" then
372 |       example[i] = pandocMakeSingle(data[i][1].content, 2)
373 |       -- add letter for sub-example in second column
374 |       example[i].bodies[1].body[1][2][2].contents[1] = 
375 |         pandoc.Plain(string.char(96+i)..".")
376 | 
377 |       if i>1 and data[i-1][1].tag ~= "LineBlock" then
378 |         -- add tablerow to previous if also Plain/Para
379 |         table.insert(example[i-1].bodies[1].body, example[i].bodies[1].body[1])
380 |         -- exchange tables
381 |         example[i] = example[i-1]
382 |         example[i-1] = "ignore"
383 |       end
384 | 
385 |     elseif data[i][1].tag == "LineBlock" then
386 |       example[i] = pandocMakeInterlinear(data[i][1].content, 2, formatOverride)
387 |       -- add letter for sub-example in second column
388 |       example[i].bodies[1].body[1][2][2].contents[1] = 
389 |         pandoc.Plain(string.char(96+i)..".")
390 |     end
391 |   end
392 | 
393 |   -- remove the "ignore" placeholders; workaround for `table.remove`
394 |   local exampleList = {}
395 |   for i=1,#example do
396 |     if example[i] ~= "ignore" then
397 |       table.insert(exampleList,example[i])
398 |     end
399 |   end
400 | 
401 |   -- keep track of judgements for better alignment
402 |   local judgeSize = 0
403 |   for i=1,#exampleList do
404 |     for j=1,#exampleList[i].bodies[1].body do
405 |       if exampleList[i].bodies[1].body[j][2][3].contents[1] ~= nil then
406 |         local judge = pandoc.utils.stringify(exampleList[i].bodies[1].body[j][2][3].contents[1])
407 |         judgeSize = math.max(judgeSize, utf8.len(judge))
408 |       end
409 |     end
410 |   end
411 | 
412 |   -- rough approximations
413 |   local spaceForNumber = string.rep(" ", 2*(string.len(number)+2))
414 |   local spaceForLabel = tostring(15 + 5*judgeSize)
415 |   if judgeSize == 0 then spaceForLabel = 0 end
416 | 
417 |   for i=1,#exampleList do
418 |     -- For better alignment with example number, add invisibles in first column 
419 |     -- not nice solution, but portable across formats
420 |     exampleList[i].bodies[1].body[1][2][1].contents[1] = pandoc.Plain(spaceForNumber)
421 |     -- For better alignment, add column-width to judgement column
422 |     -- note: this is not portable outside html
423 |     exampleList[i].bodies[1].body[1][2][3].attr = 
424 |       pandoc.Attr(nil, { "linguistic-judgement" }, { width = spaceForLabel.."px"} )
425 |   end
426 | 
427 |   return exampleList
428 | end
429 | 
430 | function pandocMakeExample (data, number, formatOverride)
431 |   -- make the examples as list of tables
432 |   local example = {}
433 |   local preamble = nil
434 | 
435 |   if #data == 2 then
436 |     -- first part is assumed to be preamble
437 |     preamble = data[1].content
438 |     -- go on with second part
439 |     data = { data[2] }
440 |   end
441 |     
442 |   if data[1].tag == "Para" then
443 |     -- make one-line example
444 |     example[1] = pandocMakeSingle(data[1].content, 1)
445 |   elseif data[1].tag == "LineBlock" then
446 |     -- make one interlinear example
447 |     example[1] = pandocMakeInterlinear(data[1].content, 1, formatOverride)
448 |   elseif data[1].tag == "OrderedList" then
449 |     -- make list of examples
450 |     example = pandocMakeList(data[1].content, number, formatOverride)
451 |   end
452 |   
453 |   if preamble ~= nil then
454 |     -- How many positions should preamble be shifted to the left?
455 |     local shift = 1
456 |     if data[1].tag == "OrderedList" then shift = 0 end
457 |     -- insert preamble as first row in example
458 |     preamble = pandocMakeSingle(preamble, shift)
459 |     table.insert(example[1].bodies[1].body, 1, preamble.bodies[1].body[1])
460 |     -- make preamble multi-column
461 |     local range = #example[1].colspecs - shift - 1
462 |     example[1].bodies[1].body[1][2][2].col_span = range
463 |   end
464 | 
465 |   -- Add example number to top left of first table
466 |   local numberParen = pandoc.Plain( "("..number..")" )
467 |   example[1].bodies[1].body[1][2][1].contents[1] = numberParen
468 | 
469 |   return example
470 | end
471 | 
472 | --------------------------
473 | -- make markup in Latex
474 | -- using langsci-gb4e
475 | --------------------------
476 | 
477 | -- convenience functions for Latex
478 | function texFront (tex, pdoc)
479 |   return table.insert(pdoc, 1, pandoc.RawInline("tex", tex))
480 | end
481 | 
482 | function texEnd (tex, pdoc)
483 |   return table.insert(pdoc, pandoc.RawInline("tex", tex))
484 | end
485 | 
486 | -- this is not ideal. It is too complex to really get judgement layout to work
487 | function texSplitJudgement (line)
488 |   local judge, text = splitJudgement(line)
489 |   if judge ~= "" then
490 |     if latexPackage == "expex" then
491 |       judge = pandoc.utils.stringify(judge)
492 |       texFront("\\ljudge{"..judge.."} ", text)
493 |     else
494 |       table.insert(text, 1, judge)
495 |     end
496 |   end
497 |   return text
498 | end
499 | 
500 | -- different kinds of examples: single line, interlinear, list
501 | function texMakeSingle (line)
502 |   local example = texSplitJudgement(line)
503 |   texFront("\n  ", example)
504 |   return example
505 | end
506 | 
507 | function texMakeInterlinear (block, exID, label, level, formatOverride )
508 |   -- make one interlinear
509 | 
510 |   --check for local override of formatting
511 |   local globalFormatGloss = formatGloss
512 |   if formatOverride ~= nil then
513 |     formatGloss = (formatOverride == "true")
514 |   end
515 | 
516 |   -- the four lines are: header, source, gloss and trans(lation)
517 |   local header = block[1]
518 |   if level == 1 then label = "" end
519 |   if latexPackage == "expex" then
520 |     if #header > 1 then
521 |       texFront("  "..label.."\n  \\begingl\n  \\glpreamble ", header)
522 |       texEnd("//", header)
523 |     else
524 |       texFront("\n  "..label.."\n  \\begingl", header)
525 |     end
526 |   else
527 |     --if level == 1 then
528 |     --  texFront("\n  ", header)
529 |     --else
530 |       texFront("\n  "..label.."  ", header)
531 |     --end
532 |     -- langsci-gb4e behaves differently from gb4e here
533 |     if latexPackage == "langsci-gb4e" then
534 |       if #header > 1 then
535 |         texEnd("\\\\", header)
536 |       end
537 |     end
538 |   end
539 |   
540 |   local source = texSplitJudgement (block[2])
541 |   if formatGloss then
542 |     for i=1,#source do
543 |       if source[i].tag ~= "Space" then
544 |         local text = pandoc.utils.stringify(source[i])
545 |         source[i] = pandoc.Emph(text)
546 |       end
547 |     end
548 |   end
549 |   -- add latex
550 |   if latexPackage == "expex" then
551 |     texFront("\n  \\gla ", source)
552 |     texEnd("//", source)
553 |   else
554 |     texFront("\n  \\gll ", source)
555 |     texEnd("\\\\", source)
556 |   end
557 | 
558 | 
559 |   local gloss = block[3]
560 |   if formatGloss then
561 |     local result = pandoc.List()
562 |     for i=1,#gloss do 
563 |       local text = pandoc.utils.stringify(gloss[i])
564 |       result:extend(splitForSmallCaps(text))
565 |     end
566 |     gloss = result
567 |   end 
568 |   -- add latex
569 |   if latexPackage == "expex" then
570 |     texFront("\n  \\glb ", gloss)
571 |     texEnd("//", gloss)
572 |   else
573 |     texFront("\n       ",gloss)
574 |     texEnd("\\\\", gloss)
575 |   end
576 | 
577 |   local trans = block[4]
578 |   if formatGloss then
579 |     if trans[1].tag == "Quoted" then
580 |       trans = trans[1].content
581 |       texFront("`", trans)
582 |       texEnd("'", trans)
583 |     end
584 |   end
585 |   -- add latex
586 |   if latexPackage == "expex" then
587 |     texFront("\n  \\glft ", trans)
588 |     texEnd("//\n  \\endgl", trans)
589 |   else
590 |     texFront("\n  \\glt ", trans)
591 |   end
592 | 
593 |   -- return to global setting
594 |   if formatOverride ~= nil then
595 |     formatGloss = globalFormatGloss
596 |   end
597 | 
598 |   -- combine for output
599 |   local interlinear = header
600 |   interlinear:extend(source)
601 |   interlinear:extend(gloss)
602 |   interlinear:extend(trans)
603 |   return interlinear
604 | end
605 | 
606 | function texMakeList (list, exID, formatOverride)
607 |   local example = pandoc.List() 
608 |   local labeltwo = ""
609 | 
610 |   for i=1,#list do
611 | 
612 |     if latexPackage == "linguex" then
613 |       if i == 1 then labeltwo = "\\a." else labeltwo = "\\b." end
614 |     elseif latexPackage:match "gb4e" then
615 |       if i == 1 then labeltwo = "\\ea" else labeltwo = "\\ex" end
616 |     elseif latexPackage == "expex" then
617 |       labeltwo = "\\a"
618 |     end
619 | 
620 |     if list[i][1].tag ~= "LineBlock" then
621 |       local line = texSplitJudgement( list[i][1].content )
622 |       texFront("\n  "..labeltwo.." ", line)
623 |       example:extend(line)
624 | 
625 |     elseif list[i][1].tag == "LineBlock" then
626 |       local line = texMakeInterlinear(list[i][1].content, exID, labeltwo, 2, formatOverride)
627 |       if latexPackage:match "gb4e" then
628 |         texFront("\n", line)
629 |         texEnd("\n", line)
630 |       end
631 |       example:extend(line)
632 |     end
633 |   end
634 |   return example
635 | end
636 | 
637 | function texMakeExample (data, exID, formatOverride)
638 |   local example = pandoc.List()
639 | 
640 |   -- different labeling for tex packages
641 |   local labelone = ""
642 |   if latexPackage == "linguex" then labelone = "\\ex."
643 |   elseif latexPackage == "expex" then labelone = "\\ex"
644 |   elseif latexPackage:match "gb4e" then labelone = "\\ea"
645 |   end
646 | 
647 |   if #data == 2 then
648 |     -- assume first part is header
649 |     example = data[1].content
650 |     -- and then proceed with second part
651 |     data = { data[2] }
652 |   end
653 |  
654 |   if data[1].tag == "Para" then
655 |     -- example beginning
656 |     if #example > 0 then texEnd("\\\\", example) end
657 |     if latexPackage == "expex" then
658 |       texFront(labelone.." <"..exID.."> ", example)
659 |     else
660 |       texFront(labelone.." \\label{"..exID.."} ", example)
661 |     end
662 |     -- add one-line example
663 |     local line = texMakeSingle(data[1].content)
664 |     example:extend(line)
665 |     -- example ending
666 |     if latexPackage:match "gb4e" then
667 |       texEnd("\n\\z", example)
668 |     elseif latexPackage == "expex" then
669 |       texEnd("\n\\xe", example)
670 |     end
671 | 
672 |   elseif data[1].tag == "LineBlock" then
673 |     -- example beginning
674 |     if latexPackage == "expex" then
675 |       texFront(labelone.." <"..exID.."> ", example)
676 |     else
677 |       texFront(labelone.." \\label{"..exID.."} ", example)
678 |     end
679 |     -- add interlinear
680 |     local interlinear = texMakeInterlinear(data[1].content, exID, labelone, 1, formatOverride)
681 |     example:extend(interlinear)
682 |     -- example ending
683 |     if latexPackage:match "gb4e" then
684 |       texEnd("\n  \\z", example)
685 |     elseif latexPackage == "expex" then
686 |       texEnd("\n\\xe", example)
687 |     end
688 | 
689 |   elseif data[1].tag == "OrderedList" then
690 |     -- example beginning
691 |     if latexPackage == "expex" then
692 |       texFront("\\pex <"..exID.."> ", example)
693 |     else
694 |       texFront(labelone.." \\label{"..exID.."} ", example)
695 |     end
696 |     -- add list of examples
697 |     local list = texMakeList(data[1].content, exID, formatOverride)
698 |     example:extend(list)
699 |     -- example ending
700 |     if latexPackage:match "gb4e" then
701 |       texEnd("\n  \\z", example)
702 |       texEnd("\n\\z", example)
703 |     elseif latexPackage == "expex" then
704 |       texEnd("\n\\xe", example)
705 |     end
706 |   end
707 |   
708 |   return pandoc.Plain(example)
709 | end
710 | 
711 | --------------------------
712 | -- format example from div
713 | --------------------------
714 | 
715 | function makeExample (div)
716 | 
717 |   -- keep track of chapters (primary sections)
718 |   if div.classes[1] == "section" then
719 |     if div.attributes.number ~= nil and string.len(div.attributes.number) == 1 then
720 |       chapter = chapter + 1
721 |       counterInChapter = 0
722 |     end
723 |   end
724 |  
725 |   -- only do formatting for divs with class "ex"
726 |   if div.classes[1] == "ex" then
727 | 
728 |     -- keep count of examples
729 |     counter = counter + 1
730 |     counterInChapter = counterInChapter + 1
731 | 
732 |     -- format the numbering
733 |     local number = counter
734 |     if addChapterNumber then
735 |       number = chapter.."."..counterInChapter
736 |     elseif restartAtChapter then
737 |       number = counterInChapter
738 |     end
739 | 
740 |     -- make identifier for example
741 |     -- or keep user-provided identifier
742 |     local exID = ""
743 |     if div.identifier == "" then
744 |       exID = "ling-ex:"..chapter.."."..counterInChapter
745 |     else
746 |       exID = div.identifier
747 |     end
748 | 
749 |     -- keep global index of ids/numbers for crossreference
750 |     indexEx[exID] = number
751 | 
752 |     -- check format override per example
753 |     local formatOverride = div.attributes['formatGloss']
754 | 
755 |     -- make different format for latex
756 |     if FORMAT:match "latex" then
757 |       return texMakeExample(div.content, exID, formatOverride)
758 |     else
759 |       local example = pandocMakeExample(div.content, number, formatOverride)
760 |       -- add temporary Cite to resolve "Next"-type references in pandoc
761 |       -- will be removed after cross-references are in place
762 |       local tmpCite = pandoc.Cite({pandoc.Str("@Target")},{pandoc.Citation(exID,"NormalCitation")})
763 |       
764 |       return {
765 |         pandoc.Plain(tmpCite),
766 |         pandoc.Div(example, pandoc.Attr(exID) )
767 |       }
768 |     end
769 |   end
770 | end
771 | 
772 | -------------------------
773 | -- format crossreferences
774 | -------------------------
775 | 
776 | function uniqueNextrefs (cite)
777 | 
778 |   -- to resolve "Next"-style references give them all a unique ID
779 |   -- make indices to check in which order they occur
780 |   local nameN = string.match(cite.content[1].text, "([N]+)ext")
781 |   local nameL = string.match(cite.content[1].text, "([L]+)ast")
782 |   local target = string.match(cite.content[1].text, "@Target")
783 | 
784 |   -- use random ID to make unique
785 |   if nameN ~= nil or nameL ~= nil then
786 |     cite.citations[1].id = tostring(math.random(99999))
787 |   end
788 | 
789 |   -- make indices
790 |   if nameN ~= nil or nameL ~= nil or target ~= nil then
791 |     orderInText = orderInText + 1
792 |     indexRef[orderInText] = cite.citations[1].id
793 |     rev_indexRef[cite.citations[1].id] = orderInText
794 |   end
795 | 
796 |   return(cite)
797 | end
798 | 
799 | function resolveNextrefs (cite)
800 | 
801 |   -- assume Next-style refs have numeric id (from uniqueNextrefs)
802 |   -- assume Example-IDs are not numeric (user should not use them!)
803 |   local id = cite.citations[1].id
804 |   local order = rev_indexRef[id]
805 | 
806 |   local distN = 0
807 |   local sequenceN = string.match(cite.content[1].text, "([N]+)ext")
808 |   if sequenceN ~= nil then distN = string.len(sequenceN) end
809 |   
810 |   if distN > 0 then
811 |     for i=order,#indexRef do
812 |       if tonumber(indexRef[i]) == nil then
813 |         distN = distN - 1
814 |         if distN == 0 then
815 |           cite.citations[1].id = indexRef[i]
816 |         end
817 |       end
818 |     end
819 |   end
820 | 
821 |   local distL = 0
822 |   local sequenceL = string.match(cite.content[1].text, "([L]+)ast")
823 |   if sequenceL ~= nil then distL= string.len(sequenceL) end
824 |   
825 |   if distL > 0 then
826 |     for i=order,1,-1 do
827 |       if tonumber(indexRef[i]) == nil then
828 |         distL = distL - 1
829 |         if distL == 0 then
830 |           cite.citations[1].id = indexRef[i]
831 |         end
832 |       end
833 |     end
834 |   end
835 | 
836 |   return(cite)
837 | end
838 | 
839 | function removeTmpTargetrefs (cite)
840 |   -- remove temporary cites for resolving Next-style reference
841 |   if cite.content[1].text == "@Target" then
842 |     return pandoc.Plain({})
843 |   end 
844 | end
845 | 
846 | function makeCrossrefs (cite)
847 | 
848 |   local id = cite.citations[1].id
849 |   local name = string.gsub(cite.content[1].text, "[%[%]@]", "")
850 |   local suffix = ""
851 |   local expexName = {Next = "nextx", NNext = "anextx", Last = "lastx", LLast = "blastx"}
852 | 
853 |   -- prevent Latex error when user sets xrefSuffixSep to space or nothing
854 |   if FORMAT:match "latex" then
855 |     if xrefSuffixSep == "" or xrefSuffixSep == " " or xrefSuffixSep == " " then 
856 |       xrefSuffixSep = "\\," 
857 |     end
858 |   end
859 | 
860 |   -- only make suffix if there is something there
861 |   if #cite.citations[1].suffix > 0 then
862 |     suffix = pandoc.utils.stringify(cite.citations[1].suffix[2])
863 |     suffix = xrefSuffixSep..suffix
864 |   end
865 | 
866 |   -- make the cross-references
867 |   if FORMAT:match "latex" then
868 |     if latexPackage == "expex" then
869 |       if string.match("@Next@NNext@Last@LLast", name) ~= nil then
870 |         return pandoc.RawInline("latex", "({\\"..expexName[name].."}"..suffix..")")
871 |       elseif indexEx[id] ~= nil then
872 |         -- ignore other "cite" elements
873 |         return pandoc.RawInline("latex", "(\\getref{"..id.."}"..suffix..")")
874 |       end
875 |     else
876 |       if string.match("@Next@NNext@Last@LLast", name) ~= nil then
877 |         -- let latex handle these
878 |         return pandoc.RawInline("latex", "({\\"..name.."}"..suffix..")")
879 |       elseif indexEx[id] ~= nil then
880 |         -- ignore other "cite" elements
881 |         return pandoc.RawInline("latex", "(\\ref{"..id.."}"..suffix..")")
882 |       end
883 |     end
884 |   elseif indexEx[id] ~= nil then 
885 |     -- ignore other "cite" elements
886 |     return pandoc.Link("("..indexEx[id]..suffix..")", "#"..id)
887 |   end
888 | 
889 | end
890 | 
891 | ------------------------------------------
892 | -- Pandoc trick to cycle through documents
893 | ------------------------------------------
894 | 
895 | return {
896 |   -- preparations
897 |   { Pandoc = addSectionNumbering },
898 |   { Meta = getUserSettings },
899 |   { Meta = addFormatting },
900 |   -- formatting linguistic examples as tables
901 |   { Div = makeExample },
902 |   -- three passes necessary to resolve NNext-style references
903 |   { Cite = uniqueNextrefs },
904 |   { Cite = resolveNextrefs },
905 |   { Cite = removeTmpTargetrefs },
906 |   -- now finally all cross-references can be set
907 |   { Cite = makeCrossrefs }
908 | }
909 | 
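The `Next`/`Last` lookup performed by `resolveNextrefs` above can be modelled without Pandoc. The following is a minimal sketch under stated assumptions, not the filter's own code: the helper `resolve` and the sample ids are hypothetical, and the name is passed without the leading `@` that the filter sees in the cite text. It relies on the same convention as the filter: example targets have non-numeric ids, while `Next`-style references carry numeric placeholder ids.

```lua
-- Hypothetical stand-alone model of the Next/Last lookup; not part of pandoc-ling.
-- `index` lists ids in document order: example targets are non-numeric strings,
-- Next/Last-style references carry numeric placeholder ids.
local function resolve(index, order, name)
  local forward = name:match("^(N+)ext$")
  local backward = name:match("^(L+)ast$")
  local run = forward or backward
  if run == nil then return nil end

  -- the number of repeated letters is the number of examples to skip over
  local dist = #run
  local stop, step = #index, 1
  if backward then stop, step = 1, -1 end

  for i = order, stop, step do
    -- only non-numeric ids count as example targets
    if tonumber(index[i]) == nil then
      dist = dist - 1
      if dist == 0 then return index[i] end
    end
  end
  return nil -- no example that far away in this direction
end

-- one example, then a reference (position 2), then two more examples
local index = { "ling-ex:1.1", "48213", "ling-ex:1.2", "ling-ex:1.3" }
print(resolve(index, 2, "Next"))   -- ling-ex:1.2
print(resolve(index, 2, "NNext"))  -- ling-ex:1.3
print(resolve(index, 2, "Last"))   -- ling-ex:1.1
```

Unlike the filter, this sketch returns as soon as the target is found and yields `nil` when no example lies that far away; the filter instead leaves the placeholder id untouched in that case.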


--------------------------------------------------------------------------------
/docs/processVerbatim.lua:
--------------------------------------------------------------------------------
1 | function addRealCopy (code)
2 |   return { code, pandoc.RawBlock("markdown", code.text) }
3 | end
4 | 
5 | return {
6 |   { CodeBlock = addRealCopy }
7 | }
8 | 


--------------------------------------------------------------------------------
/docs/readme.docx:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme.docx


--------------------------------------------------------------------------------
/docs/readme.epub:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme.epub


--------------------------------------------------------------------------------
/docs/readme_expex.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme_expex.pdf


--------------------------------------------------------------------------------
/docs/readme_expex.tex:
--------------------------------------------------------------------------------
  1 | % Options for packages loaded elsewhere
  2 | \PassOptionsToPackage{unicode}{hyperref}
  3 | \PassOptionsToPackage{hyphens}{url}
  4 | \documentclass[
  5 | ]{article}
  6 | \usepackage{xcolor}
  7 | \usepackage{amsmath,amssymb}
  8 | \setcounter{secnumdepth}{5}
  9 | \usepackage{iftex}
 10 | \ifPDFTeX
 11 |   \usepackage[T1]{fontenc}
 12 |   \usepackage[utf8]{inputenc}
 13 |   \usepackage{textcomp} % provide euro and other symbols
 14 | \else % if luatex or xetex
 15 |   \usepackage{unicode-math} % this also loads fontspec
 16 |   \defaultfontfeatures{Scale=MatchLowercase}
 17 |   \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
 18 | \fi
 19 | \usepackage{lmodern}
 20 | \ifPDFTeX\else
 21 |   % xetex/luatex font selection
 22 | \fi
 23 | % Use upquote if available, for straight quotes in verbatim environments
 24 | \IfFileExists{upquote.sty}{\usepackage{upquote}}{}
 25 | \IfFileExists{microtype.sty}{% use microtype if available
 26 |   \usepackage[]{microtype}
 27 |   \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
 28 | }{}
 29 | \makeatletter
 30 | \@ifundefined{KOMAClassName}{% if non-KOMA class
 31 |   \IfFileExists{parskip.sty}{%
 32 |     \usepackage{parskip}
 33 |   }{% else
 34 |     \setlength{\parindent}{0pt}
 35 |     \setlength{\parskip}{6pt plus 2pt minus 1pt}}
 36 | }{% if KOMA class
 37 |   \KOMAoptions{parskip=half}}
 38 | \makeatother
 39 | \usepackage{graphicx}
 40 | \makeatletter
 41 | \newsavebox\pandoc@box
 42 | \newcommand*\pandocbounded[1]{% scales image to fit in text height/width
 43 |   \sbox\pandoc@box{#1}%
 44 |   \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}%
 45 |   \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}%
 46 |   \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both
 47 |   \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}%
 48 |   \else\usebox{\pandoc@box}%
 49 |   \fi%
 50 | }
 51 | % Set default figure placement to htbp
 52 | \def\fps@figure{htbp}
 53 | \makeatother
 54 | \setlength{\emergencystretch}{3em} % prevent overfull lines
 55 | \providecommand{\tightlist}{%
 56 |   \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
 57 | \usepackage{expex}
 58 | \lingset{ 
 59 |             belowglpreambleskip = -1.5ex, 
 60 |             aboveglftskip = -1.5ex, 
 61 |             exskip = 0ex, 
 62 |             interpartskip = -0.5ex, 
 63 |             belowpreambleskip = -2ex 
 64 |           }
 65 | \usepackage{bookmark}
 66 | \IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
 67 | \urlstyle{same}
 68 | \hypersetup{
 69 |   pdftitle={Using pandoc-ling},
 70 |   pdfauthor={Michael Cysouw},
 71 |   hidelinks,
 72 |   pdfcreator={LaTeX via pandoc}}
 73 | 
 74 | \title{Using pandoc-ling}
 75 | \author{Michael Cysouw}
 76 | \date{}
 77 | 
 78 | \begin{document}
 79 | \maketitle
 80 | 
 81 | {
 82 | \setcounter{tocdepth}{3}
 83 | \tableofcontents
 84 | }
 85 | \section{pandoc-ling}\label{pandoc-ling}
 86 | 
 87 | \emph{Michael Cysouw}
 88 | \textless{}\href{mailto:cysouw@mac.com}{\nolinkurl{cysouw@mac.com}}\textgreater{}
 89 | 
 90 | A Pandoc filter for linguistic examples
 91 | 
 92 | tl;dr
 93 | 
 94 | \begin{itemize}
 95 | \tightlist
 96 | \item
 97 |   Easily write linguistic examples including basic interlinear glossing.
 98 | \item
 99 |   Let numbering and cross-referencing be done for you.
100 | \item
101 |   Export to (almost) any format of your choice for final polishing.
102 | \item
103 |   As an example, check out this readme in
104 |   \href{https://cysouw.github.io/pandoc-ling/readme.html}{HTML} or
105 |   \href{https://cysouw.github.io/pandoc-ling/readme_gb4e.pdf}{Latex}.
106 | \end{itemize}
107 | 
108 | \section{Rationale}\label{rationale}
109 | 
110 | In the field of linguistics there is a strong tradition of formatting
111 | example sentences in research papers in a very specific way. In the
112 | field, it is a perennial problem to get such example sentences to look
113 | just right. Within Latex, there are numerous packages to deal with this
114 | problem (e.g.~covington, linguex, gb4e, expex, etc.). Depending on your
115 | needs, there is some Latex solution for almost everyone. However, these
116 | solutions in Latex are often cumbersome to type, and they are not
117 | portable to other formats. Specifically, transfer between latex, html,
118 | docx, odt or epub would actually be highly desirable. Such transfer is
119 | the hallmark of \href{https://pandoc.org}{Pandoc}, a tool by John
120 | MacFarlane that provides conversion between these (and many more)
121 | formats.
122 | 
123 | Any such conversion between text-formats naturally never works
124 | perfectly: every text-format has specific features that are not
125 | transferable to other formats. A central goal of Pandoc (at least in my
126 | interpretation) is to define a set of shared concepts for text-structure
127 | (a `common denominator' if you will, but surely not `least'!) that can
128 | then be mapped to other formats. In many ways, Pandoc tries (again) to
129 | define a set of logical concepts for text structure (`semantic markup'),
130 | which can then be formatted by your favourite typesetter. As long as you
131 | stay inside the realm of this `common denominator' (in practice that
132 | means Pandoc's extended version of Markdown/CommonMark), conversion
133 | works reasonably well (think 90\%-plus).
134 | 
135 | Building on John Gruber's
136 | \href{https://daringfireball.net/projects/markdown/syntax}{Markdown
137 | philosophy}, there is a strong urge here to learn to restrain oneself
138 | while writing, and try to restrict the number of layout-possibilities to
139 | a minimum. In this sense, with \texttt{pandoc-ling} I propose a
140 | Markdown-structure for linguistic examples that is simple, easy to type,
141 | easy to read, and portable through the Pandoc universe by way of an
142 | extension mechanism of Pandoc, called a `Pandoc Lua Filter'. This
143 | extension will not magically allow you to write every linguistic example
144 | thinkable, but my guess is that in practice the present proposal covers
145 | the majority of situations in linguistic publications (think 90\%-plus).
146 | As an example (and test case) I have included automatic conversions into
147 | various formats in this repository (check them out in the directory
148 | \texttt{docs} to get an idea of the strengths and weaknesses of the
149 | current implementation).
150 | 
151 | \section{The basic structure of a linguistic
152 | example}\label{the-basic-structure-of-a-linguistic-example}
153 | 
154 | Basically, a linguistic example consists of 6 possible building blocks,
155 | of which only the number and at least one example line are necessary.
156 | The space between the building blocks is kept as minimal as possible
157 | without becoming cramped. When (optional) building blocks are not
158 | included, then the other blocks shift left and up (only exception: a
159 | preamble without labels is not shifted left completely, but left-aligned
160 | with the example, not with the judgement).
161 | 
162 | \begin{itemize}
163 | \tightlist
164 | \item
165 |   \textbf{Number}: Running tally of all examples in the work, possibly
166 |   restarting at chapters or other major headings. Typically between
167 |   round brackets, possibly with a chapter number added before in long
168 |   works, e.g.~example (7.26). Aligned top-left, typically left-aligned
169 |   to main text margin.
170 | \item
171 |   \textbf{Preamble}: Optional information about the content/kind of
172 |   example. Aligned top-left: to the top with the number, to the left
173 |   with the (optional) label. When there is no label, then preamble is
174 |   aligned with the example, not with the judgment.
175 | \item
176 |   \textbf{Label}: Indices for sub-examples. Only present when more than
177 |   one example is grouped together inside one numbered entity.
178 |   Typically these sub-example labels use latin letters followed by a
179 |   full stop. They are left-aligned with the preamble, and each label is
180 |   top-aligned with the top-line of the corresponding example (important
181 |   for longer line-wrapped examples).
182 | \item
183 |   \textbf{Judgment}: Examples can optionally have grammaticality
184 |   judgments, typically symbols like **?!* sometimes in superscript
185 |   relative to the corresponding example. Judgments are right-aligned to
186 |   each other, typically with only minimal space to the left-aligned
187 |   examples.
188 | \item
189 |   \textbf{Line example}: A minimal linguistic example has at least one
190 |   line example, i.e.~an utterance of interest. Building blocks in
191 |   general shift left and up when other (optional) building blocks are
192 |   not present. Minimally, this results in a number with one line
193 |   example.
194 | \item
195 |   \textbf{Interlinear example}: A complex structure typically used for
196 |   examples from languages unknown to most readers. Consists of three or
197 |   four lines that are left-aligned:
198 | 
199 |   \begin{itemize}
200 |   \tightlist
201 |   \item
202 |     \textbf{Header}: An optional header is typically used to display
203 |     information about the language of the example, including literature
204 |     references. When not present, then all other lines from the
205 |     interlinear example shift upwards.
206 |   \item
207 |     \textbf{Source}: The actual language utterance, often typeset in
208 |     italics. This line is internally separated at spaces, and each
209 |     sub-block is left-aligned with the corresponding sub-blocks of the
210 |     gloss.
211 |   \item
212 |     \textbf{Gloss}: Explanation of the meaning of the source, often
213 |     using abbreviations in small caps. This line is internally separated
214 |     at spaces, and each block is left-aligned with the block from
215 |     source.
216 |   \item
217 |     \textbf{Translation}: Free translation of the source, typically
218 |     quoted. Not separated in blocks, but freely extending to the right.
219 |     Left-aligned with the other lines from the interlinear example.
220 |   \end{itemize}
221 | \end{itemize}
222 | 
223 | \begin{figure}
224 | \centering
225 | \pandocbounded{\includegraphics[keepaspectratio,alt={The structure of a linguistic example.}]{figure/ExampleStructure.png}}
226 | \caption{The structure of a linguistic example.}
227 | \end{figure}
228 | 
229 | There are of course many more possibilities to extend the structure of a
230 | linguistic example, like third or fourth subdivisions of labels (often
231 | using small roman numerals as a third level) or multiple glossing lines
232 | in the interlinear example. Also, the content of the header is sometimes
233 | found right-aligned to the right of the interlinear example (language
234 | info at the top, reference at the bottom). All such options are
235 | currently not supported by \texttt{pandoc-ling}.
236 | 
237 | Under the hood, this structure is prepared by \texttt{pandoc-ling} as a
238 | table. Tables are reasonably well transcoded to different document
239 | formats. Specific layout considerations mostly have to be set manually.
240 | Alignment of the text should work in most exports. Some \texttt{CSS}
241 | styling is proposed by \texttt{pandoc-ling}, but can of course be
242 | overruled. For latex (and beamer) special output is prepared using
243 | various available latex packages (see options, below).
244 | 
245 | \section{\texorpdfstring{Introducing
246 | \texttt{pandoc-ling}}{Introducing pandoc-ling}}\label{introducing-pandoc-ling}
247 | 
248 | \subsection{Editing linguistic
249 | examples}\label{editing-linguistic-examples}
250 | 
251 | To include a linguistic example in Markdown \texttt{pandoc-ling} uses
252 | the \texttt{div} structure, which is indicated in Pandoc-Markdown by
253 | typing three colons at the start and three colons at the end. To
254 | indicate the \texttt{class} of this \texttt{div} the letters `ex' (for
255 | `example') should be added after the top colons (with or without space
256 | in between). This `ex'-class is the signal for \texttt{pandoc-ling} to
257 | start processing such a \texttt{div}. The numbering of these examples
258 | will be inserted by \texttt{pandoc-ling}.
259 | 
260 | Empty lines can be added inside the \texttt{div} for visual pleasure, as
261 | they mostly do not have an influence on the output. Exception: do
262 | \emph{not} use empty lines between unlabelled line examples. Multiple
263 | lines of text can be used (without empty lines in between), but they
264 | will simply be interpreted as one sequential paragraph.
265 | 
266 | \begin{verbatim}
267 | ::: ex
268 | This is the most basic structure of a linguistic example. 
269 | :::
270 | \end{verbatim}
271 | 
272 | \begin{samepage}
273 | \ex<ex1> 
274 |   This is the most basic structure of a linguistic example.
275 | \xe
276 | \end{samepage}
277 | 
278 | Alternatively, the \texttt{class} can be put in curled brackets (and
279 | then a leading full stop is necessary before \texttt{ex}). Inside these
280 | brackets more attributes can be added (separated by space), for example
281 | an id, using a hash, or any attribute=value pairs that should apply to
282 | this example. Currently there is only one real attribute implemented
283 | (\texttt{formatGloss}), but in principle it is possible to add more
284 | attributes that can be used to fine-tune the typesetting of the example
285 | (see below for a description of such \texttt{local\ options}).
286 | 
287 | \begin{verbatim}
288 | ::: {#id .ex formatGloss=false}
289 | 
290 | This is a multi-line example.
291 | But that does not mean anything for the result
292 | All these lines are simply treated as one paragraph.
293 | They will become one example with one number.
294 | 
295 | :::
296 | \end{verbatim}
297 | 
298 | \begin{samepage}
299 | \ex<id> 
300 |   This is a multi-line example. But that does not mean anything for the
301 | result All these lines are simply treated as one paragraph. They will
302 | become one example with one number.
303 | \xe
304 | \end{samepage}
305 | 
306 | A preamble can be added by inserting an empty line between preamble and
307 | example. The same considerations about multiple text-lines apply.
308 | 
309 | \begin{verbatim}
310 | :::ex
311 | Preamble
312 | 
313 | This is an example with a preamble.
314 | :::
315 | \end{verbatim}
316 | 
317 | \begin{samepage}
318 | \ex<ex3> Preamble\\*
319 |   This is an example with a preamble.
320 | \xe
321 | \end{samepage}
322 | 
323 | Sub-examples with labels are entered by starting each sub-example with a
324 | small latin letter and a full stop. Empty lines between labels are
325 | allowed. Subsequent lines without labels are treated as one paragraph.
326 | Empty lines \emph{not} followed by a label with a full stop will result
327 | in errors.
328 | 
329 | \begin{verbatim}
330 | :::ex
331 | a. This is the first example.
332 | b. This is the second.
333 | a. The actual letters are not important, `pandoc-ling` will put them in order.
334 | 
335 | e. Empty lines are allowed between labelled lines
336 | Subsequent lines are again treated as one sequential paragraph.
337 | :::
338 | \end{verbatim}
339 | 
340 | \begin{samepage}
341 | \pex[*=]<ex4> 
342 |   \a This is the first example.
343 |   \a This is the second.
344 |   \a The actual letters are not important, \texttt{pandoc-ling} will put
345 | them in order.
346 |   \a Empty lines are allowed between labelled lines Subsequent lines are
347 | again treated as one sequential paragraph.
348 | \xe
349 | \end{samepage}
350 | 
351 | A labelled list can be combined with a preamble.
352 | 
353 | \begin{verbatim}
354 | :::ex
355 | Any nice description here
356 | 
357 | a. one example sentence.
358 | b. two
359 | c. three
360 | :::
361 | \end{verbatim}
362 | 
363 | \begin{samepage}
364 | \pex[*=]<ex5> Any nice description here\\*
365 |   \a one example sentence.
366 |   \a two
367 |   \a three
368 | \xe
369 | \end{samepage}
370 | 
371 | Grammaticality judgements should be added before an example, and after
372 | an optional label, separated from both by spaces (though four spaces in
373 | a row should be avoided, as that could lead to layout errors). To mark
374 | a sequence of symbols as a judgement, prepend the judgement with
375 | a caret \texttt{\^{}}. Alignment will be figured out by
376 | \texttt{pandoc-ling}.
377 | 
378 | \begin{verbatim}
379 | :::ex
380 | Throwing in a preamble for good measure
381 | 
382 | a. ^* This traditionally signals ungrammaticality.
383 | b. ^? Question-marks indicate questionable grammaticality.
384 | c. ^^whynot?^ But in principle any sequence can be used (here even in superscript).
385 | d. However, such long sequences sometimes lead to undesirable effects in the layout.
386 | :::
387 | \end{verbatim}
388 | 
389 | \begin{samepage}
390 | \pex[*=whynot?]<ex6> Throwing in a preamble for good measure\\*
391 |   \a \ljudge{*}This traditionally signals ungrammaticality.
392 |   \a \ljudge{?}Question-marks indicate questionable grammaticality.
393 |   \a \ljudge{\textsuperscript{whynot?}}But in principle any sequence can
394 | be used (here even in superscript).
395 |   \a However, such long sequences sometimes lead to undesirable effects
396 | in the layout.
397 | \xe
398 | \end{samepage}
399 | 
400 | A minor detail is the alignment of a single example with a preamble and
401 | grammaticality judgements. In this case it looks better for the preamble
402 | to be left aligned with the example and not with the judgement.
403 | 
404 | \begin{verbatim}
405 | :::ex
406 | Here is a special case with a preamble
407 | 
408 | ^^???^ With a singly questionably example.
409 | Note the alignment! Especially with this very long example
410 | that should go over various lines in the output.
411 | :::
412 | \end{verbatim}
413 | 
414 | \begin{samepage}
415 | \ex<ex7> Here is a special case with a preamble\\*
416 |   
417 |   \judge{\textsuperscript{???}} With a singly questionably example. Note
418 | the alignment! Especially with this very long example that should go
419 | over various lines in the output.
420 | \xe
421 | \end{samepage}
422 | 
423 | For the lazy writers among us, it is also possible to use a simple
424 | bullet list instead of a labelled list. Note that the listed elements
425 | will still be formatted as a labelled list.
426 | 
427 | \begin{verbatim}
428 | :::ex
429 | - This is a lazy example.
430 | - ^# It should return letters at the start just as before.
431 | - ^% Also testing some unusual judgements.
432 | :::
433 | \end{verbatim}
434 | 
435 | \begin{samepage}
436 | \pex[*=\#]<ex8> 
437 |   \a This is a lazy example.
438 |   \a \ljudge{\#}It should return letters at the start just as before.
439 |   \a \ljudge{\%}Also testing some unusual judgements.
440 | \xe
441 | \end{samepage}
442 | 
443 | Just for testing: a single example with a judgement (which resulted in
444 | an error in earlier versions).
445 | 
446 | \begin{verbatim}
447 | ::: ex
448 | ^* This traditionally signals ungrammaticality.
449 | :::
450 | \end{verbatim}
451 | 
452 | \begin{samepage}
453 | \ex<ex9> 
454 |   
455 |   \judge{*} This traditionally signals ungrammaticality.
456 | \xe
457 | \end{samepage}
458 | 
459 | \subsection{Interlinear examples}\label{interlinear-examples}
460 | 
461 | For interlinear examples with aligned source and gloss, the structure of
462 | a \texttt{lineblock} is used, starting the lines with a vertical line
463 | \texttt{\textbar{}}. There should always be four vertical lines (for
464 | header, source, gloss and translation, respectively), although the
465 | content after the first vertical line can be empty. The source and gloss
466 | lines are separated at spaces, and all parts are right-aligned. If you
467 | want to have a space that is not separated, you will have to `protect'
468 | the space, either by putting a backslash before the space, or by
469 | inserting a non-breaking space instead of a normal space (either type
470 | \texttt{\&nbsp;} or insert an actual non-breaking space, i.e.~unicode
471 | character \texttt{U+00A0}).
472 | 
473 | \begin{verbatim}
474 | :::ex
475 | | Dutch (Germanic)
476 | | Deze zin is in het nederlands.
477 | | DEM sentence AUX in DET dutch.
478 | | This sentence is dutch.
479 | :::
480 | \end{verbatim}
481 | 
482 | \begin{samepage}
483 | \ex[*=]<ex10> 
484 |   \begingl
485 |   \glpreamble Dutch (Germanic)//
486 |   \gla Deze zin is in het nederlands. //
487 |   \glb DEM sentence AUX in DET dutch. //
488 |   \glft This sentence is dutch.//
489 |   \endgl
490 | \xe
491 | \end{samepage}
492 | 
493 | An attempt is made to format interlinear examples when the option
494 | \texttt{formatGloss=true} is added. This will:
495 | 
496 | \begin{itemize}
497 | \tightlist
498 | \item
499 |   remove formatting from the source and set everything in italics,
500 | \item
501 |   remove formatting from the gloss and set sequences (\textgreater1) of
502 |   capitals and numbers into small caps (note that the positioning of
503 |   small caps on web pages is
504 |   \href{https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align}{highly
505 |   complex}),
506 | \item
507 |   a tilde \texttt{\textasciitilde{}} between spaces in the gloss is
508 |   treated as a shortcut for an empty gloss (internally, the sequence
509 |   \texttt{space-tilde-space} is replaced by
510 |   \texttt{space-space-nonBreakingSpace-space-space}),
511 | \item
512 |   consistently put translations in single quotes, possibly removing
513 |   other quotes.
514 | \end{itemize}
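
The tilde shortcut from the list above can be sketched in a few lines of
Python. This is only an illustration of the described replacement, not
the Lua code of the filter itself; the function names
\texttt{expand\_empty\_gloss} and \texttt{split\_gloss} are made up for
this sketch.

\begin{verbatim}
```python
NO_BREAK_SPACE = "\u00a0"


def expand_empty_gloss(gloss_line: str) -> str:
    """Replace each ' ~ ' by two spaces, a no-break space and two spaces."""
    return gloss_line.replace(" ~ ", "  " + NO_BREAK_SPACE + "  ")


def split_gloss(gloss_line: str) -> list:
    """Split at ordinary spaces only, so the no-break space stays a cell.

    Note that str.split() without an argument would also split at the
    no-break space, which is why the separator is given explicitly.
    """
    return [word for word in expand_empty_gloss(gloss_line).split(" ") if word]


if __name__ == "__main__":
    print(split_gloss("DEM sentence AUX ~ dutch."))
```
\end{verbatim}

Because the no-break space survives the splitting at spaces, it ends up
as a gloss of its own that is visually empty.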
515 | 
516 | \begin{verbatim}
517 | ::: {.ex formatGloss=true}
518 | | Dutch (Germanic)
519 | | Is deze zin in het nederlands ?
520 | | AUX DEM sentence in DET dutch Q
521 | | Is this sentence dutch?
522 | :::
523 | \end{verbatim}
524 | 
525 | \begin{samepage}
526 | \ex[*=]<ex11> 
527 |   \begingl
528 |   \glpreamble Dutch (Germanic)//
529 |   \gla \emph{Is} \emph{deze} \emph{zin} \emph{in} \emph{het}
530 | \emph{nederlands} \emph{?} //
531 |   \glb \textsc{aux} \textsc{dem} sentence in \textsc{det} dutch
532 | \textsc{q} //
533 |   \glft `Is this sentence dutch?'//
534 |   \endgl
535 | \xe
536 | \end{samepage}
537 | 
538 | Such formatting will not always work, but it seems to be
539 | quite robust in my testing. The next example brings everything together:
540 | 
541 | \begin{itemize}
542 | \tightlist
543 | \item
544 |   a preamble,
545 | \item
546 |   labels, both for single lines and for interlinear examples,
547 | \item
548 |   interlinear examples start on a new line immediately after the
549 |   letter-label,
550 | \item
551 |   grammaticality judgements with proper alignment,
552 | \item
553 |   when the header of an interlinear example is left out, everything is
554 |   shifted up,
555 | \item
556 |   the formatting of the interlinear examples is harmonised.
557 | \end{itemize}
558 | 
559 | \begin{verbatim}
560 | ::: {.ex formatGloss=true samePage=false}
561 | Completely superfluous preamble, but it works ...
562 | 
563 | a.
564 | | Dutch (Germanic) Note the grammaticality judgement!
565 | | ^^:–)^ Deze zin is (dit\ is&nbsp;test) nederlands.
566 | | DEM sentence AUX ~ dutch.
567 | | This sentence is dutch.
568 | 
569 | b.
570 | | 
571 | | Deze tweede zin heeft geen header.
572 | | DEM second sentence have.3SG.PRES no header.
573 | | This second sentence does not have a header.
574 | 
575 | a. Mixing single line examples with interlinear examples.
576 | a. This is of course highly unusal.
577 | Just for this example, let's add some extra material in this example.
578 | :::
579 | \end{verbatim}
580 | 
581 | \pex[*=:–)]<ex12> Completely superfluous preamble, but it works
582 | \ldots{}\\*
583 |   \a 
584 |   \begingl
585 |   \glpreamble Dutch (Germanic) Note the grammaticality judgement!//
586 |   \gla \ljudge{\textsuperscript{:--)}}\emph{Deze} \emph{zin} \emph{is}
587 | \emph{(dit~is~test)} \emph{nederlands.} //
588 |   \glb \textsc{dem} sentence \textsc{aux}  ~  dutch. //
589 |   \glft `This sentence is dutch.'//
590 |   \endgl
591 |   \a 
592 |   \begingl
593 |   \gla \emph{Deze} \emph{tweede} \emph{zin} \emph{heeft} \emph{geen}
594 | \emph{header.} //
595 |   \glb \textsc{dem} second sentence have.\textsc{3sg}.\textsc{pres} no
596 | header. //
597 |   \glft `This second sentence does not have a header.'//
598 |   \endgl
599 |   \a Mixing single line examples with interlinear examples.
600 |   \a This is of course highly unusal. Just for this example, let's add
601 | some extra material in this example.
602 | \xe
603 | 
604 | Also, as a quick workaround for showing multiple source lines without
605 | alignment with the glossing (e.g.~for phonetic or orthographic
606 | representations of the example), it is possible to use the header of
607 | an interlinear example. For a line break in the header, use the double
608 | backslash \texttt{\textbackslash{}\textbackslash{}}, either inline or at
609 | the end of a line. When you type a header using multiple lines (as shown
610 | below), then subsequent lines have to start with a space. For now, this
611 | only works in the header line.
612 | 
613 | \begin{verbatim}
614 | ::: ex
615 | | Example with an multiline header \\
616 |   *can be used for orthographic representations*, \\
617 |   or phonetic transcription, \\ or for whatever you like
618 | | Dit is een lui voorbeeld=je
619 | | DEM COP DET lazy example=DIM
620 | | This is a lazy example.
621 | :::
622 | \end{verbatim}
623 | 
624 | \begin{samepage}
625 | \ex[*=]<ex13> 
626 |   \begingl
627 |   \glpreamble Example with an multiline header \\
628 | \emph{can be used for orthographic representations}, \\
629 | or phonetic transcription, \\
630 | or for whatever you like//
631 |   \gla Dit is een lui voorbeeld=je //
632 |   \glb DEM COP DET lazy example=DIM //
633 |   \glft This is a lazy example.//
634 |   \endgl
635 | \xe
636 | \end{samepage}
637 | 
638 | \subsection{Cross-referencing
639 | examples}\label{cross-referencing-examples}
640 | 
641 | The examples are automatically numbered by \texttt{pandoc-ling}.
642 | Cross-references to examples inside a document can be made by using the
643 | \texttt{{[}@ID{]}} format (used by Pandoc for citations). When an
644 | example has an explicit identifier (like \texttt{\#test} in the next
645 | example), then a reference can be made to this example with
646 | \texttt{{[}@test{]}}, leading to (\getref{test}) when formatted (note
647 | that the formatting does not work on the GitHub website; please check
648 | the `docs' subdirectory).
649 | 
650 | \begin{verbatim}
651 | ::: {#test .ex}
652 | This is a test
653 | :::
654 | \end{verbatim}
655 | 
656 | \begin{samepage}
657 | \ex<test> 
658 |   This is a test
659 | \xe
660 | \end{samepage}
661 | 
662 | Inspired by the \texttt{linguex}-approach, you can also use the keywords
663 | \texttt{next} or \texttt{last} to refer to the next or the last example,
664 | e.g.~\texttt{{[}@last{]}} will be formatted as (\getref{test}). By
665 | doubling the first letter to \texttt{nnext} or \texttt{llast}, a reference
666 | to the next/last-but-one example can be made. In fact, the first
667 | letter can be repeated at will in \texttt{pandoc-ling}, so something
668 | like \texttt{{[}@llllllllast{]}} will also work. It will be formatted as
669 | (\getref{ex7}) after the processing of \texttt{pandoc-ling}. Needless to
670 | say, in such a situation an explicit identifier would be a better
671 | choice.
672 | 
673 | Referring to sub-examples can be done by manually adding a suffix into
674 | the cross reference, simply separated from the identifier by a space.
675 | For example, \texttt{{[}@lllast~c{]}} will refer to the third
676 | sub-example of the last-but-two example. Formatted this will look like
677 | this: (\getref{ex13}\,c), smile! However, note that the ``c'' has to be
678 | manually determined. It is simply a literal suffix that will be copied
679 | into the cross-reference. Something like \texttt{{[}@last\ hA1l0{]}}
680 | will work also, leading to (\getref{test}\,hA1l0) when formatted (which
681 | is of course nonsensical).
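
The keyword and suffix rules described above can be summarised in a
small Python sketch. It is only an illustration under simplifying
assumptions, not the implementation of the filter: the names
\texttt{resolve\_keyword} and \texttt{format\_reference} are invented
here, \texttt{examples\_so\_far} stands for the number of examples that
precede the reference, and explicit identifiers are not modelled.

\begin{verbatim}
```python
import re
from typing import Optional

NO_BREAK_SPACE = "\u00a0"  # default separator described for xrefSuffixSep


def resolve_keyword(keyword: str, examples_so_far: int) -> Optional[int]:
    """Return the example number a last/next keyword points to, else None."""
    match = re.fullmatch(r"(l+)ast", keyword)
    if match:
        # "last" is the most recent example; each extra "l" goes one back.
        return examples_so_far - (len(match.group(1)) - 1)
    match = re.fullmatch(r"(n+)ext", keyword)
    if match:
        # "next" is the first upcoming example; each extra "n" goes one on.
        return examples_so_far + len(match.group(1))
    return None


def format_reference(reference: str, examples_so_far: int,
                     sep: str = NO_BREAK_SPACE) -> Optional[str]:
    """Format a reference such as 'llast' or 'last c' as '(13)' or '(14 c)'."""
    keyword, _, suffix = reference.partition(" ")
    number = resolve_keyword(keyword, examples_so_far)
    if number is None:
        return None  # explicit identifiers need a lookup that is left out here
    # The suffix is copied literally; it is not checked against sub-examples.
    return f"({number}{sep}{suffix})" if suffix else f"({number})"


if __name__ == "__main__":
    print(format_reference("last", 14))
    print(format_reference("nnext", 14))
```
\end{verbatim}

The sketch makes explicit that the suffix is never validated: any string
after the first space is simply appended to the resolved number.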
682 | 
683 | For exports that include attributes (like html), the examples have an
684 | explicit id of the form \texttt{exNUMBER} in which \texttt{NUMBER} is
685 | the actual number as given in the formatted output. This means that it
686 | is possible to refer to an example on any web-page by using the
687 | hash-mechanism to refer to a part of the web-page. For example,
688 | \texttt{\#ex4.7} can be used to refer to the seventh example in the
689 | html-output of this readme (try
690 | \href{https://cysouw.github.io/pandoc-ling/readme.html\#ex4.7}{this
691 | link}). The id in this example has a chapter number `4' because in the
692 | html conversion I have set the option \texttt{addChapterNumber} to
693 | \texttt{true}. (Note: when numbers restart the count in each chapter
694 | with the option \texttt{restartAtChapter}, then the id is of the form
695 | \texttt{exCHAPTER.NUMBER}. This is necessary to resolve clashing ids, as
696 | the same number might then be used in different chapters.)
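
As a minimal sketch of this naming scheme (again in Python, and not
taken from the filter itself), the id and the resulting link could be
built as follows. The function names and the parameter
\texttt{with\_chapter} are assumptions of this sketch;
\texttt{with\_chapter} stands for either \texttt{addChapterNumber} or
\texttt{restartAtChapter} being set to \texttt{true}.

\begin{verbatim}
```python
def example_id(number: int, chapter: int = 0, with_chapter: bool = False) -> str:
    """Return the id of an example, e.g. 'ex7' or 'ex4.7'."""
    if with_chapter:
        return f"ex{chapter}.{number}"
    return f"ex{number}"


def example_link(page_url: str, number: int, chapter: int = 0,
                 with_chapter: bool = False) -> str:
    """Append the id as a fragment to the address of the page."""
    return f"{page_url}#{example_id(number, chapter, with_chapter)}"


if __name__ == "__main__":
    page = "https://cysouw.github.io/pandoc-ling/readme.html"
    print(example_link(page, 7, chapter=4, with_chapter=True))
```
\end{verbatim}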
697 | 
698 | I propose to use these ids also to refer to examples in citations when
699 | writing scholarly papers, e.g.~(Cysouw 2021: \#ex7), independent of
700 | whether the links actually resolve. In principle, such citations could
701 | easily be resolved when online publications are properly prepared. The
702 | same proposal could also work for other parts of research papers, for
703 | example using tags like \texttt{\#sec,\ \#fig,\ \#tab,\ \#eq} (see the
704 | Pandoc filter
705 | \href{https://github.com/cysouw/crossref-adapt}{\texttt{crossref-adapt}}).
706 | To refer to paragraphs (which should replace page numbers in a future of
707 | adaptive design), I propose to use no tag, but directly add the number
708 | to the hash (see the Pandoc filter
709 | \href{https://github.com/cysouw/count-para}{\texttt{count-para}} for a
710 | practical mechanism to add such numbering).
711 | 
712 | \subsection{\texorpdfstring{Options of
713 | \texttt{pandoc-ling}}{Options of pandoc-ling}}\label{options-of-pandoc-ling}
714 | 
715 | \subsubsection{Global options}\label{global-options}
716 | 
717 | The following global options are available with \texttt{pandoc-ling}.
718 | These can be added to the
719 | \href{https://pandoc.org/MANUAL.html\#metadata-blocks}{Pandoc metadata}.
720 | An example of such metadata can be found at the bottom of this
721 | \texttt{readme} in the form of a YAML-block. Pandoc allows for various
722 | methods to provide metadata (see the link above).
723 | 
724 | \begin{itemize}
725 | \tightlist
726 | \item
727 |   \textbf{\texttt{formatGloss}} (boolean, default \texttt{false}):
728 |   should all interlinear examples be consistently formatted? If you use
729 |   this option, you can simply use capital letters for abbreviations in
730 |   the gloss, and they will be changed to small caps. The source line is
731 |   set to italics, and the translation is put into single quotes.
732 | \item
733 |   \textbf{\texttt{samePage}} (boolean, default \texttt{true}, only for
734 |   Latex): should examples be kept together on the same page? Can also be
735 |   overridden for individual examples by adding
736 |   \texttt{\{.ex\ samePage=false\}} at the start of an example (cf.~below
737 |   on \texttt{local\ options}).
738 | \item
739 |   \textbf{\texttt{xrefSuffixSep}} (string, defaults to no-break-space):
740 |   When cross references have a suffix, how should the separator be
741 |   formatted? The default `no-break-space' is a safe option. I
742 |   personally like a `narrow no-break space' better (Unicode
743 |   \texttt{U+202F}), but this symbol does not work with all fonts, and
744 |   might thus lead to errors. For Latex typesetting, all space-like
745 |   symbols are converted to a Latex thin space
746 |   \texttt{\textbackslash{},}.
747 | \item
748 |   \textbf{\texttt{restartAtChapter}} (boolean, default \texttt{false}):
749 |   should the counting restart for each chapter?
750 | 
751 |   \begin{itemize}
752 |   \tightlist
753 |   \item
754 |     Actually, when \texttt{true} this setting will restart the counting
755 |     at the highest heading level, which for various output formats can
756 |     be set by the Pandoc option \texttt{top-level-division}.
757 |   \item
758 |     The id of each example will now be of the form
759 |     \texttt{exCHAPTER.NUMBER} to resolve any clashes when the same
760 |     number appears in different chapters.
761 |   \item
762 |     Depending on your Latex setup, an explicit entry
763 |     \texttt{top-level-division:\ chapter} might be necessary in your
764 |     metadata.
765 |   \end{itemize}
766 | \item
767 |   \textbf{\texttt{addChapterNumber}} (boolean, default \texttt{false}):
768 |   should the chapter (= highest heading level) number be added to the
769 |   number of the example? When setting this to \texttt{true} any setting
770 |   of \texttt{restartAtChapter} will be ignored. In most Latex situations
771 |   this only works in combination with a \texttt{documentclass:\ book}.
772 | \item
773 |   \textbf{\texttt{latexPackage}} (one of: \texttt{linguex},
774 |   \texttt{gb4e}, \texttt{langsci-gb4e}, \texttt{expex}, default
775 |   \texttt{linguex}): Various options for converting examples to Latex
776 |   packages that typeset linguistic examples. None of the conversions
777 |   works perfectly, though it should work in most normal situations
778 |   (think 90\%-plus). It might be necessary to first convert to
779 |   \texttt{Latex}, correct the output, and then typeset separately with a
780 |   latex compiler like \texttt{xelatex}. Using the direct option inside
781 |   Pandoc might also work in many situations. Export to
782 |   \textbf{\texttt{beamer}} seems to work reasonably well with the
783 |   \texttt{gb4e} package. All others have artefacts or errors.
784 | \end{itemize}
785 | 
786 | \subsubsection{Local options}\label{local-options}
787 | 
788 | Local options are options that can be set for each individual example.
789 | The \texttt{formatGloss} option can be used to have an individual
790 | example be formatted differently from the global setting. For example,
791 | when the global setting is \texttt{formatGloss:\ true} in the metadata,
792 | then adding \texttt{formatGloss=false} in the curly brackets of a
793 | specific example will block the formatting. This is especially useful
794 | when the automatic formatting does not give the desired result.
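
The interplay between global and local settings can be pictured with the
following Python sketch. It is an assumption-laden illustration rather
than the logic of the filter: the function \texttt{effective\_options}
is hypothetical, only boolean options are covered, and the default
\texttt{false} for \texttt{noFormat} is assumed here. Attribute values
are compared as strings, because attributes in curly brackets reach a
filter as text.

\begin{verbatim}
```python
# Defaults for formatGloss and samePage as listed under the global options;
# the default for noFormat is an assumption of this sketch.
GLOBAL_DEFAULTS = {"formatGloss": False, "samePage": True, "noFormat": False}


def effective_options(metadata: dict, attributes: dict) -> dict:
    """Combine defaults, document metadata and attributes of one example."""
    options = {**GLOBAL_DEFAULTS, **metadata}
    for key, value in attributes.items():
        if key in options:
            # Local attributes are strings such as "true" or "false".
            options[key] = value.strip().lower() == "true"
    return options


if __name__ == "__main__":
    # Global formatGloss: true, switched off for a single example.
    print(effective_options({"formatGloss": True}, {"formatGloss": "false"}))
```
\end{verbatim}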
795 | 
796 | If you want to add something else (not a linguistic example) in a
797 | numbered example, then there is the local option \texttt{noFormat=true}.
798 | An attempt will be made to try and do a reasonable layout. Multiple
799 | paragraphs will simply be taken as is, and the number will be put in
800 | front. In HTML the number will be centred. It is usable for an
801 | incidental mathematical formula.
802 | 
803 | \begin{verbatim}
804 | ::: {.ex noFormat=true}
805 | $$\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}$$
806 | :::
807 | \end{verbatim}
808 | 
809 | \begin{samepage}
810 | \ex<ex15> 
811 |   \[\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}\]\\
812 |   
813 | \xe
814 | \end{samepage}
815 | 
816 | \subsection{\texorpdfstring{Issues with
817 | \texttt{pandoc-ling}}{Issues with pandoc-ling}}\label{issues-with-pandoc-ling}
818 | 
819 | \begin{itemize}
820 | \tightlist
821 | \item
822 |   Manually provided identifiers for examples should not be purely
823 |   numerical (so do not use e.g.~\texttt{\#5789}). In some situations this
824 |   interferes with the setting of the cross-references.
825 | \item
826 |   Because the cross-references use the same structure as citations in
827 |   Pandoc, the processing of citations (by \texttt{citeproc}) should be
828 |   performed \textbf{after} the processing by \texttt{pandoc-ling}.
829 |   Another Pandoc filter,
830 |   \href{https://github.com/lierdakil/pandoc-crossref}{\texttt{pandoc-crossref}},
831 |   for numbering figures and other captions, also uses the same system.
832 |   There seems to be no conflict between \texttt{pandoc-ling} and
833 |   \texttt{pandoc-crossref}.
834 | \item
835 |   Interlinear examples will not wrap at the end of the page. There
836 |   is no solution yet for examples that are longer than the size
837 |   of the page.
838 | \item
839 |   It is not (yet) possible to have more than one glossing line.
840 | \item
841 |   When exporting to \texttt{docx} there is a problem because there are
842 |   paragraphs inserted after tables, which adds space in lists with
843 |   multiple interlinear examples (except when they have exactly the same
844 |   number of columns). This is
845 |   \href{https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262}{by
846 |   design}. The official solution is to set font-size to 1 for this
847 |   paragraph inside MS Word.
848 | \item
849 |   Multi-column cells are crucial for \texttt{pandoc-ling} to work
850 |   properly. These are only introduced in new table format with Pandoc
851 |   2.10 (so older Pandoc version are not supported). Also note that these
852 |   structures are not yet exported to all formats, e.g.~it will not be
853 |   displayed correctly in \texttt{docx}. However, this is currently an
854 |   area of active development
855 | \item
856 |   \texttt{langsci-gb4e} is only available as part of the
857 |   \href{https://ctan.org/pkg/langsci?lang=en}{\texttt{langsci} package}.
858 |   You have to make it available to Pandoc, e.g.~by adding it into the
859 |   same directory as the pandoc-ling.lua filter. I have added a recent
860 |   version of \texttt{langsci-gb4e} here for convenience, but this one
861 |   might be outdated at some time in the future.
862 | \item
863 |   \texttt{beamer} output seems to work best with
864 |   \texttt{latexPackage:\ gb4e}.
865 | \end{itemize}
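
The ordering issue with citations mentioned in the list above can be
made concrete with a short Python sketch that assembles a Pandoc call.
Pandoc applies filters and citation processing in the order in which
they appear on the command line, so the Lua filter is listed first. The
file names \texttt{paper.md} and \texttt{paper.html} are placeholders,
the helper \texttt{pandoc\_command} is invented for this sketch, and a
Pandoc version that provides the \texttt{-\/-citeproc} flag is assumed.

\begin{verbatim}
```python
import shlex
from typing import List


def pandoc_command(source: str, target: str,
                   filter_path: str = "pandoc-ling.lua") -> List[str]:
    """Build a pandoc call that runs the Lua filter before citeproc."""
    return [
        "pandoc", source,
        "--lua-filter", filter_path,  # resolve references to examples first
        "--citeproc",                 # then treat what is left as citations
        "-o", target,
    ]


if __name__ == "__main__":
    # Only print the command; hand the list to subprocess.run to execute it.
    print(shlex.join(pandoc_command("paper.md", "paper.html")))
```
\end{verbatim}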
866 | 
867 | \subsection{A note on Latex
868 | conversion}\label{a-note-on-latex-conversion}
869 | 
870 | Originally, I decided to write this filter as a two-pronged conversion,
871 | making a markdown version myself, but using a mapping to one of the many
872 | latex libraries for linguistics examples as a quick fix. I assumed that
873 | such a mapping would be the easy part. However, it turned out that the
874 | mapping to latex was much more difficult than I anticipated. Basically,
875 | it turned out that the `common denominator' that I was aiming for was
876 | not necessarily the `common denominator' provided by the latex packages.
877 | I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and
878 | expex) with growing dismay. This approach resulted in a first version.
879 | However, after this version was (more or less) finished, I realised that
880 | it would be better to first define the `common denominator' more clearly
881 | (as done here), and then implement this purely in Pandoc. From that
882 | basis I have then made attempts to map them to the various latex
883 | packages.
884 | 
885 | \subsection{A note on implementation}\label{a-note-on-implementation}
886 | 
887 | The basic structure of the examples is transformed into Pandoc tables.
888 | Tables are reasonably safe for converting to other formats. Care has
889 | been taken to add \texttt{classes} to all elements of the tables
890 | (e.g.~the preamble has the class \texttt{linguistic-example-preamble}).
891 | When exported formats are aware of these classes, they can be used to
892 | fine-tune the formatting. I have applied a few such fine-tunings to the
893 | html output of this filter by adding a few CSS-style statements. The
894 | naming of the classes is quite transparent, using the form
895 | \texttt{linguistic-example-STRUCTURE}.
896 | 
897 | The whole table is encapsulated in a \texttt{div} with class \texttt{ex}
898 | and an id of the form \texttt{exNUMBER}. This means that an example can
899 | be directly referred to in web-links by using the hash-mechanism. For
900 | example, adding \texttt{\#ex3} to the end of a link will immediately
901 | jump to this example in a browser.
902 | 
903 | The current implementation is completely independent from the
904 | \href{https://pandoc.org/MANUAL.html\#numbered-example-lists}{Pandoc
905 | numbered examples implementation} and both can work side by side, like
906 | (2):
907 | 
908 | \begin{enumerate}
909 | \def\labelenumi{(\arabic{enumi})}
910 | \item
911 |   These are native Pandoc numbered examples
912 | \item
913 |   They are independent of \texttt{pandoc-ling} but use the same output
914 |   formatting in many default exports, like latex.
915 | \end{enumerate}
916 | 
917 | However, various output formats of Pandoc (e.g.~latex) also
918 | use numbers in round brackets for these, so in practice it might be
919 | confusing to combine both.
920 | 
921 | \end{document}
922 | 


--------------------------------------------------------------------------------
/docs/readme_gb4e.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme_gb4e.pdf


--------------------------------------------------------------------------------
/docs/readme_langsci-gb4e.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme_langsci-gb4e.pdf


--------------------------------------------------------------------------------
/docs/readme_langsci-gb4e.tex:
--------------------------------------------------------------------------------
  1 | % Options for packages loaded elsewhere
  2 | \PassOptionsToPackage{unicode}{hyperref}
  3 | \PassOptionsToPackage{hyphens}{url}
  4 | \documentclass[
  5 | ]{article}
  6 | \usepackage{xcolor}
  7 | \usepackage{amsmath,amssymb}
  8 | \setcounter{secnumdepth}{5}
  9 | \usepackage{iftex}
 10 | \ifPDFTeX
 11 |   \usepackage[T1]{fontenc}
 12 |   \usepackage[utf8]{inputenc}
 13 |   \usepackage{textcomp} % provide euro and other symbols
 14 | \else % if luatex or xetex
 15 |   \usepackage{unicode-math} % this also loads fontspec
 16 |   \defaultfontfeatures{Scale=MatchLowercase}
 17 |   \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
 18 | \fi
 19 | \usepackage{lmodern}
 20 | \ifPDFTeX\else
 21 |   % xetex/luatex font selection
 22 | \fi
 23 | % Use upquote if available, for straight quotes in verbatim environments
 24 | \IfFileExists{upquote.sty}{\usepackage{upquote}}{}
 25 | \IfFileExists{microtype.sty}{% use microtype if available
 26 |   \usepackage[]{microtype}
 27 |   \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
 28 | }{}
 29 | \makeatletter
 30 | \@ifundefined{KOMAClassName}{% if non-KOMA class
 31 |   \IfFileExists{parskip.sty}{%
 32 |     \usepackage{parskip}
 33 |   }{% else
 34 |     \setlength{\parindent}{0pt}
 35 |     \setlength{\parskip}{6pt plus 2pt minus 1pt}}
 36 | }{% if KOMA class
 37 |   \KOMAoptions{parskip=half}}
 38 | \makeatother
 39 | \usepackage{graphicx}
 40 | \makeatletter
 41 | \newsavebox\pandoc@box
 42 | \newcommand*\pandocbounded[1]{% scales image to fit in text height/width
 43 |   \sbox\pandoc@box{#1}%
 44 |   \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}%
 45 |   \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}%
 46 |   \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both
 47 |   \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}%
 48 |   \else\usebox{\pandoc@box}%
 49 |   \fi%
 50 | }
 51 | % Set default figure placement to htbp
 52 | \def\fps@figure{htbp}
 53 | \makeatother
 54 | \setlength{\emergencystretch}{3em} % prevent overfull lines
 55 | \providecommand{\tightlist}{%
 56 |   \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
 57 | \usepackage{langsci-gb4e}
 58 | \usepackage{chngcntr}
 59 | \counterwithin{xnumi}{section}
 60 | \exewidth{(9.123)}
 61 | \usepackage{bookmark}
 62 | \IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
 63 | \urlstyle{same}
 64 | \hypersetup{
 65 |   pdftitle={Using pandoc-ling},
 66 |   pdfauthor={Michael Cysouw},
 67 |   hidelinks,
 68 |   pdfcreator={LaTeX via pandoc}}
 69 | 
 70 | \title{Using pandoc-ling}
 71 | \author{Michael Cysouw}
 72 | \date{}
 73 | 
 74 | \begin{document}
 75 | \maketitle
 76 | 
 77 | {
 78 | \setcounter{tocdepth}{3}
 79 | \tableofcontents
 80 | }
 81 | \section{pandoc-ling}\label{pandoc-ling}
 82 | 
 83 | \emph{Michael Cysouw}
 84 | \textless{}\href{mailto:cysouw@mac.com}{\nolinkurl{cysouw@mac.com}}\textgreater{}
 85 | 
 86 | A Pandoc filter for linguistic examples
 87 | 
 88 | tl;dr
 89 | 
 90 | \begin{itemize}
 91 | \tightlist
 92 | \item
 93 |   Easily write linguistic examples including basic interlinear glossing.
 94 | \item
 95 |   Let numbering and cross-referencing be done for you.
 96 | \item
 97 |   Export to (almost) any format of your choice for final polishing.
 98 | \item
 99 |   As an example, check out this readme in
100 |   \href{https://cysouw.github.io/pandoc-ling/readme.html}{HTML} or
101 |   \href{https://cysouw.github.io/pandoc-ling/readme_gb4e.pdf}{Latex}.
102 | \end{itemize}
103 | 
104 | \section{Rationale}\label{rationale}
105 | 
106 | In the field of linguistics there is a strong tradition to format
107 | example sentences in research papers in a very specific way. In the
108 | field, it is a perennial problem to get such example sentences to look
109 | just right. Within Latex, there are numerous packages to deal with this
110 | problem (e.g.~covington, linguex, gb4e, expex, etc.). Depending on your
111 | needs, there is some Latex solution for almost everyone. However, these
112 | solutions in Latex are often cumbersome to type, and they are not
113 | portable to other formats. Specifically, transfer between latex, html,
114 | docx, odt or epub would actually be highly desirable. Such transfer is
115 | the hallmark of \href{https://pandoc.org}{Pandoc}, a tool by John
116 | MacFarlane that provides conversion between these (and many more)
117 | formats.
118 | 
119 | Any such conversion between text-formats naturally never works
120 | perfectly: every text-format has specific features that are not
121 | transferable to other formats. A central goal of Pandoc (at least in my
122 | interpretation) is to define a set of shared concepts for text-structure
123 | (a `common denominator' if you will, but surely not `least'!) that can
124 | then be mapped to other formats. In many ways, Pandoc tries (again) to
125 | define a set of logical concepts for text structure (`semantic markup'),
126 | which can then be formatted by your favourite typesetter. As long as you
127 | stay inside the realm of this `common denominator' (in practice that
128 | means Pandoc's extended version of Markdown/CommonMark), conversion
129 | works reasonably well (think 90\%-plus).
130 | 
131 | Building on John Gruber's
132 | \href{https://daringfireball.net/projects/markdown/syntax}{Markdown
133 | philosophy}, there is a strong urge here to learn to restrain oneself
134 | while writing, and try to restrict the number of layout-possibilities to
135 | a minimum. In this sense, with \texttt{pandoc-ling} I propose a
136 | Markdown-structure for linguistic examples that is simple, easy to type,
137 | easy to read, and portable through the Pandoc universe by way of an
138 | extension mechanism of Pandoc, called a `Pandoc Lua Filter'. This
139 | extension will not magically allow you to write every linguistic example
140 | thinkable, but my guess is that in practice the present proposal covers
141 | the majority of situations in linguistic publications (think 90\%-plus).
142 | As an example (and test case) I have included automatic conversions into
143 | various formats in this repository (check them out in the directory
144 | \texttt{docs} to get an idea of the strengths and weaknesses of the
145 | current implementation).
146 | 
147 | \section{The basic structure of a linguistic
148 | example}\label{the-basic-structure-of-a-linguistic-example}
149 | 
150 | Basically, a linguistic example consists of 6 possible building blocks,
151 | of which only the number and at least one example line are necessary.
152 | The space between the building blocks is kept as minimal as possible
153 | without becoming cramped. When (optional) building blocks are not
154 | included, then the other blocks shift left and up (only exception: a
155 | preamble without labels is not shifted left completely, but left-aligned
156 | with the example, not with the judgement).
157 | 
158 | \begin{itemize}
159 | \tightlist
160 | \item
161 |   \textbf{Number}: Running tally of all examples in the work, possibly
162 |   restarting at chapters or other major headings. Typically between
163 |   round brackets, possibly with a chapter number added before in long
164 |   works, e.g.~example (7.26). Aligned top-left, typically left-aligned
165 |   to main text margin.
166 | \item
167 |   \textbf{Preamble}: Optional information about the content/kind of
168 |   example. Aligned top-left: to the top with the number, to the left
169 |   with the (optional) label. When there is no label, then the preamble is
170 |   aligned with the example, not with the judgement.
171 | \item
172 |   \textbf{Label}: Indices for sub-examples. Only present when more than
173 |   one example is grouped together inside one numbered entity.
174 |   Typically these sub-example labels use Latin letters followed by a
175 |   full stop. They are left-aligned with the preamble, and each label is
176 |   top-aligned with the top-line of the corresponding example (important
177 |   for longer line-wrapped examples).
178 | \item
179 |   \textbf{Judgement}: Examples can optionally have grammaticality
180 |   judgements, typically symbols like *, ? or !, sometimes in superscript
181 |   relative to the corresponding example. Judgements are right-aligned to
182 |   each other, typically with only minimal space to the left-aligned
183 |   examples.
184 | \item
185 |   \textbf{Line example}: A minimal linguistic example has at least one
186 |   line example, i.e.~an utterance of interest. Building blocks in
187 |   general shift left and up when other (optional) building blocks are
188 |   not present. Minimally, this results in a number with one line
189 |   example.
190 | \item
191 |   \textbf{Interlinear example}: A complex structure typically used for
192 |   examples from languages unknown to most readers. Consists of three or
193 |   four lines that are left-aligned:
194 | 
195 |   \begin{itemize}
196 |   \tightlist
197 |   \item
198 |     \textbf{Header}: An optional header is typically used to display
199 |     information about the language of the example, including literature
200 |     references. When not present, then all other lines from the
201 |     interlinear example shift upwards.
202 |   \item
203 |     \textbf{Source}: The actual language utterance, often typeset in
204 |     italics. This line is internally separated at spaces, and each
205 |     sub-block is left-aligned with the corresponding sub-blocks of the
206 |     gloss.
207 |   \item
208 |     \textbf{Gloss}: Explanation of the meaning of the source, often
209 |     using abbreviations in small caps. This line is internally separated
210 |     at spaces, and each block is left-aligned with the block from
211 |     source.
212 |   \item
213 |     \textbf{Translation}: Free translation of the source, typically
214 |     quoted. Not separated in blocks, but freely extending to the right.
215 |     Left-aligned with the other lines from the interlinear example.
216 |   \end{itemize}
217 | \end{itemize}
218 | 
219 | \begin{figure}
220 | \centering
221 | \pandocbounded{\includegraphics[keepaspectratio,alt={The structure of a linguistic example.}]{figure/ExampleStructure.png}}
222 | \caption{The structure of a linguistic example.}
223 | \end{figure}
224 | 
225 | There are of course many more possibilities to extend the structure of a
226 | linguistic example, like third or fourth subdivisions of labels (often
227 | using small roman numerals as a third level) or multiple glossing lines
228 | in the interlinear example. Also, the content of the header is sometimes
229 | found right-aligned to the right of the interlinear example (language
230 | info at the top, reference at the bottom). All such options are
231 | currently not supported by \texttt{pandoc-ling}.
232 | 
233 | Under the hood, this structure is prepared by \texttt{pandoc-ling} as a
234 | table. Tables are reasonably well transcoded to different document
235 | formats. Specific layout considerations mostly have to be set manually.
236 | Alignment of the text should work in most exports. Some \texttt{CSS}
237 | styling is proposed by \texttt{pandoc-ling}, but can of course be
238 | overruled. For latex (and beamer) special output is prepared using
239 | various available latex packages (see options, below).
240 | 
241 | \section{\texorpdfstring{Introducing
242 | \texttt{pandoc-ling}}{Introducing pandoc-ling}}\label{introducing-pandoc-ling}
243 | 
244 | \subsection{Editing linguistic
245 | examples}\label{editing-linguistic-examples}
246 | 
247 | To include a linguistic example in Markdown \texttt{pandoc-ling} uses
248 | the \texttt{div} structure, which is indicated in Pandoc-Markdown by
249 | typing three colons at the start and three colons at the end. To
250 | indicate the \texttt{class} of this \texttt{div} the letters `ex' (for
251 | `example') should be added after the top colons (with or without space
252 | in between). This `ex'-class is the signal for \texttt{pandoc-ling} to
253 | start processing such a \texttt{div}. The numbering of these examples
254 | will be inserted by \texttt{pandoc-ling}.
255 | 
256 | Empty lines can be added inside the \texttt{div} for visual pleasure, as
257 | they mostly do not have an influence on the output. Exception: do
258 | \emph{not} use empty lines between unlabelled line examples. Multiple
259 | lines of text can be used (without empty lines in between), but they
260 | will simply be interpreted as one sequential paragraph.
261 | 
262 | \begin{verbatim}
263 | ::: ex
264 | This is the most basic structure of a linguistic example. 
265 | :::
266 | \end{verbatim}
267 | 
268 | \begin{samepage}
269 | \ea \judgewidth{} \label{ex4.1} 
270 |   This is the most basic structure of a linguistic example.
271 | \z
272 | \end{samepage}
273 | 
274 | Alternatively, the \texttt{class} can be put in curly brackets (and
275 | then a leading full stop is necessary before \texttt{ex}). Inside these
276 | brackets more attributes can be added (separated by space), for example
277 | an id, using a hash, or any attribute=value pairs that should apply to
278 | this example. Currently there is only one real attribute implemented
279 | (\texttt{formatGloss}), but in principle it is possible to add more
280 | attributes that can be used to fine-tune the typesetting of the example
281 | (see below for a description of such \texttt{local\ options}).
282 | 
283 | \begin{verbatim}
284 | ::: {#id .ex formatGloss=false}
285 | 
286 | This is a multi-line example.
287 | But that does not mean anything for the result
288 | All these lines are simply treated as one paragraph.
289 | They will become one example with one number.
290 | 
291 | :::
292 | \end{verbatim}
293 | 
294 | \begin{samepage}
295 | \ea \judgewidth{} \label{id} 
296 |   This is a multi-line example. But that does not mean anything for the
297 | result All these lines are simply treated as one paragraph. They will
298 | become one example with one number.
299 | \z
300 | \end{samepage}
301 | 
302 | A preamble can be added by inserting an empty line between preamble and
303 | example. The same considerations about multiple text-lines apply.
304 | 
305 | \begin{verbatim}
306 | :::ex
307 | Preamble
308 | 
309 | This is an example with a preamble.
310 | :::
311 | \end{verbatim}
312 | 
313 | \begin{samepage}
314 | \ea \judgewidth{} \label{ex4.3} Preamble\\*
315 |   This is an example with a preamble.
316 | \z
317 | \end{samepage}
318 | 
319 | Sub-examples with labels are entered by starting each sub-example with a
320 | lowercase Latin letter and a full stop. Empty lines between labels are
321 | allowed. Subsequent lines without labels are treated as one paragraph.
322 | Empty lines \emph{not} followed by a label with a full stop will result
323 | in errors.
324 | 
325 | \begin{verbatim}
326 | :::ex
327 | a. This is the first example.
328 | b. This is the second.
329 | a. The actual letters are not important, `pandoc-ling` will put them in order.
330 | 
331 | e. Empty lines are allowed between labelled lines
332 | Subsequent lines are again treated as one sequential paragraph.
333 | :::
334 | \end{verbatim}
335 | 
336 | \begin{samepage}
337 | \ea \judgewidth{} \label{ex4.4} 
338 |   \ea [] { This is the first example. }
339 |   \ex [] { This is the second. }
340 |   \ex [] { The actual letters are not important, \texttt{pandoc-ling}
341 | will put them in order. }
342 |   \ex [] { Empty lines are allowed between labelled lines Subsequent
343 | lines are again treated as one sequential paragraph. }
344 |   \z
345 | \z
346 | \end{samepage}
347 | 
348 | A labelled list can be combined with a preamble.
349 | 
350 | \begin{verbatim}
351 | :::ex
352 | Any nice description here
353 | 
354 | a. one example sentence.
355 | b. two
356 | c. three
357 | :::
358 | \end{verbatim}
359 | 
360 | \begin{samepage}
361 | \ea \judgewidth{} \label{ex4.5} Any nice description here
362 |   \ea [] { one example sentence. }
363 |   \ex [] { two }
364 |   \ex [] { three }
365 |   \z
366 | \z
367 | \end{samepage}
368 | 
369 | Grammaticality judgements should be added before an example, and after
370 | an optional label, separated from both by spaces (though four spaces in
371 | a row should be avoided, as that could lead to layout errors). To indicate
372 | that a sequence of symbols is a judgement, prepend the judgement with
373 | a caret \texttt{\^{}}. Alignment will be figured out by
374 | \texttt{pandoc-ling}.
375 | 
376 | \begin{verbatim}
377 | :::ex
378 | Throwing in a preamble for good measure
379 | 
380 | a. ^* This traditionally signals ungrammaticality.
381 | b. ^? Question-marks indicate questionable grammaticality.
382 | c. ^^whynot?^ But in principle any sequence can be used (here even in superscript).
383 | d. However, such long sequences sometimes lead to undesirable effects in the layout.
384 | :::
385 | \end{verbatim}
386 | 
387 | \begin{samepage}
388 | \ea \judgewidth{whynot?} \label{ex4.6} Throwing in a preamble for good
389 | measure
390 |   \ea [*] { This traditionally signals ungrammaticality. }
391 |   \ex [?] { Question-marks indicate questionable grammaticality. }
392 |   \ex [\textsuperscript{whynot?}] { But in principle any sequence can be
393 | used (here even in superscript). }
394 |   \ex [] { However, such long sequences sometimes lead to undesirable
395 | effects in the layout. }
396 |   \z
397 | \z
398 | \end{samepage}
399 | 
400 | A minor detail is the alignment of a single example with a preamble and
401 | grammaticality judgements. In this case it looks better for the preamble
402 | to be left aligned with the example and not with the judgement.
403 | 
404 | \begin{verbatim}
405 | :::ex
406 | Here is a special case with a preamble
407 | 
408 | ^^???^ With a singly questionably example.
409 | Note the alignment! Especially with this very long example
410 | that should go over various lines in the output.
411 | :::
412 | \end{verbatim}
413 | 
414 | \begin{samepage}
415 | \ea \judgewidth{???} \label{ex4.7} Here is a special case with a
416 | preamble\\*
417 |   \textsuperscript{???}With a singly questionably example. Note the
418 | alignment! Especially with this very long example that should go over
419 | various lines in the output.
420 | \z
421 | \end{samepage}
422 | 
423 | For the lazy writers among us, it is also possible to use a simple
424 | bullet list instead of a labelled list. Note that the listed elements
425 | will still be formatted as a labelled list.
426 | 
427 | \begin{verbatim}
428 | :::ex
429 | - This is a lazy example.
430 | - ^# It should return letters at the start just as before.
431 | - ^% Also testing some unusual judgements.
432 | :::
433 | \end{verbatim}
434 | 
435 | \begin{samepage}
436 | \ea \judgewidth{\#} \label{ex4.8} 
437 |   \ea [] { This is a lazy example. }
438 |   \ex [\#] { It should return letters at the start just as before. }
439 |   \ex [\%] { Also testing some unusual judgements. }
440 |   \z
441 | \z
442 | \end{samepage}
443 | 
444 | Just for testing: a single example with a judgement (which resulted in
445 | an error in earlier versions).
446 | 
447 | \begin{verbatim}
448 | ::: ex
449 | ^* This traditionally signals ungrammaticality.
450 | :::
451 | \end{verbatim}
452 | 
453 | \begin{samepage}
454 | \ea \judgewidth{*} \label{ex4.9} 
455 |   *This traditionally signals ungrammaticality.
456 | \z
457 | \end{samepage}
458 | 
459 | \subsection{Interlinear examples}\label{interlinear-examples}
460 | 
461 | For interlinear examples with aligned source and gloss, the structure of
462 | a \texttt{lineblock} is used, starting the lines with a vertical line
463 | \texttt{\textbar{}}. There should always be four vertical lines (for
464 | header, source, gloss and translation, respectively), although the
465 | content after the first vertical line can be empty. The source and gloss
466 | lines are separated at spaces, and all parts are left-aligned. If you
467 | want to have a space that is not separated, you will have to `protect'
468 | the space, either by putting a backslash before the space, or by
469 | inserting a non-breaking space instead of a normal space (either type
470 | \texttt{\&nbsp;} or insert an actual non-breaking space, i.e.~Unicode
471 | character \texttt{U+00A0}).
472 | 
473 | \begin{verbatim}
474 | :::ex
475 | | Dutch (Germanic)
476 | | Deze zin is in het nederlands.
477 | | DEM sentence AUX in DET dutch.
478 | | This sentence is dutch.
479 | :::
480 | \end{verbatim}
481 | 
482 | \begin{samepage}
483 | \ea [] { \judgewidth{} \label{ex4.10} 
484 |        Dutch (Germanic)\\*
485 |   \gll Deze zin is in het nederlands. \\
486 |        DEM sentence AUX in DET dutch. \\
487 |   \glt This sentence is dutch. }
488 | \z
489 | \end{samepage}
490 | 
491 | An attempt is made to format interlinear examples when the option
492 | \texttt{formatGloss=true} is added. This will:
493 | 
494 | \begin{itemize}
495 | \tightlist
496 | \item
497 |   remove formatting from the source and set everything in italics,
498 | \item
499 |   remove formatting from the gloss and set sequences (\textgreater1) of
500 |   capitals and numbers into small caps (note that the positioning of
501 |   small caps on web pages is
502 |   \href{https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align}{highly
503 |   complex}),
504 | \item
505 |   a tilde \texttt{\textasciitilde{}} between spaces in the gloss is
506 |   treated as a shortcut for an empty gloss (internally, the sequence
507 |   \texttt{space-tilde-space} is replaced by
508 |   \texttt{space-space-nonBreakingSpace-space-space}),
509 | \item
510 |   consistently put translations in single quotes, possibly removing
511 |   other quotes.
512 | \end{itemize}
513 | 
514 | \begin{verbatim}
515 | ::: {.ex formatGloss=true}
516 | | Dutch (Germanic)
517 | | Is deze zin in het nederlands ?
518 | | AUX DEM sentence in DET dutch Q
519 | | Is this sentence dutch?
520 | :::
521 | \end{verbatim}
522 | 
523 | \begin{samepage}
524 | \ea [] { \judgewidth{} \label{ex4.11} 
525 |        Dutch (Germanic)\\*
526 |   \gll \emph{Is} \emph{deze} \emph{zin} \emph{in} \emph{het}
527 | \emph{nederlands} \emph{?} \\
528 |        \textsc{aux} \textsc{dem} sentence in \textsc{det} dutch
529 | \textsc{q} \\
530 |   \glt `Is this sentence dutch?' }
531 | \z
532 | \end{samepage}
533 | 
534 | Such formatting will not always work, but it seems to be
535 | quite robust in my testing. The next example brings everything together:
536 | 
537 | \begin{itemize}
538 | \tightlist
539 | \item
540 |   a preamble,
541 | \item
542 |   labels, both for single lines and for interlinear examples,
543 | \item
544 |   interlinear examples start on a new line immediately after the
545 |   letter-label,
546 | \item
547 |   grammaticality judgements with proper alignment,
548 | \item
549 |   when the header of an interlinear example is left out, everything is
550 |   shifted up,
551 | \item
552 |   the formatting of the interlinear examples is harmonised.
553 | \end{itemize}
554 | 
555 | \begin{verbatim}
556 | ::: {.ex formatGloss=true samePage=false}
557 | Completely superfluous preamble, but it works ...
558 | 
559 | a.
560 | | Dutch (Germanic) Note the grammaticality judgement!
561 | | ^^:–)^ Deze zin is (dit\ is&nbsp;test) nederlands.
562 | | DEM sentence AUX ~ dutch.
563 | | This sentence is dutch.
564 | 
565 | b.
566 | | 
567 | | Deze tweede zin heeft geen header.
568 | | DEM second sentence have.3SG.PRES no header.
569 | | This second sentence does not have a header.
570 | 
571 | a. Mixing single line examples with interlinear examples.
572 | a. This is of course highly unusal.
573 | Just for this example, let's add some extra material in this example.
574 | :::
575 | \end{verbatim}
576 | 
577 | \ea \judgewidth{:–)} \label{ex4.12} Completely superfluous preamble, but
578 | it works \ldots{}
579 |   \ea [\textsuperscript{:--)}] { 
580 |        Dutch (Germanic) Note the grammaticality judgement!\\*
581 |   \gll \emph{Deze} \emph{zin} \emph{is} \emph{(dit~is~test)}
582 | \emph{nederlands.} \\
583 |        \textsc{dem} sentence \textsc{aux}  ~  dutch. \\
584 |   \glt `This sentence is dutch.' }
585 |   \ex [] { 
586 |   \gll \emph{Deze} \emph{tweede} \emph{zin} \emph{heeft} \emph{geen}
587 | \emph{header.} \\
588 |        \textsc{dem} second sentence have.\textsc{3sg}.\textsc{pres} no
589 | header. \\
590 |   \glt `This second sentence does not have a header.' }
591 |   \ex [] { Mixing single line examples with interlinear examples. }
592 |   \ex [] { This is of course highly unusal. Just for this example, let's
593 | add some extra material in this example. }
594 |   \z
595 | \z
596 | 
597 | Also, as a quick workaround for showing multiple source lines without
598 | alignment with the glossing (e.g.~for phonetic or orthographic
599 | representations of the example), it is possible to use the header of an
600 | interlinear example. For a line break in the header, use the double
601 | backslash \texttt{\textbackslash{}\textbackslash{}}, either inline or at
602 | the end of a line. When you type a header using multiple lines (as shown
603 | below), then subsequent lines have to start with a space. For now, this
604 | only works in the header line.
605 | 
606 | \begin{verbatim}
607 | ::: ex
608 | | Example with an multiline header \\
609 |   *can be used for orthographic representations*, \\
610 |   or phonetic transcription, \\ or for whatever you like
611 | | Dit is een lui voorbeeld=je
612 | | DEM COP DET lazy example=DIM
613 | | This is a lazy example.
614 | :::
615 | \end{verbatim}
616 | 
617 | \begin{samepage}
618 | \ea [] { \judgewidth{} \label{ex4.13} 
619 |        Example with an multiline header \\
620 | \emph{can be used for orthographic representations}, \\
621 | or phonetic transcription, \\
622 | or for whatever you like\\*
623 |   \gll Dit is een lui voorbeeld=je \\
624 |        DEM COP DET lazy example=DIM \\
625 |   \glt This is a lazy example. }
626 | \z
627 | \end{samepage}
628 | 
629 | \subsection{Cross-referencing
630 | examples}\label{cross-referencing-examples}
631 | 
632 | The examples are automatically numbered by \texttt{pandoc-ling}.
633 | Cross-references to examples inside a document can be made by using the
634 | \texttt{{[}@ID{]}} format (used by Pandoc for citations). When an
635 | example has an explicit identifier (like \texttt{\#test} in the next
636 | example), then a reference can be made to this example with
637 | \texttt{{[}@test{]}}, leading to (\ref{test}) when formatted (note that
638 | the formatting does not work on the GitHub website; please check the
639 | `docs' subdirectory).
640 | 
641 | \begin{verbatim}
642 | ::: {#test .ex}
643 | This is a test
644 | :::
645 | \end{verbatim}
646 | 
647 | \begin{samepage}
648 | \ea \judgewidth{} \label{test} 
649 |   This is a test
650 | \z
651 | \end{samepage}
652 | 
653 | Inspired by the \texttt{linguex}-approach, you can also use the keywords
654 | \texttt{next} or \texttt{last} to refer to the next or the last example,
655 | e.g.~\texttt{{[}@last{]}} will be formatted as (\ref{test}). By doubling
656 | the first letter to \texttt{nnext} or \texttt{llast}, a reference to the
657 | next-but-one or last-but-one example can be made. Actually, the first letter
658 | can be repeated at will in \texttt{pandoc-ling}, so something like
659 | \texttt{{[}@llllllllast{]}} will also work. It will be formatted as
660 | (\ref{ex4.7}) after the processing of \texttt{pandoc-ling}. Needless to
661 | say, in such a situation an explicit identifier would be a better
662 | choice.
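    | 
    | As a minimal sketch of the keyword references described above, the
    | following fragment places a paragraph between two examples. The
    | identifier \texttt{\#first} and the example sentences are made up for
    | illustration. In that paragraph, \texttt{{[}@last{]}} should resolve to
    | the example before it (the same target as \texttt{{[}@first{]}}) and
    | \texttt{{[}@next{]}} to the example after it.
    | 
    | \begin{verbatim}
    | ::: {#first .ex}
    | An example with an explicit identifier.
    | :::
    | 
    | Compare [@last] with [@next].
    | Here [@first] points to the same example as [@last].
    | 
    | ::: ex
    | An example without an explicit identifier.
    | :::
    | \end{verbatim}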
663 | 
664 | Referring to sub-examples can be done by manually adding a suffix into
665 | the cross reference, simply separated from the identifier by a space.
666 | For example, \texttt{{[}@lllast~c{]}} will refer to the third
667 | sub-example of the last-but-two example. Formatted this will look like
668 | this: (\ref{ex4.13}\,c), smile! However, note that the ``c'' has to be
669 | manually determined. It is simply a literal suffix that will be copied
670 | into the cross-reference. Something like \texttt{{[}@last\ hA1l0{]}}
671 | will work also, leading to (\ref{test}\,hA1l0) when formatted (which is
672 | of course nonsensical).
673 | 
674 | For exports that include attributes (like html), the examples have an
675 | explicit id of the form \texttt{exNUMBER} in which \texttt{NUMBER} is
676 | the actual number as given in the formatted output. This means that it
677 | is possible to refer to an example on any web-page by using the
678 | hash-mechanism to refer to a part of the web-page. For example
679 | \texttt{\#ex4.7} can be used to refer to the seventh example in the
680 | html-output of this readme (try
681 | \href{https://cysouw.github.io/pandoc-ling/readme.html\#ex4.7}{this
682 | link}). The id in this example has a chapter number `4' because in the
683 | html conversion I have set the option \texttt{addChapterNumber} to
684 | \texttt{true}. (Note: when numbers restart the count in each chapter
685 | with the option \texttt{restartAtChapter}, then the id is of the form
686 | \texttt{exCHAPTER.NUMBER}. This is necessary to resolve clashing ids, as
687 | the same number might then be used in different chapters.)
688 | 
689 | I propose to use these ids also to refer to examples in citations when
690 | writing scholarly papers, e.g.~(Cysouw 2021: \#ex7), independent of
691 | whether the links actually resolve. In principle, such citations could
692 | easily be resolved when online publications are properly prepared. The
693 | same proposal could also work for other parts of research papers, for
694 | example using tags like \texttt{\#sec,\ \#fig,\ \#tab,\ \#eq} (see the
695 | Pandoc filter
696 | \href{https://github.com/cysouw/crossref-adapt}{\texttt{crossref-adapt}}).
697 | To refer to paragraphs (which should replace page numbers in a future of
698 | adaptive design), I propose to use no tag, but directly add the number
699 | to the hash (see the Pandoc filter
700 | \href{https://github.com/cysouw/count-para}{\texttt{count-para}} for a
701 | practical mechanism to add such numbering).
702 | 
703 | \subsection{\texorpdfstring{Options of
704 | \texttt{pandoc-ling}}{Options of pandoc-ling}}\label{options-of-pandoc-ling}
705 | 
706 | \subsubsection{Global options}\label{global-options}
707 | 
708 | The following global options are available with \texttt{pandoc-ling}.
709 | These can be added to the
710 | \href{https://pandoc.org/MANUAL.html\#metadata-blocks}{Pandoc metadata}.
711 | An example of such metadata can be found at the bottom of this
712 | \texttt{readme} in the form of a YAML-block. Pandoc allows for various
713 | methods to provide metadata (see the link above).
714 | 
715 | \begin{itemize}
716 | \tightlist
717 | \item
718 |   \textbf{\texttt{formatGloss}} (boolean, default \texttt{false}):
719 |   should all interlinear examples be consistently formatted? If you use
720 |   this option, you can simply use capital letters for abbreviations in
721 |   the gloss, and they will be changed to small caps. The source line is
722 |   set to italics, and the translation is put into single quotes.
723 | \item
724 |   \textbf{\texttt{samePage}} (boolean, default \texttt{true}, only for
725 |   Latex): should examples be kept together on the same page? Can also be
726 |   overridden for individual examples by adding
727 |   \texttt{\{.ex\ samePage=false\}} at the start of an example (cf.~below
728 |   on \texttt{local\ options}).
729 | \item
730 |   \textbf{\texttt{xrefSuffixSep}} (string, defaults to no-break-space):
731 |   When cross references have a suffix, how should the separator be
732 |   formatted? The default `no-break-space' is a safe option. I
733 |   personally like a `narrow no-break space' better (Unicode
734 |   \texttt{U+202F}), but this symbol does not work with all fonts, and
735 |   might thus lead to errors. For Latex typesetting, all space-like
736 |   symbols are converted to a Latex thin space
737 |   \texttt{\textbackslash{},}.
738 | \item
739 |   \textbf{\texttt{restartAtChapter}} (boolean, default \texttt{false}):
740 |   should the counting restart for each chapter?
741 | 
742 |   \begin{itemize}
743 |   \tightlist
744 |   \item
745 |     Actually, when \texttt{true} this setting will restart the counting
746 |     at the highest heading level, which for various output formats can
747 |     be set by the Pandoc option \texttt{top-level-division}.
748 |   \item
749 |     The id of each example will now be of the form
750 |     \texttt{exCHAPTER.NUMBER} to resolve any clashes when the same
751 |     number appears in different chapters.
752 |   \item
753 |     Depending on your Latex setup, an explicit entry
754 |     \texttt{top-level-division:\ chapter} might be necessary in your
755 |     metadata.
756 |   \end{itemize}
757 | \item
758 |   \textbf{\texttt{addChapterNumber}} (boolean, default \texttt{false}):
759 |   should the chapter (= highest heading level) number be added to the
760 |   number of the example? When setting this to \texttt{true} any setting
761 |   of \texttt{restartAtChapter} will be ignored. In most Latex situations
762 |   this only works in combination with a \texttt{documentclass:\ book}.
763 | \item
764 |   \textbf{\texttt{latexPackage}} (one of: \texttt{linguex},
765 |   \texttt{gb4e}, \texttt{langsci-gb4e}, \texttt{expex}, default
766 |   \texttt{linguex}): Various options for converting examples to Latex
767 |   packages that typeset linguistic examples. None of the conversions
768 |   works perfectly, though it should work in most normal situations
769 |   (think 90\%-plus). It might be necessary to first convert to
770 |   \texttt{Latex}, correct the output, and then typeset separately with a
771 |   latex compiler like \texttt{xelatex}. Using the direct option inside
772 |   Pandoc might also work in many situations. Export to
773 |   \textbf{\texttt{beamer}} seems to work reasonably well with the
774 |   \texttt{gb4e} package. All others have artefacts or errors.
775 | \end{itemize}
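    | 
    | As an illustration only, a YAML metadata block that sets several of the
    | options listed above might look as follows. The option names are the ones
    | described in this section; the values are chosen arbitrarily for the
    | sketch and are not a recommended configuration.
    | 
    | \begin{verbatim}
    | ---
    | formatGloss: true
    | samePage: true
    | restartAtChapter: false
    | addChapterNumber: true
    | latexPackage: langsci-gb4e
    | top-level-division: chapter
    | ---
    | \end{verbatim}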
776 | 
777 | \subsubsection{Local options}\label{local-options}
778 | 
779 | Local options are options that can be set for each individual example.
780 | The \texttt{formatGloss} option can be used to have an individual
781 | example be formatted differently from the global setting. For example,
782 | when the global setting is \texttt{formatGloss:\ true} in the metadata,
783 | then adding \texttt{formatGloss=false} in the curly brackets of a
784 | specific example will block the formatting. This is especially useful
785 | when the automatic formatting does not give the desired result.
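    | 
    | A sketch of such a local override, reusing the Dutch sentence from
    | above and assuming that \texttt{formatGloss:\ true} is set in the
    | metadata: with \texttt{formatGloss=false} the automatic italics, small
    | caps and quotes should be skipped for this one example, so any
    | formatting has to be typed by hand (here the emphasis on the first word
    | and the quotes around the translation).
    | 
    | \begin{verbatim}
    | ::: {.ex formatGloss=false}
    | | Dutch (Germanic)
    | | *Deze* zin is in het nederlands.
    | | DEM sentence AUX in DET dutch.
    | | 'This sentence is dutch.'
    | :::
    | \end{verbatim}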
786 | 
787 | If you want to add something else (not a linguistic example) in a
788 | numbered example, then there is the local option \texttt{noFormat=true}.
789 | An attempt will be made to try and do a reasonable layout. Multiple
790 | paragraphs will simply be taken as is, and the number will be put in
791 | front. In HTML the number will be centred. This is useful for an
792 | occasional mathematical formula.
793 | 
794 | \begin{verbatim}
795 | ::: {.ex noFormat=true}
796 | $$\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}$$
797 | :::
798 | \end{verbatim}
799 | 
800 | \begin{samepage}
801 | \ea \judgewidth{} \label{ex4.15} 
802 |   \[\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}\]\\
803 |   
804 | \z
805 | \end{samepage}
806 | 
807 | \subsection{\texorpdfstring{Issues with
808 | \texttt{pandoc-ling}}{Issues with pandoc-ling}}\label{issues-with-pandoc-ling}
809 | 
810 | \begin{itemize}
811 | \tightlist
812 | \item
813 |   Manually provided identifiers for examples should not be purely
814 |   numerical (so do not use e.g.~\texttt{\#5789}). In some situations this
815 |   interferes with the setting of the cross-references.
816 | \item
817 |   Because the cross-references use the same structure as citations in
818 |   Pandoc, the processing of citations (by \texttt{citeproc}) should be
819 |   performed \textbf{after} the processing by \texttt{pandoc-ling}.
820 |   Another Pandoc filter,
821 |   \href{https://github.com/lierdakil/pandoc-crossref}{\texttt{pandoc-crossref}},
822 |   for numbering figures and other captions, also uses the same system.
823 |   There seems to be no conflict between \texttt{pandoc-ling} and
824 |   \texttt{pandoc-crossref}.
825 | \item
826 |   Interlinear examples will not wrap at the end of the page. There is
827 |   no solution yet for examples that are longer than the size of the
828 |   page.
829 | \item
830 |   It is not (yet) possible to have more than one glossing line.
831 | \item
832 |   When exporting to \texttt{docx} there is a problem because there are
833 |   paragraphs inserted after tables, which adds space in lists with
834 |   multiple interlinear examples (except when they have exactly the same
835 |   number of columns). This is
836 |   \href{https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262}{by
837 |   design}. The official solution is to set font-size to 1 for this
838 |   paragraph inside MS Word.
839 | \item
840 |   Multi-column cells are crucial for \texttt{pandoc-ling} to work
841 |   properly. These were only introduced with the new table format in Pandoc
842 |   2.10 (so older Pandoc versions are not supported). Also note that these
843 |   structures are not yet exported to all formats, e.g.~they will not be
844 |   displayed correctly in \texttt{docx}. However, this is currently an
845 |   area of active development.
846 | \item
847 |   \texttt{langsci-gb4e} is only available as part of the
848 |   \href{https://ctan.org/pkg/langsci?lang=en}{\texttt{langsci} package}.
849 |   You have to make it available to Pandoc, e.g.~by adding it to the
850 |   same directory as the \texttt{pandoc-ling.lua} filter. I have added a recent
851 |   version of \texttt{langsci-gb4e} here for convenience, but this one
852 |   might become outdated in the future.
853 | \item
854 |   \texttt{beamer} output seems to work best with
855 |   \texttt{latexPackage:\ gb4e}.
856 | \end{itemize}
857 | 
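As a sketch of the ordering of citation processing mentioned in the list
above: Pandoc applies filters and \texttt{citeproc} in the order in which
they are given on the command line, so the Lua filter should come first.
The input and output file names are only placeholders, and
\texttt{-\/-citeproc} assumes Pandoc 2.11 or later.

\begin{verbatim}
pandoc readme.md --lua-filter pandoc-ling.lua --citeproc -o readme.html
\end{verbatim}
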
858 | \subsection{A note on Latex
859 | conversion}\label{a-note-on-latex-conversion}
860 | 
861 | Originally, I decided to write this filter as a two-pronged conversion,
862 | making a markdown version myself, but using a mapping to one of the many
863 | latex libraries for linguistics examples as a quick fix. I assumed that
864 | such a mapping would be the easy part. However, it turned out that the
865 | mapping to latex was much more difficult than I anticipated. Basically,
866 | it turned out that the `common denominator' that I was aiming for was
867 | not necessarily the `common denominator' provided by the latex packages.
868 | I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and
869 | expex) with growing dismay. This approach resulted in a first version.
870 | However, after this version was (more or less) finished, I realised that
871 | it would be better to first define the `common denominator' more clearly
872 | (as done here), and then implement this purely in Pandoc. On that
873 | basis I have then made attempts to map it to the various latex
874 | packages.
875 | 
876 | \subsection{A note on implementation}\label{a-note-on-implementation}
877 | 
878 | The basic structure of the examples is transformed into Pandoc tables.
879 | Tables are reasonably safe for converting to other formats. Care has
880 | been taken to add \texttt{classes} to all elements of the tables
881 | (e.g.~the preamble has the class \texttt{linguistic-example-preamble}).
882 | When exported formats are aware of these classes, they can be used to
883 | fine-tune the formatting. I have applied a few such fine-tunings to the
884 | html output of this filter by adding a few CSS style statements. The
885 | naming of the classes is quite transparent, using the form
886 | \texttt{linguistic-example-STRUCTURE}.
887 | 
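As an illustration of such fine-tuning, the following CSS sketch styles the
preamble through its class. The class name is the one mentioned above; the
chosen properties are arbitrary assumptions, not the styling that
\texttt{pandoc-ling} itself provides.

\begin{verbatim}
/* hypothetical user stylesheet: set example preambles in grey italics */
.linguistic-example-preamble {
  font-style: italic;
  color: #555555;
}
\end{verbatim}
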
888 | The whole table is encapsulated in a \texttt{div} with class \texttt{ex}
889 | and an id of the form \texttt{exNUMBER}. This means that an example can
890 | be directly referred to in web-links by using the hash-mechanism. For
891 | example, adding \texttt{\#ex3} to the end of a link will immediately
892 | jump to this example in a browser.
893 | 
894 | The current implementation is completely independent from the
895 | \href{https://pandoc.org/MANUAL.html\#numbered-example-lists}{Pandoc
896 | numbered examples implementation} and both can work side by side, like
897 | (2):
898 | 
899 | \begin{enumerate}
900 | \def\labelenumi{(\arabic{enumi})}
901 | \item
902 |   These are native Pandoc numbered examples
903 | \item
904 |   They are independent of \texttt{pandoc-ling} but use the same output
905 |   formatting in many default exports, like latex.
906 | \end{enumerate}
907 | 
908 | However, various output formats of Pandoc (e.g.~latex) also use
909 | numbers in round brackets for these, so in practice it might be
910 | confusing to combine both.
911 | 
912 | \end{document}
913 | 


--------------------------------------------------------------------------------
/docs/readme_linguex.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/docs/readme_linguex.pdf


--------------------------------------------------------------------------------
/docs/readme_linguex.tex:
--------------------------------------------------------------------------------
  1 | % Options for packages loaded elsewhere
  2 | \PassOptionsToPackage{unicode}{hyperref}
  3 | \PassOptionsToPackage{hyphens}{url}
  4 | \documentclass[
  5 | ]{article}
  6 | \usepackage{xcolor}
  7 | \usepackage{amsmath,amssymb}
  8 | \setcounter{secnumdepth}{5}
  9 | \usepackage{iftex}
 10 | \ifPDFTeX
 11 |   \usepackage[T1]{fontenc}
 12 |   \usepackage[utf8]{inputenc}
 13 |   \usepackage{textcomp} % provide euro and other symbols
 14 | \else % if luatex or xetex
 15 |   \usepackage{unicode-math} % this also loads fontspec
 16 |   \defaultfontfeatures{Scale=MatchLowercase}
 17 |   \defaultfontfeatures[\rmfamily]{Ligatures=TeX,Scale=1}
 18 | \fi
 19 | \usepackage{lmodern}
 20 | \ifPDFTeX\else
 21 |   % xetex/luatex font selection
 22 | \fi
 23 | % Use upquote if available, for straight quotes in verbatim environments
 24 | \IfFileExists{upquote.sty}{\usepackage{upquote}}{}
 25 | \IfFileExists{microtype.sty}{% use microtype if available
 26 |   \usepackage[]{microtype}
 27 |   \UseMicrotypeSet[protrusion]{basicmath} % disable protrusion for tt fonts
 28 | }{}
 29 | \makeatletter
 30 | \@ifundefined{KOMAClassName}{% if non-KOMA class
 31 |   \IfFileExists{parskip.sty}{%
 32 |     \usepackage{parskip}
 33 |   }{% else
 34 |     \setlength{\parindent}{0pt}
 35 |     \setlength{\parskip}{6pt plus 2pt minus 1pt}}
 36 | }{% if KOMA class
 37 |   \KOMAoptions{parskip=half}}
 38 | \makeatother
 39 | \usepackage{graphicx}
 40 | \makeatletter
 41 | \newsavebox\pandoc@box
 42 | \newcommand*\pandocbounded[1]{% scales image to fit in text height/width
 43 |   \sbox\pandoc@box{#1}%
 44 |   \Gscale@div\@tempa{\textheight}{\dimexpr\ht\pandoc@box+\dp\pandoc@box\relax}%
 45 |   \Gscale@div\@tempb{\linewidth}{\wd\pandoc@box}%
 46 |   \ifdim\@tempb\p@<\@tempa\p@\let\@tempa\@tempb\fi% select the smaller of both
 47 |   \ifdim\@tempa\p@<\p@\scalebox{\@tempa}{\usebox\pandoc@box}%
 48 |   \else\usebox{\pandoc@box}%
 49 |   \fi%
 50 | }
 51 | % Set default figure placement to htbp
 52 | \def\fps@figure{htbp}
 53 | \makeatother
 54 | \setlength{\emergencystretch}{3em} % prevent overfull lines
 55 | \providecommand{\tightlist}{%
 56 |   \setlength{\itemsep}{0pt}\setlength{\parskip}{0pt}}
 57 | \usepackage{linguex}
 58 | \renewcommand{\theExLBr}{}
 59 | \renewcommand{\theExRBr}{}
 60 | \newcommand{\jdg}[1]{\makebox[0.4em][r]{\normalfont#1\ignorespaces}}
 61 | \usepackage{chngcntr}
 62 | \counterwithin{ExNo}{section}
 63 | \renewcommand{\Exarabic}{\thesection.\arabic}
 64 | \usepackage{bookmark}
 65 | \IfFileExists{xurl.sty}{\usepackage{xurl}}{} % add URL line breaks if available
 66 | \urlstyle{same}
 67 | \hypersetup{
 68 |   pdftitle={Using pandoc-ling},
 69 |   pdfauthor={Michael Cysouw},
 70 |   hidelinks,
 71 |   pdfcreator={LaTeX via pandoc}}
 72 | 
 73 | \title{Using pandoc-ling}
 74 | \author{Michael Cysouw}
 75 | \date{}
 76 | 
 77 | \begin{document}
 78 | \maketitle
 79 | 
 80 | {
 81 | \setcounter{tocdepth}{3}
 82 | \tableofcontents
 83 | }
 84 | \section{pandoc-ling}\label{pandoc-ling}
 85 | 
 86 | \emph{Michael Cysouw}
 87 | \textless{}\href{mailto:cysouw@mac.com}{\nolinkurl{cysouw@mac.com}}\textgreater{}
 88 | 
 89 | A Pandoc filter for linguistic examples
 90 | 
 91 | tl;dr
 92 | 
 93 | \begin{itemize}
 94 | \tightlist
 95 | \item
 96 |   Easily write linguistic examples including basic interlinear glossing.
 97 | \item
 98 |   Let numbering and cross-referencing be done for you.
 99 | \item
100 |   Export to (almost) any format of your choice for final polishing.
101 | \item
102 |   As an example, check out this readme in
103 |   \href{https://cysouw.github.io/pandoc-ling/readme.html}{HTML} or
104 |   \href{https://cysouw.github.io/pandoc-ling/readme_gb4e.pdf}{Latex}.
105 | \end{itemize}
106 | 
107 | \section{Rationale}\label{rationale}
108 | 
109 | In the field of linguistics there is an outspoken tradition to format
110 | example sentences in research papers in a very specific way. In the
111 | field, it is a perennial problem to get such example sentences to look
112 | just right. Within Latex, there are numerous packages to deal with this
113 | problem (e.g.~covington, linguex, gb4e, expex, etc.). Depending on your
114 | needs, there is some Latex solution for almost everyone. However, these
115 | solutions in Latex are often cumbersome to type, and they are not
116 | portable to other formats. Specifically, transfer between latex, html,
117 | docx, odt or epub would actually be highly desirable. Such transfer is
118 | the hallmark of \href{https://pandoc.org}{Pandoc}, a tool by John
119 | MacFarlane that provides conversion between these (and many more)
120 | formats.
121 | 
122 | Any such conversion between text-formats naturally never works
123 | perfectly: every text-format has specific features that are not
124 | transferable to other formats. A central goal of Pandoc (at least in my
125 | interpretation) is to define a set of shared concepts for text-structure
126 | (a `common denominator' if you will, but surely not `least'!) that can
127 | then be mapped to other formats. In many ways, Pandoc tries (again) to
128 | define a set of logical concepts for text structure (`semantic markup'),
129 | which can then be formatted by your favourite typesetter. As long as you
130 | stay inside the realm of this `common denominator' (in practice that
131 | means Pandoc's extended version of Markdown/CommonMark), conversion
132 | works reasonably well (think 90\%-plus).
133 | 
134 | Building on John Gruber's
135 | \href{https://daringfireball.net/projects/markdown/syntax}{Markdown
136 | philosophy}, there is a strong urge here to learn to restrain oneself
137 | while writing, and try to restrict the number of layout-possibilities to
138 | a minimum. In this sense, with \texttt{pandoc-ling} I propose a
139 | Markdown-structure for linguistic examples that is simple, easy to type,
140 | easy to read, and portable through the Pandoc universe by way of an
141 | extension mechanism of Pandoc, called a `Pandoc Lua Filter'. This
142 | extension will not magically allow you to write every linguistic example
143 | thinkable, but my guess is that in practice the present proposal covers
144 | the majority of situations in linguistic publications (think 90\%-plus).
145 | As an example (and test case) I have included automatic conversions into
146 | various formats in this repository (check them out in the directory
147 | \texttt{docs} to get an idea of the strengths and weaknesses of the
148 | current implementation).
149 | 
150 | \section{The basic structure of a linguistic
151 | example}\label{the-basic-structure-of-a-linguistic-example}
152 | 
153 | Basically, a linguistic example consists of 6 possible building blocks,
154 | of which only the number and at least one example line are necessary.
155 | The space between the building blocks is kept as minimal as possible
156 | without becoming cramped. When (optional) building blocks are not
157 | included, then the other blocks shift left and up (only exception: a
158 | preamble without labels is not shifted left completely, but left-aligned
159 | with the example, not with the judgement).
160 | 
161 | \begin{itemize}
162 | \tightlist
163 | \item
164 |   \textbf{Number}: Running tally of all examples in the work, possibly
165 |   restarting at chapters or other major headings. Typically between
166 |   round brackets, possibly with a chapter number added before in long
167 |   works, e.g.~example (7.26). Aligned top-left, typically left-aligned
168 |   to main text margin.
169 | \item
170 |   \textbf{Preamble}: Optional information about the content/kind of
171 |   example. Aligned top-left: to the top with the number, to the left
172 |   with the (optional) label. When there is no label, then preamble is
173 |   aligned with the example, not with the judgment.
174 | \item
175 |   \textbf{Label}: Indices for sub-examples. Only present when there are
176 |   more than one example grouped together inside one numbered entity.
177 |   Typically these sub-example labels use latin letters followed by a
178 |   full stop. They are left-aligned with the preamble, and each label is
179 |   top-aligned with the top-line of the corresponding example (important
180 |   for longer line-wrapped examples).
181 | \item
182 |   \textbf{Judgment}: Examples can optionally have grammaticality
183 |   judgments, typically symbols like **?!* sometimes in superscript
184 |   relative to the corresponding example. Judgements are right-aligned to
185 |   each other, typically with only minimal space to the left-aligned
186 |   examples.
187 | \item
188 |   \textbf{Line example}: A minimal linguistic example has at least one
189 |   line example, i.e.~an utterance of interest. Building blocks in
190 |   general shift left and up when other (optional) building blocks are
191 |   not present. Minimally, this results in a number with one line
192 |   example.
193 | \item
194 |   \textbf{Interlinear example}: A complex structure typically used for
195 |   examples from languages unknown to most readers. Consists of three or
196 |   four lines that are left-aligned:
197 | 
198 |   \begin{itemize}
199 |   \tightlist
200 |   \item
201 |     \textbf{Header}: An optional header is typically used to display
202 |     information about the language of the example, including literature
203 |     references. When not present, then all other lines from the
204 |     interlinear example shift upwards.
205 |   \item
206 |     \textbf{Source}: The actual language utterance, often typeset in
207 |     italics. This line is internally separated at spaces, and each
208 |     sub-block is left-aligned with the corresponding sub-blocks of the
209 |     gloss.
210 |   \item
211 |     \textbf{Gloss}: Explanation of the meaning of the source, often
212 |     using abbreviations in small caps. This line is internally separated
213 |     at spaces, and each block is left-aligned with the block from
214 |     source.
215 |   \item
216 |     \textbf{Translation}: Free translation of the source, typically
217 |     quoted. Not separated in blocks, but freely extending to the right.
218 |     Left-aligned with the other lines from the interlinear example.
219 |   \end{itemize}
220 | \end{itemize}
221 | 
222 | \begin{figure}
223 | \centering
224 | \pandocbounded{\includegraphics[keepaspectratio,alt={The structure of a linguistic example.}]{figure/ExampleStructure.png}}
225 | \caption{The structure of a linguistic example.}
226 | \end{figure}
227 | 
228 | There are of course many more possibilities to extend the structure of a
229 | linguistic example, like third or fourth subdivisions of labels (often
230 | using small roman numerals as a third level) or multiple glossing lines
231 | in the interlinear example. Also, the content of the header is sometimes
232 | found right-aligned to the right of the interlinear example (language
233 | info at the top, reference at the bottom). All such options are
234 | currently not supported by \texttt{pandoc-ling}.
235 | 
236 | Under the hood, this structure is prepared by \texttt{pandoc-ling} as a
237 | table. Tables are reasonably well transcoded to different document
238 | formats. Specific layout considerations mostly have to be set manually.
239 | Alignment of the text should work in most exports. Some \texttt{CSS}
240 | styling is proposed by \texttt{pandoc-ling}, but can of course be
241 | overruled. For latex (and beamer) special output is prepared using
242 | various available latex packages (see options, below).
243 | 
244 | \section{\texorpdfstring{Introducing
245 | \texttt{pandoc-ling}}{Introducing pandoc-ling}}\label{introducing-pandoc-ling}
246 | 
247 | \subsection{Editing linguistic
248 | examples}\label{editing-linguistic-examples}
249 | 
250 | To include a linguistic example in Markdown \texttt{pandoc-ling} uses
251 | the \texttt{div} structure, which is indicated in Pandoc-Markdown by
252 | typing three colons at the start and three colons at the end. To
253 | indicate the \texttt{class} of this \texttt{div} the letters `ex' (for
254 | `example') should be added after the top colons (with or without space
255 | in between). This `ex'-class is the signal for \texttt{pandoc-ling} to
256 | start processing such a \texttt{div}. The numbering of these examples
257 | will be inserted by \texttt{pandoc-ling}.
258 | 
259 | Empty lines can be added inside the \texttt{div} for visual pleasure, as
260 | they mostly do not have an influence on the output. Exception: do
261 | \emph{not} use empty lines between unlabelled line examples. Multiple
262 | lines of text can be used (without empty lines in between), but they
263 | will simply be interpreted as one sequential paragraph.
264 | 
265 | \begin{verbatim}
266 | ::: ex
267 | This is the most basic structure of a linguistic example. 
268 | :::
269 | \end{verbatim}
270 | 
271 | \begin{samepage}
272 | 
273 | \ex. \label{ex4.1} 
274 |   This is the most basic structure of a linguistic example.
275 | 
276 | \end{samepage}
277 | 
278 | Alternatively, the \texttt{class} can be put in curly brackets (and
279 | then a leading full stop is necessary before \texttt{ex}). Inside these
280 | brackets more attributes can be added (separated by space), for example
281 | an id, using a hash, or any attribute=value pairs that should apply to
282 | this example. Currently there is only one real attribute implemented
283 | (\texttt{formatGloss}), but in principle it is possible to add more
284 | attributes that can be used to fine-tune the typesetting of the example
285 | (see below for a description of such \texttt{local\ options}).
286 | 
287 | \begin{verbatim}
288 | ::: {#id .ex formatGloss=false}
289 | 
290 | This is a multi-line example.
291 | But that does not mean anything for the result
292 | All these lines are simply treated as one paragraph.
293 | They will become one example with one number.
294 | 
295 | :::
296 | \end{verbatim}
297 | 
298 | \begin{samepage}
299 | 
300 | \ex. \label{id} 
301 |   This is a multi-line example. But that does not mean anything for the
302 | result All these lines are simply treated as one paragraph. They will
303 | become one example with one number.
304 | 
305 | \end{samepage}
306 | 
307 | A preamble can be added by inserting an empty line between preamble and
308 | example. The same considerations about multiple text-lines apply.
309 | 
310 | \begin{verbatim}
311 | :::ex
312 | Preamble
313 | 
314 | This is an example with a preamble.
315 | :::
316 | \end{verbatim}
317 | 
318 | \begin{samepage}
319 | 
320 | \ex. \label{ex4.3} Preamble\\*
321 |   This is an example with a preamble.
322 | 
323 | \end{samepage}
324 | 
325 | Sub-examples with labels are entered by starting each sub-example with a
326 | small latin letter and a full stop. Empty lines between labels are
327 | allowed. Subsequent lines without labels are treated as one paragraph.
328 | Empty lines \emph{not} followed by a label with a full stop will result
329 | in errors.
330 | 
331 | \begin{verbatim}
332 | :::ex
333 | a. This is the first example.
334 | b. This is the second.
335 | a. The actual letters are not important, `pandoc-ling` will put them in order.
336 | 
337 | e. Empty lines are allowed between labelled lines
338 | Subsequent lines are again treated as one sequential paragraph.
339 | :::
340 | \end{verbatim}
341 | 
342 | \begin{samepage}
343 | 
344 | \ex. \label{ex4.4} 
345 |   \a. This is the first example.
346 |   \b. This is the second.
347 |   \b. The actual letters are not important, \texttt{pandoc-ling} will
348 | put them in order.
349 |   \b. Empty lines are allowed between labelled lines Subsequent lines
350 | are again treated as one sequential paragraph.
351 | 
352 | \end{samepage}
353 | 
354 | A labelled list can be combined with a preamble.
355 | 
356 | \begin{verbatim}
357 | :::ex
358 | Any nice description here
359 | 
360 | a. one example sentence.
361 | b. two
362 | c. three
363 | :::
364 | \end{verbatim}
365 | 
366 | \begin{samepage}
367 | 
368 | \ex. \label{ex4.5} Any nice description here
369 |   \a. one example sentence.
370 |   \b. two
371 |   \b. three
372 | 
373 | \end{samepage}
374 | 
375 | Grammaticality judgements should be added before an example, and after
376 | an optional label, separated from both by spaces (though four spaces in
377 | a row should be avoided, as that could lead to layout errors). To indicate
378 | that any sequence of symbols is a judgement, prepend the judgement with
379 | a caret \texttt{\^{}}. Alignment will be figured out by
380 | \texttt{pandoc-ling}.
381 | 
382 | \begin{verbatim}
383 | :::ex
384 | Throwing in a preamble for good measure
385 | 
386 | a. ^* This traditionally signals ungrammaticality.
387 | b. ^? Question-marks indicate questionable grammaticality.
388 | c. ^^whynot?^ But in principle any sequence can be used (here even in superscript).
389 | d. However, such long sequences sometimes lead to undesirable effects in the layout.
390 | :::
391 | \end{verbatim}
392 | 
393 | \begin{samepage}
394 | 
395 | \ex. \label{ex4.6} Throwing in a preamble for good measure
396 |   \a. *This traditionally signals ungrammaticality.
397 |   \b. ?Question-marks indicate questionable grammaticality.
398 |   \b. \textsuperscript{whynot?}But in principle any sequence can be used
399 | (here even in superscript).
400 |   \b. However, such long sequences sometimes lead to undesirable effects
401 | in the layout.
402 | 
403 | \end{samepage}
404 | 
405 | A minor detail is the alignment of a single example with a preamble and
406 | grammaticality judgements. In this case it looks better for the preamble
407 | to be left aligned with the example and not with the judgement.
408 | 
409 | \begin{verbatim}
410 | :::ex
411 | Here is a special case with a preamble
412 | 
413 | ^^???^ With a singly questionably example.
414 | Note the alignment! Especially with this very long example
415 | that should go over various lines in the output.
416 | :::
417 | \end{verbatim}
418 | 
419 | \begin{samepage}
420 | 
421 | \ex. \label{ex4.7} Here is a special case with a preamble\\*
422 |   \textsuperscript{???}With a singly questionably example. Note the
423 | alignment! Especially with this very long example that should go over
424 | various lines in the output.
425 | 
426 | \end{samepage}
427 | 
428 | For the lazy writers among us, it is also possible to use a simple
429 | bullet list instead of a labelled list. Note that the listed elements
430 | will still be formatted as a labelled list.
431 | 
432 | \begin{verbatim}
433 | :::ex
434 | - This is a lazy example.
435 | - ^# It should return letters at the start just as before.
436 | - ^% Also testing some unusual judgements.
437 | :::
438 | \end{verbatim}
439 | 
440 | \begin{samepage}
441 | 
442 | \ex. \label{ex4.8} 
443 |   \a. This is a lazy example.
444 |   \b. \#It should return letters at the start just as before.
445 |   \b. \%Also testing some unusual judgements.
446 | 
447 | \end{samepage}
448 | 
449 | Just for testing: a single example with a judgement (which resulted in
450 | an error in earlier versions).
451 | 
452 | \begin{verbatim}
453 | ::: ex
454 | ^* This traditionally signals ungrammaticality.
455 | :::
456 | \end{verbatim}
457 | 
458 | \begin{samepage}
459 | 
460 | \ex. \label{ex4.9} 
461 |   *This traditionally signals ungrammaticality.
462 | 
463 | \end{samepage}
464 | 
465 | \subsection{Interlinear examples}\label{interlinear-examples}
466 | 
467 | For interlinear examples with aligned source and gloss, the structure of
468 | a \texttt{lineblock} is used, starting the lines with a vertical line
469 | \texttt{\textbar{}}. There should always be four vertical lines (for
470 | header, source, gloss and translation, respectively), although the
471 | content after the first vertical line can be empty. The source and gloss
472 | lines are separated at spaces, and all parts are left-aligned. If you
473 | want to have a space that is not separated, you will have to `protect'
474 | the space, either by putting a backslash before the space, or by
475 | inserting a non-breaking space instead of a normal space (either type
476 | \texttt{\&nbsp;} or insert an actual non-breaking space, i.e.~unicode
477 | character \texttt{U+00A0}).
478 | 
479 | \begin{verbatim}
480 | :::ex
481 | | Dutch (Germanic)
482 | | Deze zin is in het nederlands.
483 | | DEM sentence AUX in DET dutch.
484 | | This sentence is dutch.
485 | :::
486 | \end{verbatim}
487 | 
488 | \begin{samepage}
489 | 
490 | \ex. \label{ex4.10} Dutch (Germanic)
491 |   \gll Deze zin is in het nederlands. \\
492 |        DEM sentence AUX in DET dutch. \\
493 |   \glt This sentence is dutch.
494 | 
495 | \end{samepage}
496 | 
497 | An attempt is made to format interlinear examples when the option
498 | \texttt{formatGloss=true} is added. This will:
499 | 
500 | \begin{itemize}
501 | \tightlist
502 | \item
503 |   remove formatting from the source and set everything in italics,
504 | \item
505 |   remove formatting from the gloss and set sequences (\textgreater1) of
506 |   capitals and numbers into small caps (note that the positioning of
507 |   small caps on web pages is
508 |   \href{https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align}{highly
509 |   complex}),
510 | \item
511 |   a tilde \texttt{\textasciitilde{}} between spaces in the gloss is
512 |   treated as a shortcut for an empty gloss (internally, the sequence
513 |   \texttt{space-tilde-space} is replaced by
514 |   \texttt{space-space-nonBreakingSpace-space-space}),
515 | \item
516 |   consistently put translations in single quotes, possibly removing
517 |   other quotes.
518 | \end{itemize}
519 | 
520 | \begin{verbatim}
521 | ::: {.ex formatGloss=true}
522 | | Dutch (Germanic)
523 | | Is deze zin in het nederlands ?
524 | | AUX DEM sentence in DET dutch Q
525 | | Is this sentence dutch?
526 | :::
527 | \end{verbatim}
528 | 
529 | \begin{samepage}
530 | 
531 | \ex. \label{ex4.11} Dutch (Germanic)
532 |   \gll \emph{Is} \emph{deze} \emph{zin} \emph{in} \emph{het}
533 | \emph{nederlands} \emph{?} \\
534 |        \textsc{aux} \textsc{dem} sentence in \textsc{det} dutch
535 | \textsc{q} \\
536 |   \glt `Is this sentence dutch?'
537 | 
538 | \end{samepage}
539 | 
540 | Such formatting will not always work, but it seems to be
541 | quite robust in my testing. The next example brings everything together:
542 | 
543 | \begin{itemize}
544 | \tightlist
545 | \item
546 |   a preamble,
547 | \item
548 |   labels, both for single lines and for interlinear examples,
549 | \item
550 |   interlinear examples start on a new line immediately after the
551 |   letter-label,
552 | \item
553 |   grammaticality judgements with proper alignment,
554 | \item
555 |   when the header of an interlinear example is left out, everything is
556 |   shifted up,
557 | \item
558 |   the formatting of the interlinear example is harmonised.
559 | \end{itemize}
560 | 
561 | \begin{verbatim}
562 | ::: {.ex formatGloss=true samePage=false}
563 | Completely superfluous preamble, but it works ...
564 | 
565 | a.
566 | | Dutch (Germanic) Note the grammaticality judgement!
567 | | ^^:–)^ Deze zin is (dit\ is&nbsp;test) nederlands.
568 | | DEM sentence AUX ~ dutch.
569 | | This sentence is dutch.
570 | 
571 | b.
572 | | 
573 | | Deze tweede zin heeft geen header.
574 | | DEM second sentence have.3SG.PRES no header.
575 | | This second sentence does not have a header.
576 | 
577 | a. Mixing single line examples with interlinear examples.
578 | a. This is of course highly unusal.
579 | Just for this example, let's add some extra material in this example.
580 | :::
581 | \end{verbatim}
582 | 
583 | \ex. \label{ex4.12} Completely superfluous preamble, but it works
584 | \ldots{}
585 |   \a. Dutch (Germanic) Note the grammaticality judgement!
586 |   \gll \textsuperscript{:--)}\emph{Deze} \emph{zin} \emph{is}
587 | \emph{(dit~is~test)} \emph{nederlands.} \\
588 |        \textsc{dem} sentence \textsc{aux}  ~  dutch. \\
589 |   \glt `This sentence is dutch.'
590 |   \b. 
591 |   \gll \emph{Deze} \emph{tweede} \emph{zin} \emph{heeft} \emph{geen}
592 | \emph{header.} \\
593 |        \textsc{dem} second sentence have.\textsc{3sg}.\textsc{pres} no
594 | header. \\
595 |   \glt `This second sentence does not have a header.'
596 |   \b. Mixing single line examples with interlinear examples.
597 |   \b. This is of course highly unusal. Just for this example, let's add
598 | some extra material in this example.
599 | 
600 | Also, as a quick workaround for showing multiple source lines without
601 | alignment with the glossing (e.g.~for phonetic or orthographic
602 | representations of the example), it is possible to use the header of
603 | an interlinear example. For a line break in the header, use the double
604 | backslash \texttt{\textbackslash{}\textbackslash{}}, either inline or at
605 | the end of a line. When you type a header using multiple lines (as shown
606 | below), then subsequent lines have to start with a space. For now, this
607 | only works in the header line.
608 | 
609 | \begin{verbatim}
610 | ::: ex
611 | | Example with an multiline header \\
612 |   *can be used for orthographic representations*, \\
613 |   or phonetic transcription, \\ or for whatever you like
614 | | Dit is een lui voorbeeld=je
615 | | DEM COP DET lazy example=DIM
616 | | This is a lazy example.
617 | :::
618 | \end{verbatim}
619 | 
620 | \begin{samepage}
621 | 
622 | \ex. \label{ex4.13} Example with an multiline header \\
623 | \emph{can be used for orthographic representations}, \\
624 | or phonetic transcription, \\
625 | or for whatever you like
626 |   \gll Dit is een lui voorbeeld=je \\
627 |        DEM COP DET lazy example=DIM \\
628 |   \glt This is a lazy example.
629 | 
630 | \end{samepage}
631 | 
632 | \subsection{Cross-referencing
633 | examples}\label{cross-referencing-examples}
634 | 
635 | The examples are automatically numbered by \texttt{pandoc-ling}.
636 | Cross-references to examples inside a document can be made by using the
637 | \texttt{{[}@ID{]}} format (used by Pandoc for citations). When an
638 | example has an explicit identifier (like \texttt{\#test} in the next
639 | example), then a reference can be made to this example with
640 | \texttt{{[}@test{]}}, leading to (\ref{test}) when formatted (note that
641 | this formatting is not rendered on the GitHub website; please check the
642 | `docs' subdirectory).
643 | 
644 | \begin{verbatim}
645 | ::: {#test .ex}
646 | This is a test
647 | :::
648 | \end{verbatim}
649 | 
650 | \begin{samepage}
651 | 
652 | \ex. \label{test} 
653 |   This is a test
654 | 
655 | \end{samepage}
656 | 
657 | Inspired by the \texttt{linguex}-approach, you can also use the keywords
658 | \texttt{next} or \texttt{last} to refer to the next or the last example,
659 | e.g.~\texttt{{[}@last{]}} will be formatted as (\ref{test}). By doubling
660 | the first letter to \texttt{nnext} or \texttt{llast}, a reference to the
661 | next/last-but-one can be made. Actually, the first letter
662 | can be repeated at will in \texttt{pandoc-ling}, so something like
663 | \texttt{{[}@llllllllast{]}} will also work. It will be formatted as
664 | (\ref{ex4.7}) after the processing of \texttt{pandoc-ling}. Needless to
665 | say, in such a situation an explicit identifier would be a better
666 | choice.
667 | 
668 | Referring to sub-examples can be done by manually adding a suffix into
669 | the cross reference, simply separated from the identifier by a space.
670 | For example, \texttt{{[}@lllast~c{]}} will refer to the third
671 | sub-example of the last-but-two example. Formatted this will look like
672 | this: (\ref{ex4.13}\,c), smile! However, note that the ``c'' has to be
673 | manually determined. It is simply a literal suffix that will be copied
674 | into the cross-reference. Something like \texttt{{[}@last\ hA1l0{]}}
675 | will also work, leading to (\ref{test}\,hA1l0) when formatted (which is
676 | of course nonsensical).
677 | 
678 | For exports that include attributes (like html), the examples have an
679 | explicit id of the form \texttt{exNUMBER} in which \texttt{NUMBER} is
680 | the actual number as given in the formatted output. This means that it
681 | is possible to refer to an example on any web-page by using the
682 | hash-mechanism to refer to a part of the web-page. For example,
683 | \texttt{\#ex4.7} can be used to refer to the seventh example in the
684 | html-output of this readme (try
685 | \href{https://cysouw.github.io/pandoc-ling/readme.html\#ex4.7}{this
686 | link}). The id in this example has a chapter number `4' because in the
687 | html conversion I have set the option \texttt{addChapterNumber} to
688 | \texttt{true}. (Note: when numbers restart the count in each chapter
689 | with the option \texttt{restartAtChapter}, then the id is of the form
690 | \texttt{exCHAPTER.NUMBER}. This is necessary to resolve clashing ids, as
691 | the same number might then be used in different chapters.)
692 | 
693 | I propose to use these ids also to refer to examples in citations when
694 | writing scholarly papers, e.g.~(Cysouw 2021: \#ex7), independent of
695 | whether the links actually resolve. In principle, such citations could
696 | easily be resolved when online publications are properly prepared. The
697 | same proposal could also work for other parts of research papers, for
698 | example using tags like \texttt{\#sec,\ \#fig,\ \#tab,\ \#eq} (see the
699 | Pandoc filter
700 | \href{https://github.com/cysouw/crossref-adapt}{\texttt{crossref-adapt}}).
701 | To refer to paragraphs (which should replace page numbers in a future of
702 | adaptive design), I propose to use no tag, but directly add the number
703 | to the hash (see the Pandoc filter
704 | \href{https://github.com/cysouw/count-para}{\texttt{count-para}} for a
705 | practical mechanism to add such numbering).
706 | 
707 | \subsection{\texorpdfstring{Options of
708 | \texttt{pandoc-ling}}{Options of pandoc-ling}}\label{options-of-pandoc-ling}
709 | 
710 | \subsubsection{Global options}\label{global-options}
711 | 
712 | The following global options are available with \texttt{pandoc-ling}.
713 | These can be added to the
714 | \href{https://pandoc.org/MANUAL.html\#metadata-blocks}{Pandoc metadata}.
715 | An example of such metadata can be found at the bottom of this
716 | \texttt{readme} in the form of a YAML-block. Pandoc allows for various
717 | methods to provide metadata (see the link above).
718 | 
719 | \begin{itemize}
720 | \tightlist
721 | \item
722 |   \textbf{\texttt{formatGloss}} (boolean, default \texttt{false}):
723 |   should all interlinear examples be consistently formatted? If you use
724 |   this option, you can simply use capital letters for abbreviations in
725 |   the gloss, and they will be changed to small caps. The source line is
726 |   set to italics, and the translation is put into single quotes.
727 | \item
728 |   \textbf{\texttt{samePage}} (boolean, default \texttt{true}, only for
729 |   Latex): should examples be kept together on the same page? Can also be
730 |   overridden for individual examples by adding
731 |   \texttt{\{.ex\ samePage=false\}} at the start of an example (cf.~below
732 |   on \texttt{local\ options}).
733 | \item
734 |   \textbf{\texttt{xrefSuffixSep}} (string, defaults to no-break-space):
735 |   When cross references have a suffix, how should the separator be
736 |   formatted? The default `no-break-space' is a safe option. I
737 |   personally like a `narrow no-break space' better (Unicode
738 |   \texttt{U+202F}), but this symbol does not work with all fonts, and
739 |   might thus lead to errors. For Latex typesetting, all space-like
740 |   symbols are converted to a Latex thin space
741 |   \texttt{\textbackslash{},}.
742 | \item
743 |   \textbf{\texttt{restartAtChapter}} (boolean, default \texttt{false}):
744 |   should the counting restart for each chapter?
745 | 
746 |   \begin{itemize}
747 |   \tightlist
748 |   \item
749 |     Actually, when \texttt{true} this setting will restart the counting
750 |     at the highest heading level, which for various output formats can
751 |     be set by the Pandoc option \texttt{top-level-division}.
752 |   \item
753 |     The id of each example will now be of the form
754 |     \texttt{exCHAPTER.NUMBER} to resolve any clashes when the same
755 |     number appears in different chapters.
756 |   \item
757 |     Depending on your Latex setup, an explicit entry
758 |     \texttt{top-level-division:\ chapter} might be necessary in your
759 |     metadata.
760 |   \end{itemize}
761 | \item
762 |   \textbf{\texttt{addChapterNumber}} (boolean, default \texttt{false}):
763 |   should the chapter (= highest heading level) number be added to the
764 |   number of the example? When setting this to \texttt{true} any setting
765 |   of \texttt{restartAtChapter} will be ignored. In most Latex situations
766 |   this only works in combination with a \texttt{documentclass:\ book}.
767 | \item
768 |   \textbf{\texttt{latexPackage}} (one of: \texttt{linguex},
769 |   \texttt{gb4e}, \texttt{langsci-gb4e}, \texttt{expex}, default
770 |   \texttt{linguex}): Various options for converting examples to Latex
771 |   packages that typeset linguistic examples. None of the conversions
772 |   works perfectly, though they should work in most normal situations
773 |   (think 90\%-plus). It might be necessary to first convert to
774 |   \texttt{Latex}, correct the output, and then typeset separately with a
775 |   latex compiler like \texttt{xelatex}. Using the direct option inside
776 |   Pandoc might also work in many situations. Export to
777 |   \textbf{\texttt{beamer}} seems to work reasonably well with the
778 |   \texttt{gb4e} package. All others have artefacts or errors.
779 | \end{itemize}
780 | 
781 | \subsubsection{Local options}\label{local-options}
782 | 
783 | Local options are options that can be set for each individual example.
784 | The \texttt{formatGloss} option can be used to have an individual
785 | example be formatted differently from the global setting. For example,
786 | when the global setting is \texttt{formatGloss:\ true} in the metadata,
787 | then adding \texttt{formatGloss=false} in the curly brackets of a
788 | specific example will block the formatting. This is especially useful
789 | when the automatic formatting does not give the desired result.
790 | 
791 | If you want to add something else (not a linguistic example) in a
792 | numbered example, then there is the local option \texttt{noFormat=true}.
793 | An attempt will be made to try and do a reasonable layout. Multiple
794 | paragraphs will simply be taken as is, and the number will be put in
795 | front. In HTML the number will be centred. It is usable for an
796 | incidental mathematical formula.
797 | 
798 | \begin{verbatim}
799 | ::: {.ex noFormat=true}
800 | $$\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}$$
801 | :::
802 | \end{verbatim}
803 | 
804 | \begin{samepage}
805 | 
806 | \ex. \label{ex4.15} 
807 |   \[\sum_{i=1}^{n}{i}=\frac{n^2+n}{2}\]\\
808 |   
809 | 
810 | \end{samepage}
811 | 
812 | \subsection{\texorpdfstring{Issues with
813 | \texttt{pandoc-ling}}{Issues with pandoc-ling}}\label{issues-with-pandoc-ling}
814 | 
815 | \begin{itemize}
816 | \tightlist
817 | \item
818 |   Manually provided identifiers for examples should not be purely
819 |   numerical (so do not use e.g.~\texttt{\#5789}). In some situations this
820 |   interferes with the setting of the cross-references.
821 | \item
822 |   Because the cross-references use the same structure as citations in
823 |   Pandoc, the processing of citations (by \texttt{citeproc}) should be
824 |   performed \textbf{after} the processing by \texttt{pandoc-ling}.
825 |   Another Pandoc filter,
826 |   \href{https://github.com/lierdakil/pandoc-crossref}{\texttt{pandoc-crossref}},
827 |   for numbering figures and other captions, also uses the same system.
828 |   There seems to be no conflict between \texttt{pandoc-ling} and
829 |   \texttt{pandoc-crossref}.
830 | \item
831 |   Interlinear examples will not wrap at the end of the page. There
832 |   is no solution yet for examples that are longer than the size
833 |   of the page.
834 | \item
835 |   It is not (yet) possible to have more than one glossing line.
836 | \item
837 |   When exporting to \texttt{docx} there is a problem because there are
838 |   paragraphs inserted after tables, which adds space in lists with
839 |   multiple interlinear examples (except when they have exactly the same
840 |   number of columns). This is
841 |   \href{https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262}{by
842 |   design}. The official solution is to set font-size to 1 for this
843 |   paragraph inside MS Word.
844 | \item
845 |   Multi-column cells are crucial for \texttt{pandoc-ling} to work
846 |   properly. These were only introduced with the new table format in Pandoc
847 |   2.10 (so older Pandoc versions are not supported). Also note that these
848 |   structures are not yet exported to all formats, e.g.~it will not be
849 |   displayed correctly in \texttt{docx}. However, this is currently an
850 |   area of active development.
851 | \item
852 |   \texttt{langsci-gb4e} is only available as part of the
853 |   \href{https://ctan.org/pkg/langsci?lang=en}{\texttt{langsci} package}.
854 |   You have to make it available to Pandoc, e.g.~by adding it into the
855 |   same directory as the pandoc-ling.lua filter. I have added a recent
856 |   version of \texttt{langsci-gb4e} here for convenience, but this one
857 |   might be outdated at some time in the future.
858 | \item
859 |   \texttt{beamer} output seems to work best with
860 |   \texttt{latexPackage:\ gb4e}.
861 | \end{itemize}
862 | 
863 | \subsection{A note on Latex
864 | conversion}\label{a-note-on-latex-conversion}
865 | 
866 | Originally, I decided to write this filter as a two-pronged conversion,
867 | making a markdown version myself, but using a mapping to one of the many
868 | latex libraries for linguistics examples as a quick fix. I assumed that
869 | such a mapping would be the easy part. However, it turned out that the
870 | mapping to latex was much more difficult than I anticipated. Basically,
871 | it turned out that the `common denominator' that I was aiming for was
872 | not necessarily the `common denominator' provided by the latex packages.
873 | I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and
874 | expex) with growing dismay. This approach resulted in a first version.
875 | However, after this version was (more or less) finished, I realised that
876 | it would be better to first define the `common denominator' more clearly
877 | (as done here), and then implement this purely in Pandoc. From that
878 | basis I have then made attempts to map them to the various latex
879 | packages.
880 | 
881 | \subsection{A note on implementation}\label{a-note-on-implementation}
882 | 
883 | The basic structure of the examples is transformed into Pandoc tables.
884 | Tables are reasonably safe for converting into other formats. Care has
885 | been taken to add \texttt{classes} to all elements of the tables
886 | (e.g.~the preamble has the class \texttt{linguistic-example-preamble}).
887 | When exported formats are aware of these classes, they can be used to
888 | fine-tune the formatting. I have applied a few such fine-tunings to the
889 | html output of this filter by adding a few CSS-style statements. The
890 | naming of the classes is quite transparent, using the form
891 | \texttt{linguistic-example-STRUCTURE}.
892 | 
893 | The whole table is encapsulated in a \texttt{div} with class \texttt{ex}
894 | and an id of the form \texttt{exNUMBER}. This means that an example can
895 | be directly referred to in web-links by using the hash-mechanism. For
896 | example, adding \texttt{\#ex3} to the end of a link will immediately
897 | jump to this example in a browser.
898 | 
899 | The current implementation is completely independent from the
900 | \href{https://pandoc.org/MANUAL.html\#numbered-example-lists}{Pandoc
901 | numbered examples implementation} and both can work side by side, like
902 | (2):
903 | 
904 | \begin{enumerate}
905 | \def\labelenumi{(\arabic{enumi})}
906 | \item
907 |   These are native Pandoc numbered examples
908 | \item
909 |   They are independent of \texttt{pandoc-ling} but use the same output
910 |   formatting in many default exports, like latex.
911 | \end{enumerate}
912 | 
913 | However, various output formats of Pandoc (e.g.~latex) also use
914 | numbers in round brackets for these, so in practice it might be
915 | confusing to combine both.
916 | 
917 | \end{document}
918 | 


--------------------------------------------------------------------------------
/docs/test.sh:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env bash
 2 | 
 3 | # produces the readme in various formats
 4 | # the filter processVerbatim.lua adds the verbatim examples as real markdown
 5 | 
 6 | # assumes Pandoc and a full Latex install
 7 | # langsci-gb4e.sty is made available here
 8 | 
 9 | # note that there are various errors in the output
10 | # they show current limitations
11 | 
12 | # basic formats
13 | 
14 | for format in html docx epub
15 | do
16 | 	pandoc ../readme.md -t markdown -L processVerbatim.lua -s --wrap=preserve | \
17 | 	pandoc -t $format -o readme.$format -L ../pandoc-ling.lua -s -N --toc --mathml -F pandoc-crossref --wrap=preserve
18 | done
19 | 
20 | # various latex variants, both tex and pdf
21 | 
22 | for package in linguex gb4e langsci-gb4e
23 | do
24 | 	pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \
25 | 	pandoc -t latex -o readme_$package.tex -L ../pandoc-ling.lua -s -N --toc \
26 | 	--metadata latexPackage="$package"
27 | 
28 | 	pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \
29 | 	pandoc -o readme_$package.pdf -L ../pandoc-ling.lua -N --toc \
30 | 	--metadata latexPackage="$package" --pdf-engine=xelatex
31 | done
32 | 
33 | # special settings for expex, errors with chapter numbers
34 | 
35 | pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \
36 | pandoc -t latex -o readme_expex.tex -L ../pandoc-ling.lua -s -N --toc \
37 | --metadata latexPackage="expex" --metadata addChapterNumber="false"
38 | 
39 | pandoc ../readme.md -t markdown -L processVerbatim.lua -s | \
40 | pandoc -o readme_expex.pdf -L ../pandoc-ling.lua -N --toc \
41 | --metadata latexPackage="expex" --metadata addChapterNumber="false"
42 | 


--------------------------------------------------------------------------------
/figure/ExampleStructure.pages:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/figure/ExampleStructure.pages


--------------------------------------------------------------------------------
/figure/ExampleStructure.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/figure/ExampleStructure.pdf


--------------------------------------------------------------------------------
/figure/ExampleStructure.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/cysouw/pandoc-ling/4f416e600f693f1402bdcd1d1cfedbe693df1c61/figure/ExampleStructure.png


--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
  1 | # pandoc-ling
  2 | 
  3 | *Michael Cysouw* <<cysouw@mac.com>>
  4 | 
  5 | A Pandoc filter for linguistic examples
  6 | 
  7 | tl;dr
  8 | 
  9 | - Easily write linguistic examples including basic interlinear glossing. 
 10 | - Let numbering and cross-referencing be done for you. 
 11 | - Export to (almost) any format of your choice for final polishing.
 12 | - As an example, check out this readme in [HTML](https://cysouw.github.io/pandoc-ling/readme.html) or [Latex](https://cysouw.github.io/pandoc-ling/readme_gb4e.pdf).
 13 | 
 14 | # Rationale
 15 | 
 16 | In the field of linguistics there is an outspoken tradition to format example sentences in research papers in a very specific way. In the field, it is a perennial problem to get such example sentences to look just right. Within Latex, there are numerous packages to deal with this problem (e.g. covington, linguex, gb4e, expex, etc.). Depending on your needs, there is some Latex solution for almost everyone. However, these solutions in Latex are often cumbersome to type, and they are not portable to other formats. Specifically, transfer between latex, html, docx, odt or epub would actually be highly desirable. Such transfer is the hallmark of [Pandoc](https://pandoc.org), a tool by John MacFarlane that provides conversion between these (and many more) formats. 
 17 | 
 18 | Any such conversion between text-formats naturally never works perfectly: every text-format has specific features that are not transferable to other formats. A central goal of Pandoc (at least in my interpretation) is to define a set of shared concepts for text-structure (a 'common denominator' if you will, but surely not 'least'!) that can then be mapped to other formats. In many ways, Pandoc tries (again) to define a set of logical concepts for text structure ('semantic markup'), which can then be formatted by your favourite typesetter. As long as you stay inside the realm of this 'common denominator' (in practice that means Pandoc's extended version of Markdown/CommonMark), conversion works reasonably well (think 90%-plus). 
 19 | 
 20 | Building on John Gruber's [Markdown philosophy](https://daringfireball.net/projects/markdown/syntax), there is a strong urge here to learn to restrain oneself while writing, and try to restrict the number of layout-possibilities to a minimum. In this sense, with `pandoc-ling` I propose a Markdown-structure for linguistic examples that is simple, easy to type, easy to read, and portable through the Pandoc universe by way of an extension mechanism of Pandoc, called a 'Pandoc Lua Filter'. This extension will not magically allow you to write every linguistic example imaginable, but my guess is that in practice the present proposal covers the majority of situations in linguistic publications (think 90%-plus). As an example (and test case) I have included automatic conversions into various formats in this repository (check them out in the directory `docs` to get an idea of the strengths and weaknesses of the current implementation).
 21 | 
 22 | # The basic structure of a linguistic example
 23 | 
 24 | Basically, a linguistic example consists of 6 possible building blocks, of which only the number and at least one example line are necessary. The space between the building blocks is kept as minimal as possible without becoming cramped. When (optional) building blocks are not included, then the other blocks shift left and up (only exception: a preamble without labels is not shifted left completely, but left-aligned with the example, not with the judgement).
 25 | 
 26 | - **Number**: Running tally of all examples in the work, possibly restarting at chapters or other major headings. Typically between round brackets, possibly with a chapter number added before in long works, e.g. example (7.26). Aligned top-left, typically left-aligned to main text margin.
 27 | - **Preamble**: Optional information about the content/kind of example. Aligned top-left: to the top with the number, to the left with the (optional) label. When there is no label, then preamble is aligned with the example, not with the judgment.
 28 | - **Label**: Indices for sub-examples. Only present when there are more than one example grouped together inside one numbered entity. Typically these sub-example labels use latin letters followed by a full stop. They are left-aligned with the preamble, and each label is top-aligned with the top-line of the corresponding example (important for longer line-wrapped examples).
 29 | - **Judgment**: Examples can optionally have grammaticality judgments, typically symbols like `*`, `?` or `!`, sometimes in superscript relative to the corresponding example. Judgments are right-aligned to each other, typically with only minimal space to the left-aligned examples.
 30 | - **Line example**: A minimal linguistic example has at least one line example, i.e. an utterance of interest. Building blocks in general shift left and up when other (optional) building blocks are not present. Minimally, this results in a number with one line example.
 31 | - **Interlinear example**: A complex structure typically used for examples from languages unknown to most readers. Consists of three or four lines that are left-aligned:
 32 | 	* **Header**: An optional header is typically used to display information about the language of the example, including literature references. When not present, then all other lines from the interlinear example shift upwards.
 33 | 	* **Source**: The actual language utterance, often typeset in italics. This line is internally separated at spaces, and each sub-block is left-aligned with the corresponding sub-blocks of the gloss.
 34 | 	* **Gloss**: Explanation of the meaning of the source, often using abbreviations in small caps. This line is internally separated at spaces, and each block is left-aligned with the block from source.
 35 | 	* **Translation**: Free translation of the source, typically quoted. Not separated in blocks, but freely extending to the right. Left-aligned with the other lines from the interlinear example. 
 36 | 
 37 | ![The structure of a linguistic example.](figure/ExampleStructure.png)
 38 | 
 39 | There are of course many more possibilities to extend the structure of a linguistic example, like third or fourth subdivisions of labels (often using small roman numerals as a third level) or multiple glossing lines in the interlinear example. Also, the content of the header is sometimes found right-aligned to the right of the interlinear example (language info at the top, reference at the bottom). All such options are currently not supported by `pandoc-ling`.
 40 | 
 41 | Under the hood, this structure is prepared by `pandoc-ling` as a table. Tables are reasonably well transcoded to different document formats. Specific layout considerations mostly have to be set manually. Alignment of the text should work in most exports. Some `CSS` styling is proposed by `pandoc-ling`, but can of course be overruled. For latex (and beamer) special output is prepared using various available latex packages (see options, below).
 42 | 
 43 | # Introducing `pandoc-ling`
 44 | 
 45 | ## Editing linguistic examples
 46 | 
 47 | To include a linguistic example in Markdown `pandoc-ling` uses the `div` structure, which is indicated in Pandoc-Markdown by typing three colons at the start and three colons at the end. To indicate the `class` of this `div` the letters 'ex' (for 'example') should be added after the top colons (with or without space in between). This 'ex'-class is the signal for `pandoc-ling` to start processing such a `div`. The numbering of these examples will be inserted by `pandoc-ling`.
 48 | 
 49 | Empty lines can be added inside the `div` for visual pleasure, as they mostly do not have an influence on the output. Exception: do *not* use empty lines between unlabelled line examples. Multiple lines of text can be used (without empty lines in between), but they will simply be interpreted as one sequential paragraph.
 50 | 
 51 | ```
 52 | ::: ex
 53 | This is the most basic structure of a linguistic example. 
 54 | :::
 55 | ```
 56 | 
 57 | Alternatively, the `class` can be put in curly brackets (and then a leading full stop is necessary before `ex`). Inside these brackets more attributes can be added (separated by space), for example an id, using a hash, or any attribute=value pairs that should apply to this example. Currently there is only one real attribute implemented (`formatGloss`), but in principle it is possible to add more attributes that can be used to fine-tune the typesetting of the example (see below for a description of such `local options`).
 58 | 
 59 | ```
 60 | ::: {#id .ex formatGloss=false}
 61 | 
 62 | This is a multi-line example.
 63 | But that does not mean anything for the result
 64 | All these lines are simply treated as one paragraph.
 65 | They will become one example with one number.
 66 | 
 67 | :::
 68 | ```
 69 | 
 70 | A preamble can be added by inserting an empty line between preamble and example. The same considerations about multiple text-lines apply.
 71 | 
 72 | ```
 73 | :::ex
 74 | Preamble
 75 | 
 76 | This is an example with a preamble.
 77 | :::
 78 | ```
 79 | 
 80 | Sub-examples with labels are entered by starting each sub-example with a small latin letter and a full stop. Empty lines between labels are allowed. Subsequent lines without labels are treated as one paragraph. Empty lines *not* followed by a label with a full stop will result in errors.
 81 | 
 82 | ```
 83 | :::ex
 84 | a. This is the first example.
 85 | b. This is the second.
 86 | a. The actual letters are not important, `pandoc-ling` will put them in order.
 87 | 
 88 | e. Empty lines are allowed between labelled lines
 89 | Subsequent lines are again treated as one sequential paragraph.
 90 | :::
 91 | ```
 92 | 
 93 | A labelled list can be combined with a preamble.
 94 | 
 95 | ```
 96 | :::ex
 97 | Any nice description here
 98 | 
 99 | a. one example sentence.
100 | b. two
101 | c. three
102 | :::
103 | ```
104 | 
105 | Grammaticality judgements should be added before an example, and after an optional label, separated from both by spaces (though four spaces in a row should be avoided, as that could lead to layout errors). To indicate that any sequence of symbols is a judgement, prepend the judgement with a caret `^`. Alignment will be figured out by `pandoc-ling`.
106 | 
107 | ```
108 | :::ex
109 | Throwing in a preamble for good measure
110 | 
111 | a. ^* This traditionally signals ungrammaticality.
112 | b. ^? Question-marks indicate questionable grammaticality.
113 | c. ^^whynot?^ But in principle any sequence can be used (here even in superscript).
114 | d. However, such long sequences sometimes lead to undesirable effects in the layout.
115 | :::
116 | ```
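
As a rough illustration of the caret convention described above, the following Python sketch separates a judgement from the example text. It is not the Lua code used by `pandoc-ling`; the function name `split_judgement` and the regular expression are assumptions made only for this illustration.

```python
import re

def split_judgement(line):
    """Return (judgement, example) for one example line.

    Hypothetical helper: a judgement is taken to be a caret followed by
    any run of non-space symbols, separated from the example by spaces.
    """
    match = re.match(r"\^(\S+)\s+(.*)", line, flags=re.S)
    if match:
        return match.group(1), match.group(2)
    # No leading caret: the whole line is the example, without judgement.
    return "", line

print(split_judgement("^* This traditionally signals ungrammaticality."))
print(split_judgement("^^whynot?^ But in principle any sequence can be used."))
```

Note that in the second call the judgement keeps its own `^...^` superscript markup, because only the first caret acts as the judgement marker in this sketch.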
117 | 
118 | A minor detail is the alignment of a single example with a preamble and grammaticality judgements. In this case it looks better for the preamble to be left aligned with the example and not with the judgement.
119 | 
120 | ```
121 | :::ex
122 | Here is a special case with a preamble
123 | 
124 | ^^???^ With a single questionable example.
125 | Note the alignment! Especially with this very long example
126 | that should go over various lines in the output.
127 | :::
128 | ```
129 | 
130 | For the lazy writers among us, it is also possible to use a simple bullet list instead of a labelled list. Note that the listed elements will still be formatted as a labelled list.
131 | 
132 | ```
133 | :::ex
134 | - This is a lazy example.
135 | - ^# It should return letters at the start just as before.
136 | - ^% Also testing some unusual judgements.
137 | :::
138 | ```
139 | 
140 | Just for testing: a single example with a judgement (which resulted in an error in earlier versions).
141 | 
142 | ```
143 | ::: ex
144 | ^* This traditionally signals ungrammaticality.
145 | :::
146 | ```
147 | 
148 | ## Interlinear examples
149 | 
150 | For interlinear examples with aligned source and gloss, the structure of a `lineblock` is used, starting the lines with a vertical line `|`. There should always be four vertical lines (for header, source, gloss and translation, respectively), although the content after the first vertical line can be empty. The source and gloss lines are separated at spaces, and all parts are left-aligned. If you want to have a space that is not separated, you will have to 'protect' the space, either by putting a backslash before the space, or by inserting a non-breaking space instead of a normal space (either type `&nbsp;` or insert an actual non-breaking space, i.e. Unicode character `U+00A0`).
151 | 
152 | ```
153 | :::ex
154 | | Dutch (Germanic)
155 | | Deze zin is in het nederlands.
156 | | DEM sentence AUX in DET dutch.
157 | | This sentence is dutch.
158 | :::
159 | ```
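
The splitting rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the filter's actual Lua implementation: the name `split_words` is hypothetical, and both kinds of protected space are assumed to end up as a non-breaking space.

```python
import re

NBSP = "\u00a0"  # Unicode non-breaking space, U+00A0

def split_words(line):
    """Split a source or gloss line at unprotected spaces (hypothetical helper)."""
    # The entity &nbsp; is treated like a literal non-breaking space.
    line = line.replace("&nbsp;", NBSP)
    # Split at runs of normal spaces that are not preceded by a backslash.
    words = re.split(r"(?<!\\) +", line.strip())
    # A backslash-protected space stays inside its word as a non-breaking space.
    return [word.replace("\\ ", NBSP) for word in words]

source = split_words("Deze zin is (dit\\ is&nbsp;test) nederlands.")
print(len(source), source)
```

Each resulting word would then fill one cell of the source line, aligned with the cell at the same position in the gloss line.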
160 | 
161 | An attempt is made to format interlinear examples when the option `formatGloss=true` is added. This will:
162 | 
163 | - remove formatting from the source and set everything in italics,
164 | - remove formatting from the gloss and set sequences (>1) of capitals and numbers into small caps (note that the positioning of small caps on web pages is [highly complex](https://iamvdo.me/en/blog/css-font-metrics-line-height-and-vertical-align)),
165 | - a tilde `~` between spaces in the gloss is treated as a shortcut for an empty gloss (internally, the sequence `space-tilde-space` is replaced by `space-space-nonBreakingSpace-space-space`),
166 | - consistently put translations in single quotes, possibly removing other quotes.
167 | 
168 | ```
169 | ::: {.ex formatGloss=true}
170 | | Dutch (Germanic)
171 | | Is deze zin in het nederlands ?
172 | | AUX DEM sentence in DET dutch Q
173 | | Is this sentence dutch?
174 | :::
175 | ```
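
To make the gloss rules listed above more concrete, here is a minimal Python sketch of two of them: small caps for sequences of capitals and numbers, and the tilde shortcut. It is not the Lua code of `pandoc-ling`. The function name `format_gloss`, the restriction to ASCII capitals, and the choice to emit Pandoc's `[text]{.smallcaps}` span syntax are assumptions for this illustration.

```python
import re

NBSP = "\u00a0"  # Unicode non-breaking space, U+00A0

def format_gloss(gloss):
    """Apply two formatGloss rules to a gloss line (hypothetical helper)."""
    # space-tilde-space becomes space-space-nonBreakingSpace-space-space,
    # which yields an empty gloss cell.
    gloss = gloss.replace(" ~ ", "  " + NBSP + "  ")
    # Sequences of more than one capital letter or digit are lowercased
    # and wrapped in a small-caps span; single capitals are left alone.
    return re.sub(
        r"[A-Z0-9]{2,}",
        lambda m: "[" + m.group(0).lower() + "]{.smallcaps}",
        gloss,
    )

print(format_gloss("DEM second sentence have.3SG.PRES no header."))
```

With this rule, a single capital such as the `Q` in the example above stays unchanged, because only sequences longer than one character are converted.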
176 | 
177 | Such formatting will not always work, but it seems to be quite robust in my testing. The next example brings everything together:
178 | 
179 | - a preamble,
180 | - labels, both for single lines and for interlinear examples,
181 | - interlinear examples start on a new line immediately after the letter-label,
182 | - grammaticality judgements with proper alignment,
183 | - when the header of an interlinear example is left out, everything is shifted up,
184 | - the formatting of the interlinear is harmonised.
185 | 
186 | ```
187 | ::: {.ex formatGloss=true samePage=false}
188 | Completely superfluous preamble, but it works ...
189 | 
190 | a.
191 | | Dutch (Germanic) Note the grammaticality judgement!
192 | | ^^:–)^ Deze zin is (dit\ is&nbsp;test) nederlands.
193 | | DEM sentence AUX ~ dutch.
194 | | This sentence is dutch.
195 | 
196 | b.
197 | | 
198 | | Deze tweede zin heeft geen header.
199 | | DEM second sentence have.3SG.PRES no header.
200 | | This second sentence does not have a header.
201 | 
202 | a. Mixing single line examples with interlinear examples.
203 | a. This is of course highly unusual.
204 | Just for this example, let's add some extra material in this example.
205 | :::
206 | ```
207 | 
208 | Also, as a quick workaround for showing multiple source lines without alignment with the glossing (e.g. for phonetic or orthographic representations of the example), it is possible to use the header of an interlinear example. For a line break in the header, use the double backslash `\\`, either inline or at the end of a line. When you type a header using multiple lines (as shown below), then subsequent lines have to start with a space. For now, this only works in the header line.
209 | 
210 | ```
211 | ::: ex
212 | | Example with a multiline header \\
213 |   *can be used for orthographic representations*, \\
214 |   or phonetic transcription, \\ or for whatever you like
215 | | Dit is een lui voorbeeld=je
216 | | DEM COP DET lazy example=DIM
217 | | This is a lazy example.
218 | :::
219 | ```
220 | 
221 | ## Cross-referencing examples
222 | 
223 | The examples are automatically numbered by `pandoc-ling`. Cross-references to examples inside a document can be made by using the `[@ID]` format (used by Pandoc for citations). When an example has an explicit identifier (like `#test` in the next example), then a reference can be made to this example with `[@test]`, leading to [@test] when formatted. (Note that this formatting does not work on the GitHub website; please check the 'docs' subdirectory for the formatted output.)
224 | 
225 | ```
226 | ::: {#test .ex}
227 | This is a test
228 | :::
229 | ```
230 | 
231 | Inspired by the `linguex`-approach, you can also use the keywords `next` or `last` to refer to the next or the last example, e.g. `[@last]` will be formatted as [@last]. By doubling the first letter to `nnext` or `llast`, a reference to the next/last-but-one can be made. Actually, the first letter can be repeated at will in `pandoc-ling`, so something like `[@llllllllast]` will also work. It will be formatted as [@llllllllast] after the processing of `pandoc-ling`. Needless to say, in such a situation an explicit identifier would be a better choice.
232 | 
233 | Referring to sub-examples can be done by manually adding a suffix to the cross-reference, simply separated from the identifier by a space. For example, `[@lllast c]` will refer to the third sub-example of the last-but-two example. Formatted this will look like this: [@lllast c], smile! However, note that the "c" has to be manually determined. It is simply a literal suffix that will be copied into the cross-reference. Something like `[@last hA1l0]` will also work, leading to [@last hA1l0] when formatted (which is of course nonsensical).
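
The keyword mechanism described above amounts to simple counting. The following Python sketch shows the arithmetic under my own assumptions. It is not the filter's actual code, the function name `resolve` and its parameters are hypothetical, and the sketch returns only the number plus suffix, leaving any surrounding brackets to the output format.

```python
import re
from typing import Optional

NBSP = "\u00a0"  # default separator before a suffix (cf. `xrefSuffixSep`)
KEYWORD = re.compile(r"^(l+)ast$|^(n+)ext$")


def resolve(ref: str, examples_so_far: int, sep: str = NBSP) -> Optional[str]:
    """Resolve a `last`/`next`-style reference relative to a running count.

    `examples_so_far` is the number of the most recent example before
    the reference. Returns None for explicit identifiers, which would
    need a lookup table instead.
    """
    ident, _, suffix = ref.partition(" ")
    match = KEYWORD.match(ident)
    if match is None:
        return None
    if match.group(1):
        # `last` is the most recent example, each extra `l` goes one back.
        number = examples_so_far - (len(match.group(1)) - 1)
    else:
        # `next` is the upcoming example, each extra `n` goes one further.
        number = examples_so_far + len(match.group(2))
    # The suffix is a literal string that is simply copied.
    return f"{number}{sep}{suffix}" if suffix else str(number)
```

With seven examples so far, `resolve("lllast c", 7)` gives the number 5 followed by the separator and the literal suffix `c` in this sketch.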
234 | 
235 | For exports that include attributes (like html), the examples have an explicit id of the form `exNUMBER` in which `NUMBER` is the actual number as given in the formatted output. This means that it is possible to refer to an example on any web-page by using the hash-mechanism to refer to a part of the web-page. For example, `#ex4.7` can be used to refer to the seventh example in the html-output of this readme (try [this link](https://cysouw.github.io/pandoc-ling/readme.html#ex4.7)). The id in this example has a chapter number '4' because in the html conversion I have set the option `addChapterNumber` to `true`. (Note: when numbers restart the count in each chapter with the option `restartAtChapter`, then the id is of the form `exCHAPTER.NUMBER`. This is necessary to resolve clashing ids, as the same number might then be used in different chapters.)
236 | 
237 | I propose to use these ids also to refer to examples in citations when writing scholarly papers, e.g. (Cysouw 2021: #ex7), independent of whether the links actually resolve. In principle, such citations could easily be resolved when online publications are properly prepared. The same proposal could also work for other parts of research papers, for example using tags like `#sec, #fig, #tab, #eq` (see the Pandoc filter [`crossref-adapt`](https://github.com/cysouw/crossref-adapt)). To refer to paragraphs (which should replace page numbers in a future of adaptive design), I propose to use no tag, but directly add the number to the hash (see the Pandoc filter [`count-para`](https://github.com/cysouw/count-para) for a practical mechanism to add such numbering).
238 | 
239 | ## Options of `pandoc-ling`
240 | 
241 | ### Global options
242 | 
243 | The following global options are available with `pandoc-ling`. These can be added to the [Pandoc metadata](https://pandoc.org/MANUAL.html#metadata-blocks). An example of such metadata can be found at the bottom of this `readme` in the form of a YAML-block. Pandoc allows for various methods to provide metadata (see the link above).
244 | 
245 | - **`formatGloss`** (boolean, default `false`): should all interlinear examples be consistently formatted? If you use this option, you can simply use capital letters for abbreviations in the gloss, and they will be changed to small caps. The source line is set to italics, and the translation is put into single quotes.
246 | - **`samePage`** (boolean, default `true`, only for Latex): should examples be kept together on the same page? Can also be overridden for individual examples by adding `{.ex samePage=false}` at the start of an example (cf. below on `local options`).
247 | - **`xrefSuffixSep`** (string, defaults to no-break-space): When cross references have a suffix, how should the separator be formatted? The default 'no-break-space' is a safe option. I personally like a 'narrow no-break space' better (Unicode `U+202F`), but this symbol does not work with all fonts, and might thus lead to errors. For Latex typesetting, all space-like symbols are converted to a Latex thin space `\,`.
248 | - **`restartAtChapter`** (boolean, default `false`): should the counting restart for each chapter? 
249 |   * Actually, when `true` this setting will restart the counting at the highest heading level, which for various output formats can be set by the Pandoc option `top-level-division`. 
250 |   * The id of each example will now be of the form `exCHAPTER.NUMBER` to resolve any clashes when the same number appears in different chapters.
251 |   * Depending on your Latex setup, an explicit entry `top-level-division: chapter` might be necessary in your metadata. 
252 | - **`addChapterNumber`** (boolean, default `false`): should the chapter (= highest heading level) number be added to the number of the example? When setting this to `true` any setting of `restartAtChapter` will be ignored. In most Latex situations this only works in combination with a `documentclass: book`.
253 | - **`latexPackage`** (one of: `linguex`, `gb4e`, `langsci-gb4e`, `expex`, default `linguex`): Various options for converting examples to Latex packages that typeset linguistic examples. None of the conversions works perfectly, though they should work in most normal situations (think 90%-plus). It might be necessary to first convert to `Latex`, correct the output, and then typeset separately with a latex compiler like `xelatex`. Using the direct option inside Pandoc might also work in many situations. Export to **`beamer`** seems to work reasonably well with the `gb4e` package. All others have artefacts or errors.
254 | 
255 | ### Local options
256 | 
257 | Local options are options that can be set for each individual example. The `formatGloss` option can be used to have an individual example be formatted differently from the global setting. For example, when the global setting is `formatGloss: true` in the metadata, then adding `formatGloss=false` in the curly brackets of a specific example will block the formatting. This is especially useful when the automatic formatting does not give the desired result.
258 | 
259 | If you want to add something else (not a linguistic example) in a numbered example, then there is the local option `noFormat=true`. An attempt will be made to produce a reasonable layout. Multiple paragraphs will simply be taken as is, and the number will be put in front. In HTML the number will be centred. This is usable for an incidental mathematical formula.
260 | 
261 | ```
262 | ::: {.ex noFormat=true}
263 | $$\sum_{i=1}^{n}{i}=\frac{n^2-n}{2}$$
264 | :::
265 | ```
266 | 
267 | ## Issues with `pandoc-ling`
268 | 
269 | - Manually provided identifiers for examples should not be purely numerical (so do not use e.g. `#5789`). In some situations this interferes with the setting of the cross-references.
270 | - Because the cross-references use the same structure as citations in Pandoc, the processing of citations (by `citeproc`) should be performed **after** the processing by `pandoc-ling`. Another Pandoc filter, [`pandoc-crossref`](https://github.com/lierdakil/pandoc-crossref), for numbering figures and other captions, also uses the same system. There seems to be no conflict between `pandoc-ling` and `pandoc-crossref`.
271 | - Interlinear examples will not wrap at the end of the page. There is no solution yet for examples that are longer than the width of the page.
272 | - It is not (yet) possible to have more than one glossing line.
273 | - When exporting to `docx` there is a problem because there are paragraphs inserted after tables, which adds space in lists with multiple interlinear examples (except when they have exactly the same number of columns). This is [by design](https://answers.microsoft.com/en-us/msoffice/forum/msoffice_word-mso_windows8-mso_2013_release/how-to-remove-extra-paragraph-after-table/995b3811-9f55-4df1-bbbc-9f672b1ad262). The official solution is to set font-size to 1 for this paragraph inside MS Word.
274 | - Multi-column cells are crucial for `pandoc-ling` to work properly. These were only introduced with the new table format in Pandoc 2.10 (so older Pandoc versions are not supported). Also note that these structures are not yet exported to all formats, e.g. they will not be displayed correctly in `docx`. However, this is currently an area of active development.
275 | - `langsci-gb4e` is only available as part of the [`langsci` package](https://ctan.org/pkg/langsci?lang=en). You have to make it available to Pandoc, e.g. by adding it into the same directory as the pandoc-ling.lua filter. I have added a recent version of `langsci-gb4e` here for convenience, but this one might become outdated at some time in the future.
276 | - `beamer` output seems to work best with `latexPackage: gb4e`.
277 | 
278 | ## A note on Latex conversion
279 | 
280 | Originally, I decided to write this filter as a two-pronged conversion, making a markdown version myself, but using a mapping to one of the many latex libraries for linguistic examples as a quick fix. I assumed that such a mapping would be the easy part. However, it turned out that the mapping to latex was much more difficult than I anticipated. Basically, the 'common denominator' that I was aiming for was not necessarily the 'common denominator' provided by the latex packages. I worked on mapping to various packages (linguex, gb4e, langsci-gb4e and expex) with growing dismay. This approach resulted in a first version. However, after this version was (more or less) finished, I realised that it would be better to first define the 'common denominator' more clearly (as done here), and then implement this purely in Pandoc. From that basis I have then made attempts to map it to the various latex packages.
281 | 
282 | ## A note on implementation
283 | 
284 | The basic structure of the examples is transformed into Pandoc tables. Tables are reasonably safe for converting into other formats. Care has been taken to add `classes` to all elements of the tables (e.g. the preamble has the class `linguistic-example-preamble`). When exported formats are aware of these classes, they can be used to fine-tune the formatting. I have added a few such fine-tunings to the html output of this filter in the form of a few CSS-style statements. The naming of the classes is quite transparent, using the form `linguistic-example-STRUCTURE`.
285 | 
286 | The whole table is encapsulated in a `div` with class `ex` and an id of the form `exNUMBER`. This means that an example can be directly referred to in web-links by using the hash-mechanism. For example, adding `#ex3` to the end of a link will immediately jump to this example in a browser.
287 | 
288 | The current implementation is completely independent from the [Pandoc numbered examples implementation](https://pandoc.org/MANUAL.html#numbered-example-lists) and both can work side by side, like (@second):
289 | 
290 | (@) These are native Pandoc numbered examples
291 | 
292 | (@second) They are independent of `pandoc-ling` but use the same output formatting in many default exports, like latex.
293 | 
294 | However, various output-formats of Pandoc (e.g. latex) also use numbers in round brackets for these, so in practice it might be confusing to combine both.
295 | 
296 | ---
297 | author: Michael Cysouw
298 | title: Using pandoc-ling
299 | addChapterNumber: true
300 | ...
301 | 


--------------------------------------------------------------------------------