├── .github
│   └── workflows
│       ├── bb.yml
│       └── main.yml
├── .gitignore
├── .npmrc
├── logo.svg
├── package.json
└── readme.md
/.github/workflows/bb.yml:
--------------------------------------------------------------------------------
jobs:
  main:
    runs-on: ubuntu-latest
    steps:
      - uses: unifiedjs/beep-boop-beta@main
        with:
          repo-token: ${{secrets.GITHUB_TOKEN}}
name: bb
on:
  issues:
    types: [closed, edited, labeled, opened, reopened, unlabeled]
  pull_request_target:
    types: [closed, edited, labeled, opened, reopened, unlabeled]
--------------------------------------------------------------------------------
/.github/workflows/main.yml:
--------------------------------------------------------------------------------
jobs:
  main:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: node
      - run: npm install
      - run: npm test
name: main
on:
  - pull_request
  - push
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
.DS_Store
*.log
node_modules/
--------------------------------------------------------------------------------
/.npmrc:
--------------------------------------------------------------------------------
ignore-scripts=true
package-lock=false
--------------------------------------------------------------------------------
/logo.svg:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/package.json:
--------------------------------------------------------------------------------
{
  "author": "Titus Wormer (wooorm.com)",
  "bugs": "https://github.com/syntax-tree/nlcst/issues",
  "contributors": [
    "Eugene Sharygin",
    "Titus Wormer (wooorm.com)"
  ],
  "description": "natural language concrete syntax tree",
  "devDependencies": {
    "remark-cli": "^12.0.0",
    "remark-preset-wooorm": "^10.0.0"
  },
  "keywords": [],
  "license": "MIT",
  "name": "nlcst",
  "private": true,
  "remarkConfig": {
    "plugins": [
      "remark-preset-wooorm"
    ]
  },
  "repository": "syntax-tree/nlcst",
  "scripts": {
    "format": "remark . -qfo",
    "test": "npm run format"
  },
  "version": "0.0.0"
}
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------
# ![nlcst][logo]

**N**atural **L**anguage **C**oncrete **S**yntax **T**ree format.

***

**nlcst** is a specification for representing natural language in a [syntax
tree][syntax-tree].
It implements the **[unist][]** spec.

This version of the document may be unreleased.
See [releases][] for released documents.
The latest released version is [`1.0.2`][latest].

## Contents

* [Introduction](#introduction)
  * [Where this specification fits](#where-this-specification-fits)
* [Types](#types)
* [Nodes (abstract)](#nodes-abstract)
  * [`Literal`](#literal)
  * [`Parent`](#parent)
* [Nodes](#nodes)
  * [`Paragraph`](#paragraph)
  * [`Punctuation`](#punctuation)
  * [`Root`](#root)
  * [`Sentence`](#sentence)
  * [`Source`](#source)
  * [`Symbol`](#symbol)
  * [`Text`](#text)
  * [`WhiteSpace`](#whitespace)
  * [`Word`](#word)
* [Glossary](#glossary)
* [List of utilities](#list-of-utilities)
* [Related](#related)
* [References](#references)
* [Contribute](#contribute)
* [Acknowledgments](#acknowledgments)
* [License](#license)

## Introduction

This document defines a format for representing natural language as a [concrete
syntax tree][syntax-tree].
Development of nlcst started in May 2014,
in the now deprecated [textom][] project for [retext][],
before [unist][] existed.
This specification is written in a [Web IDL][webidl]-like grammar.

### Where this specification fits

nlcst extends [unist][],
a format for syntax trees,
to benefit from its [ecosystem of utilities][utilities].

nlcst relates to [JavaScript][] in that it has an [ecosystem of
utilities][list-of-utilities] for working with compliant syntax trees in
JavaScript.
However,
nlcst is not limited to JavaScript and can be used in other programming
languages.

nlcst relates to the [unified][] and [retext][] projects in that nlcst syntax
trees are used throughout their ecosystems.

## Types

If you are using TypeScript,
you can use the nlcst types by installing them with npm:

```sh
npm install @types/nlcst
```

## Nodes (abstract)

### `Literal`

```idl
interface Literal <: UnistLiteral {
  value: string
}
```

**Literal** ([**UnistLiteral**][dfn-unist-literal]) represents a node in nlcst
containing a value.

Its `value` field is a `string`.

### `Parent`

```idl
interface Parent <: UnistParent {
  children: [Paragraph | Punctuation | Sentence | Source | Symbol | Text | WhiteSpace | Word]
}
```

**Parent** ([**UnistParent**][dfn-unist-parent]) represents a node in nlcst
containing other nodes (said to be [*children*][term-child]).

Its content is limited to nlcst content.
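
For readers working in TypeScript, the two abstract interfaces can be approximated with local types.
The sketch below is a hypothetical model written for illustration, not the `@types/nlcst` API: it declares `Literal` and `Parent` plus two structural type guards, `isLiteral` and `isParent`, whose names are assumptions.

```typescript
// Hypothetical local model of the abstract nlcst nodes (not the @types/nlcst API).
// unist nodes may also have `position` and `data`; they are omitted here.
interface UnistNode {
  type: string
}

// Literal: a node whose `value` field is a string.
interface Literal extends UnistNode {
  value: string
}

// Parent: a node with `children`; the spec limits them to nlcst content,
// which this loose sketch does not enforce.
interface Parent extends UnistNode {
  children: Array<Literal | Parent>
}

type NlcstNode = Literal | Parent

// Structural guard: a string `value` marks a literal.
function isLiteral(node: NlcstNode): node is Literal {
  return typeof (node as Literal).value === 'string'
}

// Structural guard: a `children` array marks a parent.
function isParent(node: NlcstNode): node is Parent {
  return Array.isArray((node as Parent).children)
}

const text: Literal = {type: 'TextNode', value: 'nlcst'}
const word: Parent = {type: 'WordNode', children: [text]}

console.log(isParent(word), isLiteral(text)) // true true
```

Checking for a string `value` versus a `children` array mirrors how the IDL separates the two abstract interfaces; the concrete `type` field is what tells the individual nodes apart.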

## Nodes

### `Paragraph`

```idl
interface Paragraph <: Parent {
  type: 'ParagraphNode'
  children: [Sentence | Source | WhiteSpace]
}
```

**Paragraph** ([**Parent**][dfn-parent]) represents a unit of discourse dealing
with a particular point or idea.

**Paragraph** can be used in a [**root**][dfn-root] node.
It can contain [**sentence**][dfn-sentence],
[**whitespace**][dfn-whitespace],
and [**source**][dfn-source] nodes.
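
As a sketch of the **Paragraph** content model, the TypeScript below uses a hypothetical loose `NlcstNode` type and hypothetical `sentence` and `sentencesOf` helpers.
It rejects children that the IDL does not allow in a paragraph and returns the sentences that remain.

```typescript
// Hypothetical loose node shape for this sketch (not the @types/nlcst API).
type NlcstNode = {type: string; value?: string; children?: NlcstNode[]}

// Child types allowed by the Paragraph IDL.
const paragraphChildTypes = ['SentenceNode', 'SourceNode', 'WhiteSpaceNode']

// Hypothetical helper: build a one-word sentence such as "Hi."
function sentence(word: string): NlcstNode {
  return {
    type: 'SentenceNode',
    children: [
      {type: 'WordNode', children: [{type: 'TextNode', value: word}]},
      {type: 'PunctuationNode', value: '.'}
    ]
  }
}

// Return the sentences in a paragraph, throwing on children
// that the Paragraph content model does not allow.
function sentencesOf(paragraph: NlcstNode): NlcstNode[] {
  if (paragraph.type !== 'ParagraphNode') {
    throw new Error('expected ParagraphNode, got ' + paragraph.type)
  }
  const children = paragraph.children ?? []
  for (const child of children) {
    if (paragraphChildTypes.indexOf(child.type) === -1) {
      throw new Error('not allowed in a paragraph: ' + child.type)
    }
  }
  return children.filter((child) => child.type === 'SentenceNode')
}

// "Hi. Bye." as a paragraph: two sentences separated by white space.
const paragraph: NlcstNode = {
  type: 'ParagraphNode',
  children: [sentence('Hi'), {type: 'WhiteSpaceNode', value: ' '}, sentence('Bye')]
}

console.log(sentencesOf(paragraph).length) // 2
```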

### `Punctuation`

```idl
interface Punctuation <: Literal {
  type: 'PunctuationNode'
}
```

**Punctuation** ([**Literal**][dfn-literal]) represents typographical devices
which aid understanding and correct reading of other grammatical units.

**Punctuation** can be used in [**sentence**][dfn-sentence] or
[**word**][dfn-word] nodes.
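
A small example of inspecting **Punctuation** nodes: the hypothetical `endsWithTerminal` function below reports whether the last non-white-space child of a sentence is a period, question mark, or exclamation mark.
Treating only those three marks as terminal is an assumption of this sketch, as is the hand-built sentence.

```typescript
// Hypothetical loose node shape for this sketch (not the @types/nlcst API).
type NlcstNode = {type: string; value?: string; children?: NlcstNode[]}

// Marks treated as sentence-final here; an assumption, other writing systems use more.
const terminalMarks = ['.', '?', '!']

// True when the last child that is not white space is terminal punctuation.
function endsWithTerminal(sentence: NlcstNode): boolean {
  const children = (sentence.children ?? []).slice().reverse()
  for (const child of children) {
    if (child.type === 'WhiteSpaceNode') continue
    return child.type === 'PunctuationNode' && terminalMarks.indexOf(child.value ?? '') !== -1
  }
  return false
}

// Hypothetical helper: a word made of a single text node.
function makeWord(value: string): NlcstNode {
  return {type: 'WordNode', children: [{type: 'TextNode', value}]}
}

// "Wait, what?" as a hand-built sentence.
const sentence: NlcstNode = {
  type: 'SentenceNode',
  children: [
    makeWord('Wait'),
    {type: 'PunctuationNode', value: ','},
    {type: 'WhiteSpaceNode', value: ' '},
    makeWord('what'),
    {type: 'PunctuationNode', value: '?'}
  ]
}

console.log(endsWithTerminal(sentence)) // true
```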

### `Root`

```idl
interface Root <: Parent {
  type: 'RootNode'
}
```

**Root** ([**Parent**][dfn-parent]) represents a document.

**Root** can be used as the [*root*][term-root] of a [*tree*][term-tree],
never as a [*child*][term-child].
Its content model is not limited:
it can contain any nlcst content,
with the restriction that all content must be of the same category.
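
To show how the nodes nest under a **Root**, here is a hand-built tree for the text `Hello, world.` and a hypothetical `rootOnlyAtTop` check for the rule that a root is never a child.
The exact split into nodes is assumed for illustration; a real parser may produce a different shape.

```typescript
// Hypothetical loose node shape for this sketch (not the @types/nlcst API).
type NlcstNode = {type: string; value?: string; children?: NlcstNode[]}

// A hand-built tree for the document "Hello, world." (assumed shape).
const tree: NlcstNode = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Hello'}]},
            {type: 'PunctuationNode', value: ','},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'world'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

// A RootNode may appear only at the top: it is never a child.
function rootOnlyAtTop(node: NlcstNode, isTop = true): boolean {
  if (node.type === 'RootNode' && !isTop) return false
  return (node.children ?? []).every((child) => rootOnlyAtTop(child, false))
}

console.log(rootOnlyAtTop(tree)) // true
```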

### `Sentence`

```idl
interface Sentence <: Parent {
  type: 'SentenceNode'
  children: [Punctuation | Source | Symbol | WhiteSpace | Word]
}
```

**Sentence** ([**Parent**][dfn-parent]) represents a grouping of grammatically
linked words that in principle expresses a complete thought,
although it may make little sense taken in isolation, out of context.

**Sentence** can be used in a [**paragraph**][dfn-paragraph] node.
It can contain [**word**][dfn-word],
[**symbol**][dfn-symbol],
[**punctuation**][dfn-punctuation],
[**whitespace**][dfn-whitespace],
and [**source**][dfn-source] nodes.
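
The words of a **Sentence** can be read off by keeping only its **Word** children and joining the literal values inside each one.
In the sketch below, `wordsOf` and `wordValue` are hypothetical helpers, and splitting `Don't` into text, punctuation, and text is an assumed tokenization.

```typescript
// Hypothetical loose node shape for this sketch (not the @types/nlcst API).
type NlcstNode = {type: string; value?: string; children?: NlcstNode[]}

// Join the literal values inside a word (its Text, Punctuation, Symbol, Source children).
function wordValue(word: NlcstNode): string {
  return (word.children ?? []).map((child) => child.value ?? '').join('')
}

// List the words of a sentence, skipping punctuation, symbols, white space,
// and source nodes that sit directly in the sentence.
function wordsOf(sentence: NlcstNode): string[] {
  return (sentence.children ?? [])
    .filter((child) => child.type === 'WordNode')
    .map(wordValue)
}

// "Don't panic!" as a hand-built sentence (assumed tokenization).
const sentence: NlcstNode = {
  type: 'SentenceNode',
  children: [
    {
      type: 'WordNode',
      children: [
        {type: 'TextNode', value: 'Don'},
        {type: 'PunctuationNode', value: "'"},
        {type: 'TextNode', value: 't'}
      ]
    },
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'panic'}]},
    {type: 'PunctuationNode', value: '!'}
  ]
}

console.log(wordsOf(sentence).join(' ')) // Don't panic
```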

### `Source`

```idl
interface Source <: Literal {
  type: 'SourceNode'
}
```

**Source** ([**Literal**][dfn-literal]) represents an external (ungrammatical)
value embedded into a grammatical unit,
such as a hyperlink or code.

**Source** can be used in [**root**][dfn-root],
[**paragraph**][dfn-paragraph],
[**sentence**][dfn-sentence],
or [**word**][dfn-word] nodes.
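
Because **Source** can occur at several levels of a tree, finding it takes a walk over the whole tree.
The hypothetical `sourcesOf` helper below does a depth-first walk; the sample sentence, which embeds the command `npm test` as a source node, is made up for illustration.

```typescript
// Hypothetical loose node shape for this sketch (not the @types/nlcst API).
type NlcstNode = {type: string; value?: string; children?: NlcstNode[]}

// Depth-first walk that collects the value of every SourceNode, at any depth.
function sourcesOf(node: NlcstNode, found: string[] = []): string[] {
  if (node.type === 'SourceNode') found.push(node.value ?? '')
  for (const child of node.children ?? []) sourcesOf(child, found)
  return found
}

// "Run npm test first." with `npm test` embedded as source (hand-built, assumed shape).
const tree: NlcstNode = {
  type: 'RootNode',
  children: [
    {
      type: 'ParagraphNode',
      children: [
        {
          type: 'SentenceNode',
          children: [
            {type: 'WordNode', children: [{type: 'TextNode', value: 'Run'}]},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'SourceNode', value: 'npm test'},
            {type: 'WhiteSpaceNode', value: ' '},
            {type: 'WordNode', children: [{type: 'TextNode', value: 'first'}]},
            {type: 'PunctuationNode', value: '.'}
          ]
        }
      ]
    }
  ]
}

console.log(sourcesOf(tree).join(', ')) // npm test
```

A tool that checks prose can use the same walk in reverse, skipping **Source** nodes so that embedded code or links are not treated as grammatical text.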

### `Symbol`

```idl
interface Symbol <: Literal {
  type: 'SymbolNode'
}
```

**Symbol** ([**Literal**][dfn-literal]) represents typographical devices
different from characters which represent sounds (like letters and numerals),
white space,
or punctuation.

**Symbol** can be used in [**sentence**][dfn-sentence] or [**word**][dfn-word]
nodes.

### `Text`

```idl
interface Text <: Literal {
  type: 'TextNode'
}
```

**Text** ([**Literal**][dfn-literal]) represents actual content in nlcst
documents: one or more characters.

**Text** can be used in [**word**][dfn-word] nodes.

### `WhiteSpace`

```idl
interface WhiteSpace <: Literal {
  type: 'WhiteSpaceNode'
}
```

**WhiteSpace** ([**Literal**][dfn-literal]) represents typographical devices
devoid of content,
separating other units.

**WhiteSpace** can be used in [**root**][dfn-root],
[**paragraph**][dfn-paragraph],
or [**sentence**][dfn-sentence] nodes.

### `Word`

```idl
interface Word <: Parent {
  type: 'WordNode'
  children: [Punctuation | Source | Symbol | Text]
}
```

**Word** ([**Parent**][dfn-parent]) represents the smallest element that may be
uttered in isolation with semantic or pragmatic content.

**Word** can be used in a [**sentence**][dfn-sentence] node.
It can contain [**text**][dfn-text],
[**symbol**][dfn-symbol],
[**punctuation**][dfn-punctuation],
and [**source**][dfn-source] nodes.
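
The content models listed above can be transcribed into a table and checked recursively.
The following is a partial sketch with hypothetical names (`contentModel`, `literalTypes`, `check`), not a replacement for a real validator such as `nlcst-test`.
It checks allowed child types, that literals carry a string `value`, and that **Root** is never a child; it does not check the **Root** rule that all content must be of the same category.

```typescript
// Hypothetical loose node shape for this sketch (not the @types/nlcst API).
type NlcstNode = {type: string; value?: string; children?: NlcstNode[]}

// Allowed child types per parent type, transcribed from the IDL above.
// RootNode is omitted on purpose: its content model is not limited by type.
const contentModel: {[type: string]: string[] | undefined} = {
  ParagraphNode: ['SentenceNode', 'SourceNode', 'WhiteSpaceNode'],
  SentenceNode: ['PunctuationNode', 'SourceNode', 'SymbolNode', 'WhiteSpaceNode', 'WordNode'],
  WordNode: ['PunctuationNode', 'SourceNode', 'SymbolNode', 'TextNode']
}

// Literal node types: each must carry a string `value`.
const literalTypes = ['PunctuationNode', 'SourceNode', 'SymbolNode', 'TextNode', 'WhiteSpaceNode']

// Collect problems as readable strings; an empty array means these checks passed.
function check(node: NlcstNode, path: string = node.type, problems: string[] = []): string[] {
  if (literalTypes.indexOf(node.type) !== -1) {
    if (typeof node.value !== 'string') problems.push(path + ': literal without a string value')
    return problems
  }
  const children = node.children
  if (!Array.isArray(children)) {
    problems.push(path + ': parent without children')
    return problems
  }
  const allowed = contentModel[node.type]
  children.forEach((child, index) => {
    const childPath = path + ' > ' + child.type + '[' + index + ']'
    if (child.type === 'RootNode') {
      problems.push(childPath + ': root used as a child')
    } else if (allowed !== undefined && allowed.indexOf(child.type) === -1) {
      problems.push(childPath + ': not allowed here')
    }
    check(child, childPath, problems)
  })
  return problems
}

const valid: NlcstNode = {
  type: 'RootNode',
  children: [{
    type: 'ParagraphNode',
    children: [{
      type: 'SentenceNode',
      children: [
        {type: 'WordNode', children: [{type: 'TextNode', value: 'Hello'}]},
        {type: 'PunctuationNode', value: '.'}
      ]
    }]
  }]
}

// A word placed directly in a paragraph breaks the Paragraph content model.
const invalid: NlcstNode = {
  type: 'ParagraphNode',
  children: [{type: 'WordNode', children: [{type: 'TextNode', value: 'Hi'}]}]
}

console.log(check(valid).length) // 0
console.log(check(invalid).join('\n')) // ParagraphNode > WordNode[0]: not allowed here
```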

## Glossary

See the [unist glossary][glossary].

## List of utilities

See the [unist list of utilities][utilities] for more utilities.

* [`nlcst-affix-emoticon-modifier`](https://github.com/syntax-tree/nlcst-affix-emoticon-modifier)
  — merge affix emoticons into the previous sentence
* [`nlcst-emoji-modifier`](https://github.com/syntax-tree/nlcst-emoji-modifier)
  — support emoji
* [`nlcst-emoticon-modifier`](https://github.com/syntax-tree/nlcst-emoticon-modifier)
  — support emoticons
* [`nlcst-is-literal`](https://github.com/syntax-tree/nlcst-is-literal)
  — check whether a node is meant literally
* [`nlcst-normalize`](https://github.com/syntax-tree/nlcst-normalize)
  — normalize a word for easier comparison
* [`nlcst-search`](https://github.com/syntax-tree/nlcst-search)
  — search for patterns
* [`nlcst-to-string`](https://github.com/syntax-tree/nlcst-to-string)
  — serialize a node
* [`nlcst-test`](https://github.com/syntax-tree/nlcst-test)
  — validate a node
* [`mdast-util-to-nlcst`](https://github.com/syntax-tree/mdast-util-to-nlcst)
  — transform mdast to nlcst
* [`hast-util-to-nlcst`](https://github.com/syntax-tree/hast-util-to-nlcst)
  — transform hast to nlcst
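
Since nlcst keeps white space and punctuation as nodes, serializing a tree can be as simple as concatenating literal values in order.
The hypothetical `toText` below illustrates that idea; it is not the API of `nlcst-to-string`, whose exact behavior should be taken from its own documentation.

```typescript
// Hypothetical loose node shape for this sketch (not the @types/nlcst API).
type NlcstNode = {type: string; value?: string; children?: NlcstNode[]}

// Hypothetical serializer: concatenate literal values in document order.
// Written for illustration; it is not the nlcst-to-string API.
function toText(node: NlcstNode): string {
  if (typeof node.value === 'string') return node.value
  return (node.children ?? []).map(toText).join('')
}

// "It works." as a hand-built sentence.
const sentence: NlcstNode = {
  type: 'SentenceNode',
  children: [
    {type: 'WordNode', children: [{type: 'TextNode', value: 'It'}]},
    {type: 'WhiteSpaceNode', value: ' '},
    {type: 'WordNode', children: [{type: 'TextNode', value: 'works'}]},
    {type: 'PunctuationNode', value: '.'}
  ]
}

console.log(toText(sentence)) // It works.
```

This only reproduces the original text if the tree kept every character, which is what a concrete syntax tree is meant to do.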

## Related

* [mdast](https://github.com/syntax-tree/mdast)
  — Markdown Abstract Syntax Tree format
* [hast](https://github.com/syntax-tree/hast)
  — Hypertext Abstract Syntax Tree format
* [xast](https://github.com/syntax-tree/xast)
  — Extensible Abstract Syntax Tree format

## References

* **unist**:
  [Universal Syntax Tree][unist].
  T. Wormer; et al.
* **JavaScript**:
  [ECMAScript Language Specification][javascript].
  Ecma International.
* **Web IDL**:
  [Web IDL][webidl].
  C. McCormack.
  W3C.

## Contribute

See [`contributing.md`][contributing] in [`syntax-tree/.github`][health] for
ways to get started.
See [`support.md`][support] for ways to get help.
Ideas for new utilities and tools can be posted in [`syntax-tree/ideas`][ideas].

A curated list of awesome syntax-tree,
unist,
mdast,
hast,
xast,
and nlcst resources can be found in [awesome syntax-tree][awesome].

This project has a [code of conduct][coc].
By interacting with this repository,
organization,
or community,
you agree to abide by its terms.

## Acknowledgments

The initial release of this project was authored by
[**@wooorm**](https://github.com/wooorm).

Thanks to
[**@nwtn**](https://github.com/nwtn),
[**@tmcw**](https://github.com/tmcw),
[**@muraken720**](https://github.com/muraken720),
and [**@dozoisch**](https://github.com/dozoisch)
for contributing to nlcst and related projects!

## License

[CC-BY-4.0][license] © [Titus Wormer][author]

[license]: https://creativecommons.org/licenses/by/4.0/

[author]: https://wooorm.com

[logo]: https://raw.githubusercontent.com/syntax-tree/nlcst/a89561d/logo.svg?sanitize=true

[health]: https://github.com/syntax-tree/.github

[contributing]: https://github.com/syntax-tree/.github/blob/HEAD/contributing.md

[support]: https://github.com/syntax-tree/.github/blob/HEAD/support.md

[coc]: https://github.com/syntax-tree/.github/blob/HEAD/code-of-conduct.md

[awesome]: https://github.com/syntax-tree/awesome-syntax-tree

[ideas]: https://github.com/syntax-tree/ideas

[releases]: https://github.com/syntax-tree/nlcst/releases

[latest]: https://github.com/syntax-tree/nlcst/releases/tag/1.0.2

[list-of-utilities]: #list-of-utilities

[dfn-unist-parent]: https://github.com/syntax-tree/unist#parent

[dfn-unist-literal]: https://github.com/syntax-tree/unist#literal

[dfn-parent]: #parent

[dfn-literal]: #literal

[dfn-root]: #root

[dfn-paragraph]: #paragraph

[dfn-sentence]: #sentence

[dfn-word]: #word

[dfn-symbol]: #symbol

[dfn-punctuation]: #punctuation

[dfn-whitespace]: #whitespace

[dfn-text]: #text

[dfn-source]: #source

[term-tree]: https://github.com/syntax-tree/unist#tree

[term-child]: https://github.com/syntax-tree/unist#child

[term-root]: https://github.com/syntax-tree/unist#root

[unist]: https://github.com/syntax-tree/unist

[syntax-tree]: https://github.com/syntax-tree/unist#syntax-tree

[javascript]: https://www.ecma-international.org/ecma-262/9.0/index.html

[webidl]: https://heycam.github.io/webidl/

[glossary]: https://github.com/syntax-tree/unist#glossary

[utilities]: https://github.com/syntax-tree/unist#list-of-utilities

[unified]: https://github.com/unifiedjs/unified

[retext]: https://github.com/retextjs/retext

[textom]: https://github.com/wooorm/textom
--------------------------------------------------------------------------------