├── includeme2.sam ├── samparser.bat ├── circle.png ├── includeme1.sam ├── docsource ├── menu.sami ├── index.sam ├── samschemaspec.sam ├── parser.sam ├── recipes.sam └── quickstart.sam ├── includeme.sam ├── editors.txt ├── license.txt ├── .gitattributes ├── .gitignore ├── statemachine.py ├── schema └── ideas │ ├── recipe.schema.sam │ └── schema.schema.sam ├── structuredocs.sams ├── schema.schema.sam ├── sq.xml ├── docs ├── index.html ├── sam.css ├── samschemaspec.html ├── parser.html └── recipes.html ├── statemachine_test.py ├── test.xsd ├── README.md ├── test.xslt ├── LICENSE-2.0.txt ├── Eclipse Public License - Version 1.0.html └── samsparser.py /includeme2.sam: -------------------------------------------------------------------------------- 1 | Nested includes work! Yipee! 2 | -------------------------------------------------------------------------------- /samparser.bat: -------------------------------------------------------------------------------- 1 | @py -3 %~dp0samparser.py %* 2 | 3 | -------------------------------------------------------------------------------- /circle.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/mbakeranalecta/sam/HEAD/circle.png -------------------------------------------------------------------------------- /includeme1.sam: -------------------------------------------------------------------------------- 1 | 2 | 3 | message: 4 | This file includes another file. 
5 | 6 | <<<(includeme2.sam) 7 | -------------------------------------------------------------------------------- /docsource/menu.sami: -------------------------------------------------------------------------------- 1 | menu: 2 | {Home}(link "index.html") | {Quickstart}(link "quickstart.html") | {Parser}(link "parser.html") | {Language}(link "language.html") | {Recipes}(link "recipes.html") | {Github}(https://github.com/mbakeranalecta/sam) -------------------------------------------------------------------------------- /includeme.sam: -------------------------------------------------------------------------------- 1 | 2 | 3 | message: Hello World. 4 | 5 | This is the "include" test -- file. 6 | 7 | +++ 8 | foo | bar 9 | baz | bat 10 | 11 | Annotation lookup test: {test phrase}(test). 12 | -------------------------------------------------------------------------------- /editors.txt: -------------------------------------------------------------------------------- 1 | List of Text Editors that work well with SAM 2 | 3 | * Notepad ++. Wraps lines to the indent of the first line. Some quirkiness in where the cursor goes when you hit return twice. 4 | 5 | * Sublime Text. Wraps lines to the indent of the first line. Position correct after hitting return twice. 6 | 7 | * Atom. Ditto. -------------------------------------------------------------------------------- /license.txt: -------------------------------------------------------------------------------- 1 | SAM is licensed for use, at the user's election, under the 2 | Eclipse Public License v1.0 or the Apache Software Foundation License v2.0. 3 | 4 | A copy of the Eclipse Public License 1.0 is 5 | available at 6 | http://www.eclipse.org/legal/epl-v10.html 7 | A copy of the Apache Software Foundation License 2.0 is available at 8 | http://opensource.org/licenses/apache2.0.php 9 | 10 | This statement must be included in any copies of SAM. 
11 | -------------------------------------------------------------------------------- /.gitattributes: -------------------------------------------------------------------------------- 1 | # Auto detect text files and perform LF normalization 2 | * text=auto 3 | 4 | # Custom for Visual Studio 5 | *.cs diff=csharp 6 | *.sln merge=union 7 | *.csproj merge=union 8 | *.vbproj merge=union 9 | *.fsproj merge=union 10 | *.dbproj merge=union 11 | 12 | # Standard to msysgit 13 | *.doc diff=astextplain 14 | *.DOC diff=astextplain 15 | *.docx diff=astextplain 16 | *.DOCX diff=astextplain 17 | *.dot diff=astextplain 18 | *.DOT diff=astextplain 19 | *.pdf diff=astextplain 20 | *.PDF diff=astextplain 21 | *.rtf diff=astextplain 22 | *.RTF diff=astextplain 23 | -------------------------------------------------------------------------------- /docsource/index.sam: -------------------------------------------------------------------------------- 1 | page: SAM Documentation Home 2 | 3 | <<<(menu.sami) 4 | 5 | SAM is an extensible markup language with syntax similar to Markdown but semantic capability similar to XML. 6 | 7 | * For a quick introduction to SAM, see {Quickstart}(link "quickstart.html"). 8 | * For a full description of the language, see {Language}(link "language.html"). 9 | * For some recipes for solving common markup issues with SAM, see {Recipes}(link "recipes.html"). 10 | * To run the SAM parser, see {Parser}(link "parser.html"). 11 | * To install the SAM parser, see the {Github page}(https://github.com/mbakeranalecta/sam). 
-------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Windows image file caches 2 | Thumbs.db 3 | ehthumbs.db 4 | 5 | # Folder config file 6 | Desktop.ini 7 | 8 | # Recycle Bin used on file shares 9 | $RECYCLE.BIN/ 10 | 11 | # Windows Installer files 12 | *.cab 13 | *.msi 14 | *.msm 15 | *.msp 16 | 17 | # ========================= 18 | # Operating System Files 19 | # ========================= 20 | 21 | # OSX 22 | # ========================= 23 | 24 | .DS_Store 25 | .AppleDouble 26 | .LSOverride 27 | 28 | # Icon must end with two \r. 29 | Icon 30 | 31 | 32 | # Thumbnails 33 | ._* 34 | 35 | # Files that might appear on external disk 36 | .Spotlight-V100 37 | .Trashes 38 | 39 | *.idea/* 40 | __pycache__/* 41 | *.sublime* -------------------------------------------------------------------------------- /statemachine.py: --------------------------------------------------------------------------------
class StateMachine:
    """A minimal table-driven state machine."""

    def __init__(self):
        self.handlers = {}      # upper-cased state name -> handler callable
        self.startState = None
        self.endStates = []

    def add_state(self, name, handler, end_state=0):
        # State names are stored upper-cased, so lookups are case insensitive.
        name = name.upper()
        self.handlers[name] = handler
        if end_state:
            self.endStates.append(name)

    def set_start(self, name):
        self.startState = name.upper()

    def run(self, cargo):
        try:
            handler = self.handlers[self.startState]
        except KeyError:
            raise Exception("InitializationError: must call .set_start() before .run()")
        if not self.endStates:
            raise Exception("InitializationError: at least one state must be an end_state")

        # Each handler returns (next state name, cargo); stop at an end state.
        while True:
            (newState, cargo) = handler(cargo)
            if newState.upper() in self.endStates:
                break
            else:
                handler = self.handlers[newState.upper()]

-------------------------------------------------------------------------------- /schema/ideas/recipe.schema.sam:
-------------------------------------------------------------------------------- 1 | sam-schema: 2 | $namespace = http://example.com/ns/recipe 3 | $pattern.unit = each|tsp|tbsp|oz 4 | $pattern.path = 5 | 6 | templates: 7 | recipe: 8 | description: 9 | >>>(#text-general) 10 | ingredients:: ingredient, quality, unit 11 | xs:string, xs:int, >($pattern.unit) 12 | preparation: 13 | >>>(#ol) 14 | 15 | structures: 16 | ~~~(#text-general)(?many) 17 | p: 18 | >>>(#ol) 19 | >>>(#ul) 20 | 21 | ~~~(#ol) 22 | ol: 23 | li: 24 | p:(?repeat) 25 | 26 | ~~~(#ul) 27 | ul: 28 | li: 29 | p:(?repeat) 30 | 31 | annotations: 32 | |ingredient| xs:string 33 | |tool| xs:string 34 | |task| xs:string 35 | 36 | decorations: 37 | * bold 38 | * italic 39 | * code 40 | 41 | rename: 42 | |code| pre 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | -------------------------------------------------------------------------------- /structuredocs.sams: -------------------------------------------------------------------------------- 1 | ## Template based approach 2 | 3 | !namespace: http://spfeopentoolkit.org/ns/sam-schema 4 | $namespace = http://example.com/ns/structure-docs 5 | $word = \W+ 6 | 7 | structure-docs: xs:string 8 | 9 | description: 10 | ~~~(*text-general)(?many) 11 | p: 12 | ul: 13 | ol: 14 | 15 | group:(?repeat) 16 | title: xs:string 17 | # The title of the group 18 | description: 19 | # A description of the group 20 | >>>(*text-general) 21 | element:(?repeat) 22 | index:: type, term 23 | >($word), xs:string 24 | type: 25 | xs:string 26 | description: 27 | >>>(*text-general) 28 | 29 | 30 | # declarative approach using SAM syntax 31 | 32 | schema: 33 | block: sturcuture-docs 34 | >>>(#text-general) 35 | 36 | group:(#text-general)(?many) 37 | paragraph: 38 | block: ol 39 | block: ul 40 | 41 | 42 | # declarative approach using custom syntax 43 | 44 | structure-docs{} 45 | description 46 | @text-general 47 | *group 48 | title 49 | description 50 | @text-general 51 | +element 52 | @index 53 | 
type(xs:string) 54 | description 55 | @text-general 56 | 57 | @text-general 58 | *{ 59 | para=p 60 | ul 61 | ol 62 | } 63 | 64 | @annotations 65 | element-name 66 | group-name 67 | 68 | @decorations 69 | bold 70 | italic 71 | quotes 72 | code=pre 73 | 74 | @index 75 | index 76 | *entry::type(xs:NMTOKEN),+term 77 | 78 | -------------------------------------------------------------------------------- /schema.schema.sam: -------------------------------------------------------------------------------- 1 | sam-schema: 2 | $namespace = sam-schema 3 | 4 | template: 5 | sam-schema:(?any) 6 | $attributes = # 7 | template:(?anything) 8 | structures: 9 | fragment:(#repeat) 10 | annotations:(?anything) 11 | decorations:(?any) 12 | bold: >($pattern) 13 | italic: >($pattern) 14 | code: >($pattern) 15 | rename: 16 | ll: 17 | patterns: 18 | string-definition: 19 | 20 | 21 | structures: 22 | ~~~(#text-general)(?many) 23 | p: 24 | >>>(#ol) 25 | >>>(#ul) 26 | 27 | ~~~(#ol) 28 | ol: 29 | li: 30 | p:(?repeat) 31 | 32 | ~~~(#ul) 33 | ul: 34 | li: 35 | p:(?repeat) 36 | 37 | annotations: 38 | ingredient: xs:string 39 | tool: xs:string 40 | task: xs:string 41 | 42 | decorations: 43 | bold: xs:string 44 | italic: xs:string 45 | code: xs:string 46 | 47 | attributes: 48 | id: 49 | name: 50 | condition: 51 | 52 | citations: 53 | value: 54 | id: 55 | key: 56 | name: 57 | 58 | rename: 59 | |code| pre 60 | 61 | patterns: 62 | $unit = each|tsp|tbsp|oz 63 | 64 | -------------------------------------------------------------------------------- /schema/ideas/schema.schema.sam: -------------------------------------------------------------------------------- 1 | sam-schema: 2 | $namespace = sam-schema 3 | 4 | template: 5 | sam-schema:(?any) 6 | $attributes = # 7 | template:(?anything) 8 | structures: 9 | fragment:(#repeat) 10 | annotations:(?anything) 11 | decorations:(?any) 12 | bold: >($pattern) 13 | italic: >($pattern) 14 | code: >($pattern) 15 | rename: 16 | ll: 17 | patterns: 18 | 
string-definition: 19 | 20 | 21 | structures: 22 | ~~~(#text-general)(?many) 23 | p: 24 | >>>(#ol) 25 | >>>(#ul) 26 | 27 | ~~~(#ol) 28 | ol: 29 | li: 30 | p:(?repeat) 31 | 32 | ~~~(#ul) 33 | ul: 34 | li: 35 | p:(?repeat) 36 | 37 | annotations: 38 | ingredient: xs:string 39 | tool: xs:string 40 | task: xs:string 41 | 42 | decorations: 43 | bold: xs:string 44 | italic: xs:string 45 | code: xs:string 46 | 47 | attributes: 48 | id: 49 | name: 50 | condition: 51 | 52 | citations: 53 | value: 54 | id: 55 | key: 56 | name: 57 | 58 | rename: 59 | |code| pre 60 | 61 | patterns: 62 | $unit = each|tsp|tbsp|oz 63 | 64 | -------------------------------------------------------------------------------- /sq.xml: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | justquotes 5 | 6 | double_quote_close 7 | 8 | 9 | 10 | 11 | re_double_quote_open 12 | 13 | 14 | 15 | 16 | single_quote_close 17 | 18 | 19 | 20 | 21 | single_quote_open 22 | 23 | 24 | 25 | 26 | apostrophe 27 | 28 | 29 | 30 | 31 | 32 | justdashes 33 | 34 | en_dash 35 | 36 | 37 | 38 | 39 | em_dash 40 | 41 | 42 | 43 | 44 | 45 | -------------------------------------------------------------------------------- /docs/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | SAM Documentation Home 5 | 6 | 7 | 8 | 9 |
10 |

SAM Documentation Home

11 | 12 | 15 |

SAM is an extensible markup language with syntax similar to Markdown but semantic capability similar to XML.

16 | 33 |
34 | 35 | -------------------------------------------------------------------------------- /statemachine_test.py: --------------------------------------------------------------------------------
from math import sin

from statemachine import StateMachine


def ones_counter(val):
    # Handler for values below 10; returns (next state name, value).
    print("ONES State: ", end=' ')
    while True:
        if val <= 0 or val >= 30:
            newState = "Out_of_Range"
            break
        elif 20 <= val < 30:
            newState = "TWENTIES"
            break
        elif 10 <= val < 20:
            newState = "TENS"
            break
        else:
            print(" @ %2.1f+" % val, end=' ')
        val = math_func(val)
    print(" >>")
    return (newState, val)


def tens_counter(val):
    # Handler for values from 10 up to (but not including) 20.
    print("TENS State: ", end=' ')
    while True:
        if val <= 0 or val >= 30:
            newState = "Out_of_Range"
            break
        elif 1 <= val < 10:
            newState = "ONES"
            break
        elif 20 <= val < 30:
            newState = "TWENTIES"
            break
        else:
            print(" #%2.1f+" % val, end=' ')
        val = math_func(val)
    print(" >>")
    return (newState, val)


def twenties_counter(val):
    # Handler for values from 20 up to (but not including) 30.
    print("TWENTIES State:", end=' ')
    while True:
        if val <= 0 or val >= 30:
            newState = "Out_of_Range"
            break
        elif 1 <= val < 10:
            newState = "ONES"
            break
        elif 10 <= val < 20:
            newState = "TENS"
            break
        else:
            print(" *%2.1f+" % val, end=' ')
        val = math_func(val)
    print(" >>")
    return (newState, val)


def math_func(n):
    # Produces the next value, always in the range 0 to 31.
    return abs(sin(n)) * 31


if __name__ == "__main__":
    m = StateMachine()
    m.add_state("ONES", ones_counter)
    m.add_state("TENS", tens_counter)
    m.add_state("TWENTIES", twenties_counter)
    m.add_state("OUT_OF_RANGE", None, end_state=1)
    m.set_start("ONES")
    m.run(1)

-------------------------------------------------------------------------------- /test.xsd: -------------------------------------------------------------------------------- 1 | 2 | 3 | 5 | 6 | 7 | 8 |
9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 | 25 | 26 | 27 | 28 | 29 | 30 | 31 | 32 | 33 | 34 | 35 | 36 | 37 | 38 | 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | -------------------------------------------------------------------------------- /docs/sam.css: -------------------------------------------------------------------------------- 1 | :root 2 | { 3 | --title-font: "Gill Sans", sans-serif; 4 | } 5 | 6 | body 7 | { 8 | max-width: 42rem; 9 | } 10 | 11 | .note 12 | { 13 | box-sizing: border-box; 14 | max-width: 42rem; 15 | width: 100%; 16 | display: table; 17 | margin: 0 0 20px; 18 | border-style: solid; 19 | background: #fff3d4; 20 | border-collapse: separate; 21 | border-color: #f6b73c; 22 | clear: left; 23 | padding: 12px 32px 12px 12px; 24 | border-width: 0 0 0 5px; 25 | } 26 | 27 | .note::before 28 | { 29 | display: inline-block; 30 | width: 30px; 31 | font-family: var(--title-font); 32 | font-size: 12pt; 33 | font-style: normal; 34 | font-weight: 400; 35 | content: 'NOTE:'; 36 | 37 | } 38 | 39 | pre.codeblock 40 | { 41 | box-sizing: border-box; 42 | max-width: 42rem; 43 | width: 100%; 44 | display: table; 45 | margin: 0 0 20px; 46 | border-style: solid; 47 | background: #e4f0f5; 48 | border-collapse: separate; 49 | border-color: #3f87a6; 50 | clear: left; 51 | padding: 12px 32px 12px 12px; 52 | border-width: 0 0 0 5px; 53 | } 54 | 55 | .title 56 | { 57 | font-family: var(--title-font); 58 | } 59 | 60 | .structures::before 61 | { 62 | font-family: var(--title-font); 63 | font-size: 18pt; 64 | content: "Structures"; 65 | font-weight: bold; 66 | } 67 | .syntax::before 68 | { 69 | font-family: var(--title-font); 70 | font-size: 12pt; 71 | content: "Syntax"; 72 | font-weight: bold; 73 | } 74 | .semantics::before 75 | { 76 | font-family: var(--title-font); 77 | font-size: 12pt; 78 | content: "Semantics"; 79 | font-weight: bold; 80 | } 81 | .model::before 82 | { 83 | font-family: var(--title-font); 84 | font-size: 12pt; 85 | content: 
"Model"; 86 | font-weight: bold; 87 | } 88 | .xml-serialization::before 89 | { 90 | font-family: var(--title-font); 91 | font-size: 12pt; 92 | content: "XML Serialization"; 93 | font-weight: bold; 94 | } 95 | .SOM::before 96 | { 97 | font-family: var(--title-font); 98 | font-size: 12pt; 99 | content: "SAM Object Model"; 100 | font-weight: bold; 101 | } 102 | 103 | dt 104 | { 105 | font-weight: bold; 106 | } 107 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | sam 2 | === 3 | 4 | Semantic Authoring Markdown 5 | 6 | XML was designed to be human readable, but not human writable. Graphical editors help, though they have their issues 7 | (like trying to find the right insertion point to add permitted elements). But graphical editors have a problem: when 8 | they hide the tags, they also hide the structure. 9 | 10 | Semantic Authoring Markdown brings the ideas behind Markdown -- a natural syntax for writing HTML documents -- to 11 | structured writing. It creates a syntax that captures structure, like XML, but is easy to write in a text editor 12 | -- like Markdown. 13 | 14 | ### Documentation 15 | 16 | See the documentation at https://mbakeranalecta.github.io/sam/. 17 | 18 | ### NOTE 19 | 20 | The SAM Language is still being defined and both the language itself and the way it is serialized and represented internally by the parser are subject to change. I hope to stabilize the language definition soon. 21 | 22 | ### Backward-incompatible changes 23 | 24 | Since SAM is under active development, there may be backward-incompatible changes in the language. They will be noted here as they occur, up until we get to the point where the language or its serialization are considered final. 25 | 26 | * Revision 227cb3dd7bb322f5579858806071c1ff8456c0b6 introduced a change in the 27 | way the XML representation of a record is generated. 
A record 28 | used to be output as "row". It is now output as "record". 29 | 30 | * Revision 3fdd6528d88b1a7f0a72c10ce5b5e768433eaf19 introduced a change in how inline code is serialized. It is now serialized as a `` element rather than as a `` element with an `` nested element. 31 | 32 | * Revision 8e8c6a0b4c9c41bd72fab5fd53e3d967e9688110 removed the `===` flag for a block of embedded code, which had been briefly introduced in an earlier revision. Blocks of embedded code should now be represented as regular code blocks using an encoding attribute `(=svg)` rather than a language attribute `(svg)`. 33 | 34 | * Revision fac3fea6a9570a20c825369417ab2eaf94d34d2b made annotation lookup case insensitive. Case sensitive lookup can be turned on using the declaration `!annotation-lookup: case sensitive`. 35 | 36 | * Revision 828ef33d291f1364a6edf036588ac5f21fac0abb addressed issue #142 by detecting recursive includes. This had the side effect of changing the behavior when the parser encounters an error in an included file. Previously this error was demoted to a warning (not sure why). Now it is treated as an error and stops the parser. Without this change, the error would not get noted in the error count for batch processing, which is clearly not a good idea. To allow for more lenient error handling while retaining appropriate error reporting, we would need to introduce a reportable non-fatal error type. Issue #148 has been raised to consider this. 37 | 38 | * Revision e0fa711d14219cbad19636515e2dc2bbe3a82f28: 39 | 40 | * Changed the format of error messages to report the original line on which the error occurred rather than a representation of the object created. 41 | 42 | * Changed the format produced by the `__str__()` method on doc structure objects to a normalized representation of the input text generated by the new `regurgitate()` method. 43 | 44 | * Changed the serialization of citations on a block so they come before the title, not after it. 
45 | 46 | * Changed the object model of Blockinsert and Inlineinsert objects to make the type and item value separate fields of the object rather than simple attributes. 47 | 48 | * Changed serialization of block embed from "embed" to "embedblock" to be consistent with "codeblock". 49 | 50 | * Changed type of embedded blocks from Codeblock to Embedblock. 51 | 52 | * Removed support for embedded XML fragments per the discussion of issue #145. SAM has outgrown this feature, which is incompatible with the plan to introduce SAM Schemas. 53 | 54 | * Revision 1d16fd6d0544c32fa23930f303989b1b4a82c477 addressed #157 by changing the serialization of citations as described in #157 and adding support for the use of keys in citations. 55 | 56 | * Revision ad4365064bdfe61fa43228991a31b3174feb2957 removed the smart quotes parser option (the flag that turned smart quotes on and off on the command line) and introduced the `!smart-quotes` document declaration and the option to add custom smart quotes rules to the parser. 57 | 58 | * Revision b4ca40baa03233ff306ed20a59da92668e4e0872 changes the syntax for inserting a value by reference. 59 | It used to be `>(#foo)` but this was confusing because parentheses are used to create names and ids, not 60 | to reference them. The syntax for referencing a name or id is [#foo]. So, the syntax for inserting a value by 61 | reference is now `>[#foo]`. This applies to strings, ids, names, fragments, and keys. Note that the syntax for 62 | inserting a value by URI still uses parentheses, since this is new information, not a reference to another 63 | internal value. Also note the difference between `[#foo]`, which means generate a reference to the content with 64 | the name `foo`, and `>[#foo]`, which means insert the content with the name foo at the current location. (These are, 65 | of course, operations performed at the application layer, not by the parser.) 
66 | 67 | * Starting with revision dd07a4b798fcaa14a722a345b5ab8e07c3df42a1 the way attributes were modeled internally 68 | changed. Instead of using a separate Attribute object, attributes became Python attributes on the relevant 69 | block or phrase object. This does not affect command line use but would affect programmatic access to 70 | the document structure. 71 | 72 | * Starting with revision dd07a4b798fcaa14a722a345b5ab8e07c3df42a1, the use of fragment references 73 | with the syntax `[~foo]` was removed (see issue #166). Fragments can be inserted by name or ID just 74 | like any other block type. 75 | 76 | * From revision 3e9b8f6fd8cddf9cbedb25c44ab48323216ce71e 77 | 78 | * The change to insert by reference in b4ca40baa03233ff306ed20a59da92668e4e0872 79 | is reversed. It caused slightly more confusion than the old version. 80 | 81 | * The `~` symbol for referencing a fragment is removed. Fragments should 82 | be referenced by name or id. 83 | 84 | * The strings feature has been renamed "variable". This chiefly affects 85 | the serialization of variable definitions and references. 86 | 87 | * In revision 1f20902624d29dab002353df8374952c63fff81d the serialization of citations has been changed to support 88 | compound identifiers and to support easier processing of citations. See the 89 | language docs for details. 90 | 91 | * In revision defbc97c9bd592ab454296852c3d9a65e1007996 the command line options changed to support three different 92 | output modes as subcommands. Other options changed as well. See above for the new command line options. Also, 93 | the serialization interface to the parser changed. When calling the parser from a Python program you no longer 94 | call `samparser.serialize('xml')` or `samparser.serialize('html')` but `samparser.doc.serialize_xml()` 95 | and `samparser.doc.serialize_html()` respectively. 96 | 97 | Please report any other backward incompatibilities you find so they can be added to this list. 
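The recursive-include detection mentioned in the list above (issue #142) can be pictured with a small sketch. This is not the parser's actual code: the `expand` function, the `RecursiveIncludeError` class, the `FILES` dictionary standing in for the file system, and the `loop_a.sam`/`loop_b.sam` files are all hypothetical names chosen for illustration. The sketch only assumes that an include is written as `<<<(filename)` on its own line, as in `includeme1.sam`, and it treats a repeated file on the current include chain as a fatal error.

```python
import re

# Hypothetical stand-in for the file system: file name -> SAM source text.
FILES = {
    "includeme1.sam": "message:\n    This file includes another file.\n\n<<<(includeme2.sam)\n",
    "includeme2.sam": "Nested includes work! Yipee!\n",
    "loop_a.sam": "<<<(loop_b.sam)\n",
    "loop_b.sam": "<<<(loop_a.sam)\n",
}

# An include statement on a line by itself, e.g. <<<(includeme2.sam)
INCLUDE = re.compile(r"^\s*<<<\((?P<name>[^)]+)\)\s*$")


class RecursiveIncludeError(Exception):
    """Raised when a file includes itself, directly or indirectly."""


def expand(name, files, stack=()):
    """Return the lines of `name` with every include statement expanded.

    `stack` holds the chain of files currently being expanded; meeting a
    name that is already on the chain means the includes are recursive.
    """
    if name in stack:
        chain = " -> ".join(stack + (name,))
        raise RecursiveIncludeError("recursive include: " + chain)
    lines = []
    for line in files[name].splitlines():
        match = INCLUDE.match(line)
        if match:
            lines.extend(expand(match.group("name"), files, stack + (name,)))
        else:
            lines.append(line)
    return lines


if __name__ == "__main__":
    print("\n".join(expand("includeme1.sam", FILES)))
    try:
        expand("loop_a.sam", FILES)
    except RecursiveIncludeError as err:
        print(err)
```

Tracking only the current chain, rather than every file seen so far, lets the same file be included twice from different places while still catching a genuine cycle.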
98 | 99 | -------------------------------------------------------------------------------- /docsource/samschemaspec.sam: -------------------------------------------------------------------------------- 1 | spec: SAM Schema Spec 2 | 3 | introduction: 4 | 5 | This is the specification for the SAM schema. It consists of a set of rules 6 | that specify how a SAM schema is written and interpreted. 7 | 8 | rule:(#encoding) Encoding 9 | 10 | A SAM Schema is encoded in UTF-8. 11 | 12 | rule:(#names) Names 13 | 14 | Element names must be valid XML names. 15 | 16 | rule:(#header) SAM Schema Header 17 | 18 | A SAM Schema begins with a SAM Schema Header. 19 | 20 | A SAM schema header consists of the word "samschema" followed by a colon 21 | followed optionally by a namespace URI in curly braces. For example: 22 | 23 | ```(samschema) 24 | samschema:{http://spfeopentoolkit.org/ns/spfe-doc} 25 | 26 | 27 | If specified, the namespace URI specifies the default namespace of all the 28 | elements defined in the schema. 29 | 30 | rule:(#namespaces) Namespaces 31 | 32 | 33 | rule:(#root) Root element declaration 34 | 35 | Unlike an XML schema, which declares a collection of elements, several of which 36 | might be suitable for document roots, SAM schemas always declare a document 37 | root element. 38 | 39 | The document root element is declared as a valid element name beginning in the 40 | first column. It is the first element declaration following the SAM schema 41 | header and any {include statements}(rule "#includes"). 42 | 43 | rule:(#includes) Includes 44 | 45 | You can include another samschema in the current samschema document using 46 | an include statement. 47 | 48 | Included samschemas are processed as if they were part of the current 49 | samschema with the following provisions: 50 | 51 | * The included SAM schema may not include a {root element declaration}[#root]. 
52 | This means that any elements declared in an included samschema must be 53 | declared inside {named structures}[#structures]. 54 | 55 | * If the included SAM schema contains a default namespace, it will be used 56 | as the namespace for the elements defined in the included samschema. 57 | 58 | * If the included SAM schema does not contain a default namespace declaration, 59 | and the including SAM schema does contain a namespace declaration, 60 | the elements in the included samschema will be placed in the declared namespace 61 | of the including samschema. This rule is applied down the tree of includes from 62 | the top level schema and is evaluated as each include statement is processed 63 | from the top down. 64 | 65 | rule:(#blocks) Block declarations 66 | 67 | A SAM document consists of blocks, fields, record sets, flows, decorations, 68 | and annotations. Blocks and 69 | fields are declared the same way. The distinction between a block and a 70 | field is that a block has children and a field does not. 71 | 72 | For a field, SAM interprets any content after the field tag as the value of 73 | the field. 74 | 75 | For a block, SAM interprets any content after the block tag as an 76 | implicit {title}(rule "#implicit") field. 77 | 78 | rule:(#implicit) Implicit elements 79 | 80 | SAM recognizes certain elements by their context in the SAM document. They 81 | do not have explicit tags in the content. These implicit elements are delineated 82 | either by punctuation or by their position relative to other elements. 83 | 84 | Implicit elements are implicit in the instance. They must be declared in the 85 | schema. It is an error if an implicit element occurs in the instance in a 86 | position in which it is not permitted in the schema. 87 | 88 | The implicit elements are as follows: 89 | 90 | |p| The p element represents a paragraph. It is recognized as a block of 91 | lines with no leading tag. It ends with a blank line. 
p elements are the 92 | only elements that can contain decorations and annotations. 93 | 94 | |bold| The bold element is a decoration occurring within text and is 95 | indicated by surrounding the text with asterisks. 96 | 97 | |italic| The italic decoration is indicated by surrounding the text 98 | with underscores. 99 | 100 | |mono| The mono decoration is indicated by back ticks. It indicates that 101 | the text should be rendered in a monospaced font. 102 | 103 | |quotes| The quotes decoration is indicated by surrounding the text 104 | in straight double quotes. You can use printer's quotes in your document 105 | but they will be recognized as plain text, not markup. 106 | 107 | |title| The title element is implicit if there is text on the same line as 108 | a block tag, and if a title field is allowed as the first field of the block. 109 | Text in this position is an error if a title field is not allowed in 110 | this location by the schema or if the title element that is allowed is 111 | a block element rather than a field element. 112 | 113 | |blockquote| The blockquote block element is indicated by opening 114 | with three double quotes on a line by themselves, plus any parameters on 115 | the opening quotes. The quoted material must be indented from the blockquote 116 | markers. 117 | 118 | |codeblock| The codeblock block element is indicated by opening and closing with 119 | three backticks on a line by themselves, plus any attributes on the 120 | opening backticks. 121 | 122 | |fragment| The fragment element contains a set of elements that can be 123 | reused by inserting the fragment into other locations in the document, or 124 | other documents. Fragments can also be used to apply conditions to a 125 | set of elements that would otherwise lack a common container. 
126 | Recommended practice is to use fragments to contain 127 | only general text elements, which helps ensure that the inserted fragment will 128 | be schema compliant in the receiving document (assuming a common definition 129 | of general text elements). A fragment is indicated by three tildes (or the 130 | block name "fragment"). 131 | 132 | |string| The definition of a string, which may be inserted in the 133 | document, but is not intended to be published where defined. 134 | 135 | rule:(#ideomatic-annotation-types) Idiomatic annotations 136 | Certain annotation types are considered idiomatic. That is, the name 137 | is considered bound by convention to a certain meaning. This does not 138 | mean that you cannot use the type for other things, or use other names 139 | for that function, but editors and tools are entitled to provide enhanced 140 | support for the idiomatic meaning of the names. 141 | 142 | In some cases, idiomatic types may have specialized or atypical treatment 143 | of the standard attributes. For instance, the link type uses the 144 | specifically attribute for the URL to link to, which is an eccentric 145 | interpretation of what specifically is intended for. 146 | 147 | |link| The link type is idiomatically used for inserting explicit hyperlinks. The 148 | specifically attribute is used to specify the URL to link to. The 149 | namespace attribute remains a namespace. It could be used if the protocol 150 | or location is anything other than HTTP/Web. 151 | 152 | A schema can rename an idiomatic type, so that the tool behavior associated 153 | with the idiomatic type can be applied to the new type: 154 | 155 | ```(samschema) 156 | href=link 157 | 158 | 159 | rule:(#idiomatic-elements) Idiomatic elements 160 | 161 | Idiomatic elements are elements with intended meanings. Those meanings are 162 | not actually enforced by the SAM parser and the application is free to 163 | do what it likes with them. 
However, users and editing applications are 164 | entitled to treat the idiomatic elements as having their intended meanings. 165 | In some cases, the annotations of idiomatic elements are interpreted in 166 | special ways. The attributes will be assigned names that reflect their idiomatic usage, 167 | rather than the standard names. 168 | 169 | |blockquote| Annotation is recognized as a citation. Annotation type is 170 | recognized as the type of publication (web, book, article, etc.). Specifically 171 | is the text of the citation in the format specified by the namespace attribute. 172 | The specifically attribute will be named "citation", and the namespace attribute 173 | will be named "format". 174 | It is up to the downstream processor to parse the citation into its parts. 175 | 176 | 177 | |codeblock| Annotation type is recognized as specifying the language and 178 | source of the codeblock. Specifically is recognized as the source and is 179 | named "source". 180 | 181 | 182 | rule:(#implicit-structures) Implicit structures 183 | SAM also recognizes implicit structures. An implicit structure occurs where 184 | more than one element is recognized by context. The implicit structures are: 185 | 186 | |unordered lists| Unordered lists are indicated by starting a line with an 187 | asterisk. The root element of the structure is {ul}(element). 188 | The container for the list item is {li}(element). The container for 189 | the text is {p}(element). 190 | 191 | |ordered lists| Ordered lists are indicated by starting a line with a number. 192 | Numbers do not have to be sequential. The root element of the structure is {ol}. 193 | The container for the list item is {li}. The container for the text is {p}. 194 | 195 | |labeled lists| Labeled lists are indicated by surrounding the label text with 196 | pipe characters at the start of a paragraph. The root element of the structure 197 | 
Each labeled list item has the name {li} and contains the label 198 | and the labeled paragraph. 199 | 200 | |block titles| When you place text after the name of a block that has nested 201 | content (as opposed to one that is merely a field with a single value), that 202 | text is treated as {title}(element) for the block. 203 | 204 | rule:(#rename-implicit) Renaming implicit structures 205 | 206 | You can rename the implicit elements so that the output of the SAM parser 207 | gives them different names. To rename an implicit element, use the new name 208 | followed by an equals sign, followed by the implicit element name. 209 | 210 | Thus, to create a paragraph element named para instead of p, specify the 211 | element as follows: 212 | 213 | ```(samschema) 214 | para=p 215 | 216 | 217 | rule:(#create-implicit) Creating implicit structures 218 | 219 | rule:(#order-number) Order and number of elements in structures 220 | 221 | You can indicate the order and number of elements in a structure. 222 | 223 | |~| All the items in the structure in any order. 224 | 225 | 226 | 227 | 228 | 229 | 230 | 231 | -------------------------------------------------------------------------------- /test.xslt: -------------------------------------------------------------------------------- 1 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | < 18 | 19 | 20 | 21 | 22 | =" 23 | 24 | 25 | 26 | 27 | " 28 | 29 | 30 | 31 | /> 32 | 33 | 34 | > 35 | 36 | </ 37 | 38 | > 39 | 40 | 41 | 42 | 43 | 44 | 45 | 46 | 47 | 48 | 49 | 50 | 51 | <!DOCTYPE html> 52 | 53 | 54 | 55 | 56 | 57 | 58 | 59 | 60 |

Contents

61 |
79 | 80 | 81 | 82 |
83 | 84 | 85 |

86 |
87 | 88 | 89 | 90 | ^top 91 | 92 | 93 | 94 |

Test:

95 |
96 | 97 | 98 | 99 | 100 | 101 | 102 |

Case:

103 |
104 | 105 | 106 |

Source

107 | 108 |
109 | 110 | 111 |

Formatted output (not necessarily supported for all tests)

112 | 113 |
114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 |

Intended output (space normalized)

124 |
125 |

Actual output (space normalized)

126 |
127 | 128 | 129 |

Test result

130 | 131 | 132 |

PASS

133 |
134 | 135 |

**** FAIL ****

136 |
137 |
138 |
139 | 140 | 141 | 142 |
143 |
144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | []? 163 | 164 | 165 | 166 | 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | 180 | [ 181 | 182 | ] 183 | 184 | 185 | 186 | 187 | 188 | 189 | 190 |
191 |
192 | 193 | 194 | 195 | 196 | 197 | 198 | 199 | 200 | 201 | 202 | 203 | 204 | 205 | 206 | 207 |
208 |
209 | 210 | 211 | 212 | 213 | 214 | 215 | 216 | 217 | 218 | 219 | 220 | 221 |

222 | 223 | 224 | : 225 | 226 | 227 |

228 |
229 | 230 | 231 | 232 | 233 | 234 | 235 | 236 | 237 | 238 | 239 | 240 | 241 | 242 | 243 | 244 | 245 | 246 | 247 | 248 | 249 | 250 | 251 | 252 | 253 | 254 | 255 | 256 | 257 | 258 | & 259 | &amp; 260 | 261 | 262 | " 263 | &quot; 264 | 265 | 266 | < 267 | &lt; 268 | 269 | 270 | > 271 | &gt; 272 | 273 | 274 | 275 | 276 | 277 | & 278 | &amp; 279 | 280 | 281 | < 282 | &lt; 283 | 284 | 285 | > 286 | &gt; 287 | 288 | 289 | 290 | 291 | 292 | 293 | 294 | 295 | 296 | 297 | 298 | 299 | 300 | 301 | 302 | 303 | 304 | 305 | 306 | 307 | 308 | 309 | 310 | 311 | 312 | 313 | 314 | 315 | 316 | 317 | 318 | 319 | 320 | 321 | 322 | 323 | 324 | 325 | 326 | 327 | 328 | 329 | 330 | 331 | 332 | 333 | 334 | 335 | 336 | 337 | 338 |
-------------------------------------------------------------------------------- /LICENSE-2.0.txt: -------------------------------------------------------------------------------- 1 | 2 | Apache License 3 | Version 2.0, January 2004 4 | http://www.apache.org/licenses/ 5 | 6 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 7 | 8 | 1. Definitions. 9 | 10 | "License" shall mean the terms and conditions for use, reproduction, 11 | and distribution as defined by Sections 1 through 9 of this document. 12 | 13 | "Licensor" shall mean the copyright owner or entity authorized by 14 | the copyright owner that is granting the License. 15 | 16 | "Legal Entity" shall mean the union of the acting entity and all 17 | other entities that control, are controlled by, or are under common 18 | control with that entity. For the purposes of this definition, 19 | "control" means (i) the power, direct or indirect, to cause the 20 | direction or management of such entity, whether by contract or 21 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 22 | outstanding shares, or (iii) beneficial ownership of such entity. 23 | 24 | "You" (or "Your") shall mean an individual or Legal Entity 25 | exercising permissions granted by this License. 26 | 27 | "Source" form shall mean the preferred form for making modifications, 28 | including but not limited to software source code, documentation 29 | source, and configuration files. 30 | 31 | "Object" form shall mean any form resulting from mechanical 32 | transformation or translation of a Source form, including but 33 | not limited to compiled object code, generated documentation, 34 | and conversions to other media types. 35 | 36 | "Work" shall mean the work of authorship, whether in Source or 37 | Object form, made available under the License, as indicated by a 38 | copyright notice that is included in or attached to the work 39 | (an example is provided in the Appendix below). 
40 | 41 | "Derivative Works" shall mean any work, whether in Source or Object 42 | form, that is based on (or derived from) the Work and for which the 43 | editorial revisions, annotations, elaborations, or other modifications 44 | represent, as a whole, an original work of authorship. For the purposes 45 | of this License, Derivative Works shall not include works that remain 46 | separable from, or merely link (or bind by name) to the interfaces of, 47 | the Work and Derivative Works thereof. 48 | 49 | "Contribution" shall mean any work of authorship, including 50 | the original version of the Work and any modifications or additions 51 | to that Work or Derivative Works thereof, that is intentionally 52 | submitted to Licensor for inclusion in the Work by the copyright owner 53 | or by an individual or Legal Entity authorized to submit on behalf of 54 | the copyright owner. For the purposes of this definition, "submitted" 55 | means any form of electronic, verbal, or written communication sent 56 | to the Licensor or its representatives, including but not limited to 57 | communication on electronic mailing lists, source code control systems, 58 | and issue tracking systems that are managed by, or on behalf of, the 59 | Licensor for the purpose of discussing and improving the Work, but 60 | excluding communication that is conspicuously marked or otherwise 61 | designated in writing by the copyright owner as "Not a Contribution." 62 | 63 | "Contributor" shall mean Licensor and any individual or Legal Entity 64 | on behalf of whom a Contribution has been received by Licensor and 65 | subsequently incorporated within the Work. 66 | 67 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 68 | this License, each Contributor hereby grants to You a perpetual, 69 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 70 | copyright license to reproduce, prepare Derivative Works of, 71 | publicly display, publicly perform, sublicense, and distribute the 72 | Work and such Derivative Works in Source or Object form. 73 | 74 | 3. Grant of Patent License. Subject to the terms and conditions of 75 | this License, each Contributor hereby grants to You a perpetual, 76 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 77 | (except as stated in this section) patent license to make, have made, 78 | use, offer to sell, sell, import, and otherwise transfer the Work, 79 | where such license applies only to those patent claims licensable 80 | by such Contributor that are necessarily infringed by their 81 | Contribution(s) alone or by combination of their Contribution(s) 82 | with the Work to which such Contribution(s) was submitted. If You 83 | institute patent litigation against any entity (including a 84 | cross-claim or counterclaim in a lawsuit) alleging that the Work 85 | or a Contribution incorporated within the Work constitutes direct 86 | or contributory patent infringement, then any patent licenses 87 | granted to You under this License for that Work shall terminate 88 | as of the date such litigation is filed. 89 | 90 | 4. Redistribution. 
You may reproduce and distribute copies of the 91 | Work or Derivative Works thereof in any medium, with or without 92 | modifications, and in Source or Object form, provided that You 93 | meet the following conditions: 94 | 95 | (a) You must give any other recipients of the Work or 96 | Derivative Works a copy of this License; and 97 | 98 | (b) You must cause any modified files to carry prominent notices 99 | stating that You changed the files; and 100 | 101 | (c) You must retain, in the Source form of any Derivative Works 102 | that You distribute, all copyright, patent, trademark, and 103 | attribution notices from the Source form of the Work, 104 | excluding those notices that do not pertain to any part of 105 | the Derivative Works; and 106 | 107 | (d) If the Work includes a "NOTICE" text file as part of its 108 | distribution, then any Derivative Works that You distribute must 109 | include a readable copy of the attribution notices contained 110 | within such NOTICE file, excluding those notices that do not 111 | pertain to any part of the Derivative Works, in at least one 112 | of the following places: within a NOTICE text file distributed 113 | as part of the Derivative Works; within the Source form or 114 | documentation, if provided along with the Derivative Works; or, 115 | within a display generated by the Derivative Works, if and 116 | wherever such third-party notices normally appear. The contents 117 | of the NOTICE file are for informational purposes only and 118 | do not modify the License. You may add Your own attribution 119 | notices within Derivative Works that You distribute, alongside 120 | or as an addendum to the NOTICE text from the Work, provided 121 | that such additional attribution notices cannot be construed 122 | as modifying the License. 
123 | 124 | You may add Your own copyright statement to Your modifications and 125 | may provide additional or different license terms and conditions 126 | for use, reproduction, or distribution of Your modifications, or 127 | for any such Derivative Works as a whole, provided Your use, 128 | reproduction, and distribution of the Work otherwise complies with 129 | the conditions stated in this License. 130 | 131 | 5. Submission of Contributions. Unless You explicitly state otherwise, 132 | any Contribution intentionally submitted for inclusion in the Work 133 | by You to the Licensor shall be under the terms and conditions of 134 | this License, without any additional terms or conditions. 135 | Notwithstanding the above, nothing herein shall supersede or modify 136 | the terms of any separate license agreement you may have executed 137 | with Licensor regarding such Contributions. 138 | 139 | 6. Trademarks. This License does not grant permission to use the trade 140 | names, trademarks, service marks, or product names of the Licensor, 141 | except as required for reasonable and customary use in describing the 142 | origin of the Work and reproducing the content of the NOTICE file. 143 | 144 | 7. Disclaimer of Warranty. Unless required by applicable law or 145 | agreed to in writing, Licensor provides the Work (and each 146 | Contributor provides its Contributions) on an "AS IS" BASIS, 147 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 148 | implied, including, without limitation, any warranties or conditions 149 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 150 | PARTICULAR PURPOSE. You are solely responsible for determining the 151 | appropriateness of using or redistributing the Work and assume any 152 | risks associated with Your exercise of permissions under this License. 153 | 154 | 8. Limitation of Liability. 
In no event and under no legal theory, 155 | whether in tort (including negligence), contract, or otherwise, 156 | unless required by applicable law (such as deliberate and grossly 157 | negligent acts) or agreed to in writing, shall any Contributor be 158 | liable to You for damages, including any direct, indirect, special, 159 | incidental, or consequential damages of any character arising as a 160 | result of this License or out of the use or inability to use the 161 | Work (including but not limited to damages for loss of goodwill, 162 | work stoppage, computer failure or malfunction, or any and all 163 | other commercial damages or losses), even if such Contributor 164 | has been advised of the possibility of such damages. 165 | 166 | 9. Accepting Warranty or Additional Liability. While redistributing 167 | the Work or Derivative Works thereof, You may choose to offer, 168 | and charge a fee for, acceptance of support, warranty, indemnity, 169 | or other liability obligations and/or rights consistent with this 170 | License. However, in accepting such obligations, You may act only 171 | on Your own behalf and on Your sole responsibility, not on behalf 172 | of any other Contributor, and only if You agree to indemnify, 173 | defend, and hold each Contributor harmless for any liability 174 | incurred by, or claims asserted against, such Contributor by reason 175 | of your accepting any such warranty or additional liability. 176 | 177 | END OF TERMS AND CONDITIONS 178 | 179 | APPENDIX: How to apply the Apache License to your work. 180 | 181 | To apply the Apache License to your work, attach the following 182 | boilerplate notice, with the fields enclosed by brackets "[]" 183 | replaced with your own identifying information. (Don't include 184 | the brackets!) The text should be enclosed in the appropriate 185 | comment syntax for the file format. 
We also recommend that a 186 | file or class name and description of purpose be included on the 187 | same "printed page" as the copyright notice for easier 188 | identification within third-party archives. 189 | 190 | Copyright [yyyy] [name of copyright owner] 191 | 192 | Licensed under the Apache License, Version 2.0 (the "License"); 193 | you may not use this file except in compliance with the License. 194 | You may obtain a copy of the License at 195 | 196 | http://www.apache.org/licenses/LICENSE-2.0 197 | 198 | Unless required by applicable law or agreed to in writing, software 199 | distributed under the License is distributed on an "AS IS" BASIS, 200 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 201 | See the License for the specific language governing permissions and 202 | limitations under the License. 203 | -------------------------------------------------------------------------------- /docsource/parser.sam: -------------------------------------------------------------------------------- 1 | article: SAM Parser 2 | 3 | <<<(menu.sami) 4 | 5 | SAM Parser is an application to process SAM files into an equivalent XML representation which 6 | you can then further process into any desired form of output. 7 | 8 | SAM Parser allows you to specify an XSLT 1.0 stylesheet to post-process the output XML into a 9 | desired output format. However, it only allows for processing individual SAM files to 10 | individual output files. It does not support any kind of assembly or linking of multiple source files. 11 | This facility is not intended to meet all output needs, but it 12 | provides a simple way to create basic outputs. 13 | 14 | SAM Parser is a Python 3 program that requires the regex and libxml libraries, which are not 15 | distributed with the default Python distribution. 
If you don't already have a Python 3 install, 16 | the easiest way to get one with the required libraries installed is to install 17 | {Anaconda}(https://www.continuum.io/downloads). 18 | 19 | SAM Parser is invoked as follows: 20 | 21 | ```(console) 22 | samparser 23 | 24 | Three output modes are available: 25 | 26 | * `xml` outputs a version of the document in XML. You can use an XSD schema to validate the output file and/or an XSLT 1.0 stylesheet to transform it into another format. 27 | * `html` outputs a version of the document in HTML with embedded semantic markup. You can supply one or more CSS stylesheets and/or one or more JavaScript files to include. 28 | * `regurgitate` outputs a version of the document in SAM with various references resolved and normalized. 29 | 30 | All output modes take the following options: 31 | 32 | * `` The path to the SAM file or files to be processed. (required) 33 | * `[-outfile ]|[-outdir [-outputextension ]]` Specifies either an output file or a directory in which to place output files and the file extension to apply to those files. (optional, defaults to the console) 34 | * `-smartquotes ` The path to a file containing smartquote rules. 35 | * `-expandrelativepaths` Causes the parser to expand relative paths in SAM insert statements when serializing output. Use this if you want paths relative to the source file to be made portable 36 | in the output file. 37 | 38 | XML output mode takes the following options: 39 | 40 | * `[-xslt [-transformedoutputfile ]\[-transformedoutputdir [-transformedextension ]` Specifies an XSD schema file to use to validate the XML output file. 42 | 43 | HTML output mode takes the following options: 44 | 45 | * `-css ` Specifies the path to a CSS file to include in the HTML output file. (optional, repeatable) 46 | * `-javascript ` Specifies the path to a JavaScript file to include in the HTML output file. (optional, repeatable) 47 | 48 | Regurgitate mode does not take any additional options. 
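The output modes and options listed above can be assembled into a command line programmatically. The following is a minimal sketch, not part of SAM Parser itself: the helper name `build_samparser_args` and its parameters are hypothetical, and it assumes the output mode comes first, followed by the input file and then the options. Only the long option names documented above are used.

```python
from typing import List, Optional

# Output modes documented for SAM Parser.
OUTPUT_MODES = ("xml", "html", "regurgitate")


def build_samparser_args(
    mode: str,
    infile: str,
    outfile: Optional[str] = None,
    xslt: Optional[str] = None,
    transformed_outfile: Optional[str] = None,
    css: Optional[List[str]] = None,
) -> List[str]:
    """Build an argument list for samparser.py (hypothetical helper).

    Assumes the order: output mode, input file, then options.
    """
    if mode not in OUTPUT_MODES:
        raise ValueError(f"unknown output mode: {mode}")
    # XSLT options belong to XML output mode only.
    if (xslt or transformed_outfile) and mode != "xml":
        raise ValueError("-xslt options apply to xml mode only")
    # CSS options belong to HTML output mode only.
    if css and mode != "html":
        raise ValueError("-css applies to html mode only")

    args = ["samparser.py", mode, infile]
    if outfile:
        args += ["-outfile", outfile]
    if xslt:
        args += ["-xslt", xslt]
    if transformed_outfile:
        args += ["-transformedoutputfile", transformed_outfile]
    # -css is repeatable, so emit one pair per stylesheet.
    for sheet in css or []:
        args += ["-css", sheet]
    return args
```

The resulting list could be handed to `subprocess.run` together with the Python 3 interpreter of your choice. Rejecting mode-specific options early is a design choice of this sketch; the parser may report such errors differently.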
49 | 50 | Short forms of the options are available as follows: 51 | 52 | +++ 53 | *option* | *short form* 54 | -outfile | -o 55 | -outdir | -od 56 | -expandrelativepaths | -xrp 57 | -outputextension | -oext 58 | -smartquotes | -sq 59 | -xslt | -x 60 | -xsd | 61 | -transformedoutputfile | -to 62 | -transformedoutputdir | -tod 63 | -transformedextension | -toext 64 | -css | 65 | -javascript | 66 | 67 | section: Validating with an XML schema 68 | 69 | Eventually, SAM is going to have its own schema language, but until that is available 70 | (and probably afterward) you can validate your document against an XML schema. 71 | Schema validation is done on the XML output format, not the input (because it is an XML 72 | schema, not a SAM schema). To invoke schema validation, use the `-xsd` option 73 | on the command line: 74 | 75 | ```(console) 76 | -xsd 77 | 78 | section: Expanding relative paths 79 | 80 | The SAM parser can expand relative paths of insert statements in the source document 81 | while serializing the 82 | output. This can be useful if the location of the output file is not the same relative 83 | to the included resources as the location of the source file. To tell the parser to 84 | expand relative paths into absolute URLs, use the `-expandrelativepaths` option. The 85 | short form is `-xrp`. 86 | 87 | ```(console) 88 | -xrp 89 | 90 | Note that this applies to paths in SAM insert statements only. If you include paths in 91 | custom structures in your markup, they will not be expanded as the parser has no way 92 | of knowing that the value of a custom structure is a path. 93 | 94 | 95 | section: Regurgitating the SAM document 96 | 97 | The parser can also regurgitate the SAM document (that is, create a SAM serialization of 98 | the structure defined by the original SAM document). 
The regurgitated 99 | version may be different in small ways from the input document but will 100 | create the same structures internally and will serialize the same 101 | way as the original. Some of the differences are: 102 | 103 | * Character entities will be replaced by Unicode characters. 104 | * Paragraphs will be all on one line. 105 | * Bold and italic decorations will be replaced with equivalent 106 | annotations. 107 | * Some non-essential character escapes may be included. 108 | * Annotation lookups will be performed and any `!annotation-lookup` declaration 109 | will be removed. 110 | * Smart quote processing will be performed and any `!smart-quotes` declaration 111 | will be removed. 112 | 113 | To regurgitate, use the `regurgitate` output mode. 114 | 115 | section: Smart quotes 116 | 117 | The parser incorporates a smart quotes feature. The writer can specify 118 | that they want smart quotes processing for their document by including 119 | the smart quotes declaration at the start of their document. 120 | 121 | ```(sam) 122 | !smart-quotes: on 123 | 124 | By default, the parser supports two values for the smart quotes declaration, `on` 125 | and `off` (the default). The built-in `on` setting supports the following 126 | translations: 127 | 128 | * single quotes to curly quotes 129 | * double quotes to curly double quotes 130 | * single quotes as apostrophe to curly quotes 131 | * -- to en-dash 132 | * --- to em-dash 133 | 134 | Note that no smart quote algorithm is perfect. This option will miss some 135 | instances and may get some wrong. To ensure you always get the characters you 136 | want, enter the Unicode characters directly or use a character entity. 137 | 138 | Smart quote processing is not applied to code phrases or to codeblocks 139 | or embedded markup. 140 | 141 | Because different writers may want different smart quote rules, or different 142 | rules may be appropriate to different kinds of materials, 
the parser lets 143 | you specify your own sets of smart quote rules. Essentially this lets you 144 | detect any pattern in the text and define a substitution for it. You can use 145 | it for any character substitutions that you find useful, even those having 146 | nothing to do with quotes. 147 | 148 | To define a set of smart quote substitutions, create an XML file like the 149 | `sq.xml` file included with the parser. This file includes two alternate 150 | sets of smart quote rules, `justquotes` and `justdashes`, which 151 | process just quotes and just dashes respectively. The dashes 152 | and quotes rules in this file are the same as those built in to the parser. 153 | Note, however, that the parser does not use this file by default. 154 | 155 | To invoke the `justquotes` rule set: 156 | 157 | 1. Add the declaration `!smart-quotes: justquotes` to the document. 158 | 159 | 2. Use the command line parameter `-sq /sq.xml`. 160 | 161 | To add a custom rule set, create your own rule set file and invoke it 162 | in the same way. 163 | 164 | Note that the rules in each rule set are represented by regular expressions. 165 | The rules detect characters based on their surroundings. They do not detect 166 | quotations by finding the opening and closing quotes as a pair. They find them 167 | separately. This means that the order of rules in the rule file may be 168 | important. In the default rules, close quote rules are listed first. 169 | Reversing the order might result in some close quotes being detected as 170 | open quotes. 171 | 172 | section: HTML Output Mode 173 | 174 | Normally SAM is serialized to XML, which you can then process to produce HTML or any other 175 | output you want. However, the parser also supports outputting HTML directly. The attraction 176 | of this is that it allows you to have a semantically constrained input format that can 177 | be validated with a schema but which can still output to HTML5 directly. 
178 | 179 | SAM structures are output to HTML as follows: 180 | 181 | * Named blocks are output as HTML `div` elements. The SAM block 182 | name is output as the `class` attribute of the `div` elements, allowing you to attach 183 | specific CSS styling to each type of block. 184 | 185 | * Codeblocks are output as `pre` elements with the language attribute output as a `data-language` 186 | attribute and the `class` as `codeblock`. Code is wrapped in `code` tags, also with `class` as `codeblock`. 187 | 188 | * Embedded data is ignored and a warning is issued. 189 | 190 | * Paragraphs, ordered lists, and unordered lists are output as their HTML equivalents. 191 | 192 | * Labelled lists are output as definition lists. 193 | 194 | * Grids are output as tables. 195 | 196 | * Record sets are output as tables with the field names as table headers. 197 | 198 | * Inserts by URL become `object` elements. String inserts are resolved if the named string is available. 199 | Inserts by ID are resolved by inserting the object with the specified ID. A warning will be raised and 200 | the insert will be ignored if you try to insert a block element with an inline insert or an inline 201 | element with a block insert. All other inserts are ignored and a warning is raised. 202 | 203 | * Phrases are output as spans with the `class` attribute `phrase`. 204 | 205 | * Annotations are output as spans nested within the phrase spans they annotate. The specifically and 206 | namespace attributes of an annotation are output as `data-*` attributes. 207 | 208 | * Attributes are output as HTML attributes. ID attributes are output as HTML `id` attributes. Language-code 209 | attributes are output as HTML `lang` attributes. Other attributes are output as HTML `data-*` attributes. 210 | 211 | * An HTML `head` element is generated which includes the `title` element if the root block of the 212 | SAM document has a title. It also includes ``. 213 | 214 | To generate HTML output, use the `html` output mode on the command line. 
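The mapping of named blocks and their attributes can be illustrated with a short sketch. This is not the parser's actual implementation: the function name `block_to_div` and the attribute keys `id` and `language-code` are assumptions made for illustration, and block content is treated as plain text.

```python
from html import escape
from typing import Dict, Optional


def block_to_div(
    name: str,
    content: str,
    attributes: Optional[Dict[str, str]] = None,
) -> str:
    """Render a named block as a div element (illustrative sketch only).

    The block name becomes the class attribute. Assumed attribute keys:
    "id" maps to id, "language-code" maps to lang, and anything else
    becomes a data-* attribute.
    """
    rendered = [f'class="{escape(name)}"']
    for key, value in (attributes or {}).items():
        if key == "id":
            html_name = "id"
        elif key == "language-code":
            html_name = "lang"
        else:
            html_name = f"data-{key}"
        rendered.append(f'{html_name}="{escape(value)}"')
    attr_text = " ".join(rendered)
    # Escape the text content so markup characters are not interpreted.
    return f"<div {attr_text}>{escape(content)}</div>"


print(block_to_div("recipe", "Eggs & ham", {"id": "r1", "conditions": "draft"}))
```

Because the block name ends up in `class`, a stylesheet supplied with `-css` can target each block type with an ordinary class selector such as `.recipe`.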
215 | 216 | To specify a stylesheet to be imported by the resulting HTML file, use the `-css` option with the 217 | URL of the CSS file to be included (relative to the location of the HTML file). You can specify the 218 | `-css` option more than once. 219 | 220 | To specify a JavaScript file to be imported by the resulting HTML file, use the `-javascript` option 221 | with the URL of the JavaScript file to be included (relative to the location of the HTML file). You 222 | can specify the `-javascript` option more than once. 223 | 224 | 225 | 226 | 227 | section: Running SAM Parser on Windows 228 | 229 | To run SAM Parser on Windows, use the `samparser` batch file: 230 | 231 | ```(console) 232 | samparser xml foo.sam -o foo.xml -x foo2html.xslt -to foo.html 233 | 234 | section: Running SAM Parser on Xnix and Mac 235 | 236 | To run SAM Parser on Xnix or Mac, invoke Python 3 as appropriate on your system. For example: 237 | 238 | ```(console) 239 | python3 samparser.py xml foo.sam -o foo.xml -x foo2html.xslt -to foo.html 240 | 241 | 242 | 243 | -------------------------------------------------------------------------------- /Eclipse Public License - Version 1.0.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Eclipse Public License - Version 1.0 6 | 23 | 24 | 25 | 26 | 27 | 28 |

Eclipse Public License - v 1.0

29 | 30 |

THE ACCOMPANYING PROGRAM IS PROVIDED UNDER THE TERMS OF THIS ECLIPSE 31 | PUBLIC LICENSE ("AGREEMENT"). ANY USE, REPRODUCTION OR 32 | DISTRIBUTION OF THE PROGRAM CONSTITUTES RECIPIENT'S ACCEPTANCE OF THIS 33 | AGREEMENT.

34 | 35 |

1. DEFINITIONS

36 | 37 |

"Contribution" means:

38 | 39 |

a) in the case of the initial Contributor, the initial 40 | code and documentation distributed under this Agreement, and

41 |

b) in the case of each subsequent Contributor:

42 |

i) changes to the Program, and

43 |

ii) additions to the Program;

44 |

where such changes and/or additions to the Program 45 | originate from and are distributed by that particular Contributor. A 46 | Contribution 'originates' from a Contributor if it was added to the 47 | Program by such Contributor itself or anyone acting on such 48 | Contributor's behalf. Contributions do not include additions to the 49 | Program which: (i) are separate modules of software distributed in 50 | conjunction with the Program under their own license agreement, and (ii) 51 | are not derivative works of the Program.

52 | 53 |

"Contributor" means any person or entity that distributes 54 | the Program.

55 | 56 |

"Licensed Patents" mean patent claims licensable by a 57 | Contributor which are necessarily infringed by the use or sale of its 58 | Contribution alone or when combined with the Program.

59 | 60 |

"Program" means the Contributions distributed in accordance 61 | with this Agreement.

62 | 63 |

"Recipient" means anyone who receives the Program under 64 | this Agreement, including all Contributors.

65 | 66 |

2. GRANT OF RIGHTS

67 | 68 |

a) Subject to the terms of this Agreement, each 69 | Contributor hereby grants Recipient a non-exclusive, worldwide, 70 | royalty-free copyright license to reproduce, prepare derivative works 71 | of, publicly display, publicly perform, distribute and sublicense the 72 | Contribution of such Contributor, if any, and such derivative works, in 73 | source code and object code form.

74 | 75 |

b) Subject to the terms of this Agreement, each 76 | Contributor hereby grants Recipient a non-exclusive, worldwide, 77 | royalty-free patent license under Licensed Patents to make, use, sell, 78 | offer to sell, import and otherwise transfer the Contribution of such 79 | Contributor, if any, in source code and object code form. This patent 80 | license shall apply to the combination of the Contribution and the 81 | Program if, at the time the Contribution is added by the Contributor, 82 | such addition of the Contribution causes such combination to be covered 83 | by the Licensed Patents. The patent license shall not apply to any other 84 | combinations which include the Contribution. No hardware per se is 85 | licensed hereunder.

86 | 87 |

c) Recipient understands that although each Contributor 88 | grants the licenses to its Contributions set forth herein, no assurances 89 | are provided by any Contributor that the Program does not infringe the 90 | patent or other intellectual property rights of any other entity. Each 91 | Contributor disclaims any liability to Recipient for claims brought by 92 | any other entity based on infringement of intellectual property rights 93 | or otherwise. As a condition to exercising the rights and licenses 94 | granted hereunder, each Recipient hereby assumes sole responsibility to 95 | secure any other intellectual property rights needed, if any. For 96 | example, if a third party patent license is required to allow Recipient 97 | to distribute the Program, it is Recipient's responsibility to acquire 98 | that license before distributing the Program.

99 | 100 |

d) Each Contributor represents that to its knowledge it 101 | has sufficient copyright rights in its Contribution, if any, to grant 102 | the copyright license set forth in this Agreement.

3. REQUIREMENTS

A Contributor may choose to distribute the Program in object code form under its own license agreement, provided that:

a) it complies with the terms and conditions of this Agreement; and

b) its license agreement:

i) effectively disclaims on behalf of all Contributors all warranties and conditions, express and implied, including warranties or conditions of title and non-infringement, and implied warranties or conditions of merchantability and fitness for a particular purpose;

ii) effectively excludes on behalf of all Contributors all liability for damages, including direct, indirect, special, incidental and consequential damages, such as lost profits;

iii) states that any provisions which differ from this Agreement are offered by that Contributor alone and not by any other party; and

iv) states that source code for the Program is available from such Contributor, and informs licensees how to obtain it in a reasonable manner on or through a medium customarily used for software exchange.

When the Program is made available in source code form:

a) it must be made available under this Agreement; and

b) a copy of this Agreement must be included with each copy of the Program.

Contributors may not remove or alter any copyright notices contained within the Program.

Each Contributor must identify itself as the originator of its Contribution, if any, in a manner that reasonably allows subsequent Recipients to identify the originator of the Contribution.

4. COMMERCIAL DISTRIBUTION

Commercial distributors of software may accept certain responsibilities with respect to end users, business partners and the like. While this license is intended to facilitate the commercial use of the Program, the Contributor who includes the Program in a commercial product offering should do so in a manner which does not create potential liability for other Contributors. Therefore, if a Contributor includes the Program in a commercial product offering, such Contributor ("Commercial Contributor") hereby agrees to defend and indemnify every other Contributor ("Indemnified Contributor") against any losses, damages and costs (collectively "Losses") arising from claims, lawsuits and other legal actions brought by a third party against the Indemnified Contributor to the extent caused by the acts or omissions of such Commercial Contributor in connection with its distribution of the Program in a commercial product offering. The obligations in this section do not apply to any claims or Losses relating to any actual or alleged intellectual property infringement. In order to qualify, an Indemnified Contributor must: a) promptly notify the Commercial Contributor in writing of such claim, and b) allow the Commercial Contributor to control, and cooperate with the Commercial Contributor in, the defense and any related settlement negotiations. The Indemnified Contributor may participate in any such claim at its own expense.

For example, a Contributor might include the Program in a commercial product offering, Product X. That Contributor is then a Commercial Contributor. If that Commercial Contributor then makes performance claims, or offers warranties related to Product X, those performance claims and warranties are such Commercial Contributor's responsibility alone. Under this section, the Commercial Contributor would have to defend claims against the other Contributors related to those performance claims and warranties, and if a court requires any other Contributor to pay any damages as a result, the Commercial Contributor must pay those damages.

5. NO WARRANTY

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, THE PROGRAM IS PROVIDED ON AN "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, EITHER EXPRESS OR IMPLIED INCLUDING, WITHOUT LIMITATION, ANY WARRANTIES OR CONDITIONS OF TITLE, NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Each Recipient is solely responsible for determining the appropriateness of using and distributing the Program and assumes all risks associated with its exercise of rights under this Agreement, including but not limited to the risks and costs of program errors, compliance with applicable laws, damage to or loss of data, programs or equipment, and unavailability or interruption of operations.

6. DISCLAIMER OF LIABILITY

EXCEPT AS EXPRESSLY SET FORTH IN THIS AGREEMENT, NEITHER RECIPIENT NOR ANY CONTRIBUTORS SHALL HAVE ANY LIABILITY FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING WITHOUT LIMITATION LOST PROFITS), HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OR DISTRIBUTION OF THE PROGRAM OR THE EXERCISE OF ANY RIGHTS GRANTED HEREUNDER, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

7. GENERAL

If any provision of this Agreement is invalid or unenforceable under applicable law, it shall not affect the validity or enforceability of the remainder of the terms of this Agreement, and without further action by the parties hereto, such provision shall be reformed to the minimum extent necessary to make such provision valid and enforceable.

If Recipient institutes patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Program itself (excluding combinations of the Program with other software or hardware) infringes such Recipient's patent(s), then such Recipient's rights granted under Section 2(b) shall terminate as of the date such litigation is filed.

All Recipient's rights under this Agreement shall terminate if it fails to comply with any of the material terms or conditions of this Agreement and does not cure such failure in a reasonable period of time after becoming aware of such noncompliance. If all Recipient's rights under this Agreement terminate, Recipient agrees to cease use and distribution of the Program as soon as reasonably practicable. However, Recipient's obligations under this Agreement and any licenses granted by Recipient relating to the Program shall continue and survive.

Everyone is permitted to copy and distribute copies of this Agreement, but in order to avoid inconsistency the Agreement is copyrighted and may only be modified in the following manner. The Agreement Steward reserves the right to publish new versions (including revisions) of this Agreement from time to time. No one other than the Agreement Steward has the right to modify this Agreement. The Eclipse Foundation is the initial Agreement Steward. The Eclipse Foundation may assign the responsibility to serve as the Agreement Steward to a suitable separate entity. Each new version of the Agreement will be given a distinguishing version number. The Program (including Contributions) may always be distributed subject to the version of the Agreement under which it was received. In addition, after a new version of the Agreement is published, Contributor may elect to distribute the Program (including its Contributions) under the new version. Except as expressly stated in Sections 2(a) and 2(b) above, Recipient receives no rights or licenses to the intellectual property of any Contributor under this Agreement, whether expressly, by implication, estoppel or otherwise. All rights in the Program not expressly granted under this Agreement are reserved.

This Agreement is governed by the laws of the State of New York and the intellectual property laws of the United States of America. No party to this Agreement will bring a legal action under this Agreement more than one year after the cause of action arose. Each party waives its rights to a jury trial in any resulting litigation.

--------------------------------------------------------------------------------
/docs/samschemaspec.html:
--------------------------------------------------------------------------------

SAM Schema Spec

This is the specification for the SAM schema. It consists of a set of rules that specify how a SAM schema is written and interpreted.

Encoding

A SAM schema is encoded in UTF-8.

Names

Element names must be valid XML names.

SAM Schema Header

A SAM schema begins with a SAM schema header.

A SAM schema header consists of the word "samschema" followed by a colon, optionally followed by a namespace URI in curly braces. For example:

samschema:{http://spfeopentoolkit.org/ns/spfe-doc}

If specified, the namespace URI is the default namespace of all the elements defined in the schema.
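
The header rule above can be sketched in Python as follows. This is only an illustration, not part of the SAM tooling: the names `HEADER_PATTERN` and `parse_schema_header` are hypothetical, and the regular expression assumes the header occupies a line by itself.

```python
import re

# Hypothetical pattern: the word "samschema", a colon, and an optional
# namespace URI wrapped in curly braces.
HEADER_PATTERN = re.compile(r'^samschema:(?:\{(?P<namespace>[^{}]*)\})?\s*$')


def parse_schema_header(line):
    """Return the default namespace URI from a header line, or None if absent."""
    match = HEADER_PATTERN.match(line)
    if match is None:
        raise ValueError("Not a SAM schema header: " + line.strip())
    return match.group('namespace')
```

Calling `parse_schema_header` on the example header above returns the URI between the braces; a bare `samschema:` line returns `None`, meaning no default namespace was declared.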

Namespaces

Root element declaration

Unlike an XML schema, which declares a collection of elements, several of which might be suitable as document roots, a SAM schema always declares a document root element.

The document root element is declared as a valid element name beginning in the first column. It is the first element declaration following the SAM schema header and any include statements.

Includes

You can include another samschema in the current samschema document using an include statement.

Included samschemas are processed as if they were part of the current samschema, with the following provisions:

  • The included SAM schema may not include a root element declaration. This means that any elements declared in an included samschema must be declared inside named structures.

  • If the included SAM schema contains a default namespace, it will be used as the namespace for the elements defined in the included samschema.

  • If the included SAM schema does not contain a default namespace declaration, and the including SAM schema does contain a namespace declaration, the elements in the included samschema will be placed in the declared namespace of the including samschema. This rule is applied down the tree of includes from the top-level schema and is evaluated as each include statement is processed from the top down.
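
The namespace rules for includes can be illustrated with a small Python sketch. It assumes, purely for illustration, that each schema is represented as a dictionary with a `name`, an optional `namespace`, and an optional list of `includes`; neither this representation nor the function name `resolve_namespaces` comes from the SAM code base.

```python
def resolve_namespaces(schema, inherited=None):
    """Walk the include tree top down, yielding (schema name, effective namespace)."""
    # A schema's own default namespace wins; otherwise it takes the namespace
    # in effect for the schema that includes it.
    effective = schema.get('namespace') or inherited
    yield schema['name'], effective
    for included in schema.get('includes', []):
        yield from resolve_namespaces(included, effective)
```

Because the effective namespace is passed down at each level, a schema with no declaration of its own ends up in the namespace of its nearest including schema that declares one.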

Block declarations

A SAM document consists of blocks, fields, record sets, flows, decorations, and annotations. Blocks and fields are declared the same way. The distinction between a block and a field is that a block has children and a field does not.

For a field, SAM interprets any content after the field tag as the value of the field.

For a block, SAM interprets any content after the block tag as an implicit title field.

Implicit elements

SAM recognizes certain elements by their context in the SAM document. They do not have explicit tags in the content. These implicit elements are delineated either by punctuation or by their position relative to other elements.

Implicit elements are implicit in the instance. They must be declared in the schema. It is an error if an implicit element occurs in the instance in a position in which it is not permitted in the schema.

The implicit elements are as follows:

p
The p element represents a paragraph. It is recognized as a block of lines with no leading tag. It ends with a blank line. p elements are the only elements that can contain decorations and annotations.

bold
The bold element is a decoration occurring within text and is indicated by surrounding the text with asterisks.

italic
The italic decoration is indicated by surrounding the text with underscores.

mono
The mono decoration is indicated by surrounding the text with backticks. It indicates that the text should be rendered in a monospaced font.

quotes
The quotes decoration is indicated by surrounding the text in straight double quotes. You can use printer's quotes in your document, but they will be recognized as plain text, not markup.

title
The title element is implicit if there is text on the same line as a block tag, and if a title field is allowed as the first field of the block. Text in this position is an error if a title field is not allowed in this location by the schema, or if the title element that is allowed is a block element rather than a field element.

blockquote
The blockquote block element is indicated by opening with three double quotes on a line by themselves, plus any parameters on the opening quotes. The quoted material must be indented from the blockquote markers.

codeblock
The codeblock block element is indicated by opening and closing with three backticks on a line by themselves, plus any attributes on the opening backticks.

fragment
The fragment element contains a set of elements that can be reused by inserting the fragment into other locations in the document, or into other documents. Fragments can also be used to apply conditions to a set of elements that would otherwise lack a common container. Recommended practice is to use fragments to contain only general text elements, which helps ensure that the inserted fragment will be schema compliant in the receiving document (assuming a common definition of general text elements). A fragment is indicated by three tildes (or the block name "fragment").

string
The definition of a string, which may be inserted in the document, but is not intended to be published where defined.
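
As a rough illustration of the four text decorations listed above (bold, italic, mono, and quotes), the following Python sketch finds them in a paragraph with regular expressions. It is a deliberate simplification under stated assumptions: it ignores nesting, escaping, and annotations, and the names `DECORATIONS` and `find_decorations` are hypothetical rather than taken from the SAM parser.

```python
import re

# One simplified pattern per decoration, keyed by the implicit element name.
DECORATIONS = {
    'bold': re.compile(r'\*(.+?)\*'),    # *text*
    'italic': re.compile(r'_(.+?)_'),    # _text_
    'mono': re.compile(r'`(.+?)`'),      # `text`
    'quotes': re.compile(r'"(.+?)"'),    # "text" (straight double quotes only)
}


def find_decorations(text):
    """Return (element name, decorated text) pairs in order of appearance."""
    found = []
    for name, pattern in DECORATIONS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), name, match.group(1)))
    return [(name, content) for _, name, content in sorted(found)]
```

Text wrapped in printer's quotes is not matched by the `quotes` pattern, which mirrors the rule that such quotes are treated as plain text.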

Idiomatic annotations

Certain annotation types are considered idiomatic. That is, the name is considered bound by convention to a certain meaning. This does not mean that you cannot use the type for other things, or use other names for that function, but editors and tools are entitled to provide enhanced support for the idiomatic meaning of the names.

In some cases, idiomatic types may have specialized or atypical treatment of the standard attributes. For instance, the link type uses the specifically attribute for the URL to link to, which is an eccentric interpretation of what specifically is intended for.

link
The link type is idiomatically used for inserting explicit hyperlinks. The specifically attribute is used to specify the URL to link to. The namespace attribute remains a namespace. It could be used if the protocol or location is anything other than HTTP/Web.

A schema can rename an idiomatic type, so that the tool behavior associated with the idiomatic type can be applied to the new type:

href=link

Idiomatic elements

Idiomatic elements are elements with intended meanings. Those meanings are not actually enforced by the SAM parser, and the application is free to do what it likes with them. However, users and editing applications are entitled to treat the idiomatic elements as having their intended meanings. In some cases, the annotations of idiomatic elements are interpreted in special ways. The attributes will be assigned names that reflect their idiomatic usage, rather than the standard names.

blockquote
Annotation is recognized as a citation. Annotation type is recognized as the type of publication (web, book, article, etc.). Specifically is recognized as the text of the citation, in the format specified by the namespace attribute. The specifically attribute will be named "citation", and the namespace attribute will be named "format". It is up to the downstream processor to parse the citation into its parts.

codeblock
Annotation type is recognized as specifying the language and source of the codeblock. Specifically is recognized as the source and is named "source".

Implicit structures

SAM also recognizes implicit structures. An implicit structure occurs where more than one element is recognized by context. The implicit structures are:

unordered lists
Unordered lists are indicated by starting a line with an asterisk. The root element of the structure is ul. The container for the list item is li. The container for the text is p.

ordered lists
Ordered lists are indicated by starting a line with a number. Numbers do not have to be sequential. The root element of the structure is ol. The container for the list item is li. The container for the text is p.

labeled lists
Labeled lists are indicated by surrounding the label text with pipe characters at the start of a paragraph. The root element of the structure is ll. Each labeled list item has the name li and contains the label and the labeled paragraph.

block titles
When you place text after the name of a block that has nested content (as opposed to one that is merely a field with a single value), that text is treated as the title for the block.
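
The list structures above can be recognized line by line. The sketch below reuses the `list-item` and `num-list-item` regular expressions that appear in samsparser.py (note that the parser's pattern expects a period after the number); the labeled-list pattern and the function name `classify_line` are assumptions made for illustration only.

```python
import re

# These two patterns are copied from the SamParser pattern table.
LIST_ITEM = re.compile(r'(\s*)(\*\s+)(.*)')
NUM_LIST_ITEM = re.compile(r'(\s*)([0-9]+\.\s+)(.*)')
# Assumed pattern: label text surrounded by pipe characters at the start.
LABELED_ITEM = re.compile(r'(\s*)\|(.+?)\|\s*(.*)')


def classify_line(line):
    """Return (root element name, label or None, text) for one source line."""
    match = LIST_ITEM.match(line)
    if match:
        return 'ul', None, match.group(3)
    match = NUM_LIST_ITEM.match(line)
    if match:
        return 'ol', None, match.group(3)
    match = LABELED_ITEM.match(line)
    if match:
        return 'll', match.group(2), match.group(3)
    return 'p', None, line.strip()
```

A real parser would also track indentation so that consecutive items are grouped under a single ul, ol, or ll root with li children; this sketch only shows how each line is identified.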

Renaming implicit structures

You can rename the implicit elements so that the output of the SAM parser gives them different names. To rename an implicit element, use the new name followed by an equals sign, followed by the implicit element name.

Thus, to create a paragraph element named para instead of p, specify the element as follows:

para=p
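
A minimal Python sketch of how rename declarations of the form new=implicit might be applied to parser output. The function names `parse_rename` and `apply_renames` are hypothetical, and element names are modeled as a plain list of strings purely for illustration.

```python
def parse_rename(declaration):
    """Split a declaration such as 'para=p' into (new name, implicit name)."""
    new_name, separator, implicit_name = declaration.partition('=')
    new_name, implicit_name = new_name.strip(), implicit_name.strip()
    if not separator or not new_name or not implicit_name:
        raise ValueError("Not a rename declaration: " + declaration)
    return new_name, implicit_name


def apply_renames(element_names, declarations):
    """Replace each implicit element name that has a rename declaration."""
    renames = {}
    for declaration in declarations:
        new_name, implicit_name = parse_rename(declaration)
        renames[implicit_name] = new_name
    return [renames.get(name, name) for name in element_names]
```

With the declaration `para=p`, every p in the output becomes para, while names without a declaration are left unchanged.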

Creating implicit structures

Order and number of elements in structures

You can indicate the order and number of elements in a structure.

~
All the items in the structure, in any order.

--------------------------------------------------------------------------------
/samsparser.py:
--------------------------------------------------------------------------------
import sys
from statemachine import StateMachine
from lxml import etree
import io

try:
    import regex as re

    re_supports_unicode_categories = True
except ImportError:
    import re

    re_supports_unicode_categories = False
    print(
        """Regular expression support for Unicode categories not available.
IDs starting with non-ASCII lowercase letters will not be recognized and
will be treated as titles. Please install Python regex module.

""", file=sys.stderr)


class SamParser:
    def __init__(self):

        # Each state name maps to the handler method that processes it.
        self.stateMachine = StateMachine()
        self.stateMachine.add_state("NEW", self._new_file)
        self.stateMachine.add_state("SAM", self._sam)
        self.stateMachine.add_state("BLOCK", self._block)
        self.stateMachine.add_state("CODEBLOCK-START", self._codeblock_start)
        self.stateMachine.add_state("CODEBLOCK", self._codeblock)
        self.stateMachine.add_state("PARAGRAPH-START", self._paragraph_start)
        self.stateMachine.add_state("PARAGRAPH", self._paragraph)
        self.stateMachine.add_state("RECORD-START", self._record_start)
        self.stateMachine.add_state("RECORD", self._record)
        self.stateMachine.add_state("LIST-ITEM", self._list_item)
        self.stateMachine.add_state("NUM-LIST-ITEM", self._num_list_item)
        self.stateMachine.add_state("BLOCK-INSERT", self._block_insert)
        self.stateMachine.add_state("END", None, end_state=1)
        self.stateMachine.set_start("NEW")
        self.current_paragraph = None
        self.doc = DocStructure()
        self.source = None
        # Line-level patterns used by _sam() to choose the next state.
        self.patterns = {
            'comment': re.compile(r'\s*#.*'),
            'block-start': re.compile(r'(\s*)([a-zA-Z0-9-_]+):(?:\((.*?)\))?(.*)'),
            'codeblock-start': re.compile(r'(\s*)```(.*)'),
            'codeblock-end': re.compile(r'(\s*)```\s*$'),
            'paragraph-start': re.compile(r'\w*'),
            'blank-line': re.compile(r'^\s*$'),
            'record-start': re.compile(r'\s*[a-zA-Z0-9-_]+::(.*)'),
            'list-item': re.compile(r'(\s*)(\*\s+)(.*)'),
            'num-list-item': re.compile(r'(\s*)([0-9]+\.\s+)(.*)'),
            'block-insert': re.compile(r'(\s*)>>\(.*?\)\w*')
        }

    def parse(self, source):
        self.source = source
        try:
            self.stateMachine.run(self.source)
        except EOFError:
            raise Exception("Document ended before structure was complete. At:\n\n"
                            + self.current_paragraph)

    def paragraph_start(self, line):
        self.current_paragraph = line.strip()

    def paragraph_append(self, line):
        self.current_paragraph += " " + line.strip()

    def pre_start(self, line):
        self.current_paragraph = line

    def pre_append(self, line):
        self.current_paragraph += line

    def _new_file(self, source):
        line = source.next_line
        if line[:4] == 'sam:':
            self.doc.new_root('sam', line[5:])
            return "SAM", source
        else:
            raise Exception("Not a SAM file!")

    def _block(self, source):
        line = source.current_line
        match = self.patterns['block-start'].match(line)
        indent = len(match.group(1))
        element = match.group(2).strip()
        attributes = match.group(3)
        content = match.group(4).strip()

        # A second colon after the name ("name::") marks a record set.
        if content[:1] == ':':
            return "RECORD-START", source
        else:
            self.doc.new_block(element, attributes, content, indent)
            return "SAM", source

    def _codeblock_start(self, source):
        line = source.current_line
        local_indent = len(line) - len(line.lstrip())
        match = self.patterns['codeblock-start'].match(line)
        attributes = re.compile(r'\((.*?)\)').match(match.group(2).strip())
        language = attributes.group(1)
        self.doc.new_block('codeblock', language, None, local_indent)
        self.pre_start('')
        return "CODEBLOCK", source

    def _codeblock(self, source):
        line = source.next_line
        if self.patterns['codeblock-end'].match(line):
            self.doc.add_flow(Pre(self.current_paragraph))
            return "SAM", source
        else:
            self.pre_append(line)
            return "CODEBLOCK", source

    def _paragraph_start(self, source):
        line = source.current_line
        local_indent = len(line) - len(line.lstrip())
        self.doc.new_block('p', None, '', local_indent)
        self.paragraph_start(line)
        return "PARAGRAPH", source

    def _paragraph(self, source):
        line = source.next_line
        # A blank line ends the paragraph.
        if self.patterns['blank-line'].match(line):
            para_parser.parse(self.current_paragraph, self.doc)
            return "SAM", source
        else:
            self.paragraph_append(line)
            return "PARAGRAPH", source

    def _list_item(self, source):
        line = source.current_line
        match = self.patterns['list-item'].match(line)
        local_indent = len(match.group(1))
        content_indent = local_indent + len(match.group(2))
        self.doc.new_unordered_list_item(local_indent, content_indent)
        self.paragraph_start(str(match.group(3)).strip())
        return "PARAGRAPH", source

    def _num_list_item(self, source):
        line = source.current_line
        match = self.patterns['num-list-item'].match(line)
        local_indent = len(match.group(1))
        content_indent = local_indent + len(match.group(2))
        self.doc.new_ordered_list_item(local_indent, content_indent)
        self.paragraph_start(str(match.group(3)).strip())
        return "PARAGRAPH", source

    def _block_insert(self, source):
        line = source.current_line
        indent = len(source.current_line) - len(source.current_line.lstrip())
        attribute_pattern = re.compile(r'\s*>>\((.*?)\)')
        match = attribute_pattern.match(line)
        self.doc.new_block('insert', text='', attributes=parse_insert(match.group(1)), indent=indent)
        return "SAM", source

    def _record_start(self, source):
        line = source.current_line
        match = self.patterns['block-start'].match(line)
        local_indent = len(match.group(1))
        local_element = match.group(2).strip()
        field_names = [x.strip() for x in self.patterns['record-start'].match(line).group(1).split(',')]
        self.doc.new_record_set(local_element, field_names, local_indent)
        return "RECORD", source

    def _record(self, source):
        line = source.next_line
        # A blank line ends the record set.
        if self.patterns['blank-line'].match(line):
            return "SAM", source
        else:
            field_values = [x.strip() for x in line.split(',')]
            record = list(zip(self.doc.fields, field_values))
            self.doc.new_record(record)
            return "RECORD", source

    def _sam(self, source):
        try:
            line = source.next_line
        except EOFError:
            return "END", source
        # Order matters: the paragraph pattern matches anything, so it is last.
        if self.patterns['comment'].match(line):
            self.doc.new_comment(Comment(line.strip()[1:]))
            return "SAM", source
        elif self.patterns['block-start'].match(line):
            return "BLOCK", source
        elif self.patterns['blank-line'].match(line):
            return "SAM", source
        elif self.patterns['codeblock-start'].match(line):
            return "CODEBLOCK-START", source
        elif self.patterns['list-item'].match(line):
            return "LIST-ITEM", source
        elif self.patterns['num-list-item'].match(line):
            return "NUM-LIST-ITEM", source
        elif self.patterns['block-insert'].match(line):
            return "BLOCK-INSERT", source
        elif self.patterns['paragraph-start'].match(line):
            return "PARAGRAPH-START", source
        else:
            raise Exception("I'm confused")

    def serialize(self, serialize_format):
        return self.doc.serialize(serialize_format)


class Block:
    def __init__(self, name='', attributes='', content='', indent=0):

        # Test for a valid block name. Must be valid XML name.
        try:
            x = etree.Element(name)
        except ValueError:
            raise Exception("Invalid block name: " + name)

        self.name = name
        self.attributes = attributes
        self.content = content
        self.indent = indent
        self.parent = None
        self.children = []

    def add_child(self, b):
        b.parent = self
        self.children.append(b)

    def add_sibling(self, b):
        b.parent = self.parent
        self.parent.children.append(b)

    def add_at_indent(self, b, indent):
        # Climb the tree until an ancestor with a smaller indent is found.
        x = self.parent
        while x.indent >= indent:
            x = x.parent
        b.parent = x
        x.children.append(b)

    def __str__(self):
        return ''.join(self._output_block())

    def _output_block(self):
        yield " " * self.indent
        yield "[%s:'%s'" % (self.name, self.content)
        for x in self.children:
            yield "\n"
            yield str(x)
        yield "]"

    def serialize_xml(self):
        yield '<{0}'.format(self.name)
        if self.children:
            if self.attributes:
                if self.name == 'codeblock':
                    yield ' language="{0}"'.format(self.attributes)
                else:
                    try:
                        yield ' ids="{0}"'.format(' '.join(self.attributes[0]))
                    except (IndexError, TypeError):
                        pass
                    try:
                        yield ' conditions="{0}"'.format(' '.join(self.attributes[1]))
                    except (IndexError, TypeError):
                        pass

            yield ">"

            # Content on a block with children is its implicit title.
            if self.content:
                yield "\n<title>{0}</title>".format(self.content)

            if type(self.children[0]) is not Flow:
                yield "\n"

            for x in self.children:
                yield from x.serialize_xml()
            yield "</{0}>\n".format(self.name)
        else:
            yield ">{1}</{0}>\n".format(self.name, self.content)


class Comment(Block):
    def __init__(self, content='', indent=0):
        super().__init__(name='comment', content=content, indent=indent)

    def __str__(self):
        return "[%s:'%s']" % ('#comment', self.content)

    def serialize_xml(self):
        yield '<!-- {0} -->\n'.format(self.content)


class BlockInsert(Block):
    # Should not really inherit from Block as cannot have children, etc
    def __init__(self, content='', indent=0):
        super().__init__(name='insert', content=content, indent=indent)

    def __str__(self):
        return "[%s:'%s']" % ('#insert', self.content)

    def serialize_xml(self):
        # NOTE: the markup emitted by this method is missing from this copy
        # of the source; only the trailing newline survives.
        yield '\n'


class Root(Block):
    def __init__(self, name='', content='', indent=-1):
        super().__init__(name, None, content, -1)

    def serialize_xml(self):
        yield '<?xml version="1.0" encoding="UTF-8"?>\n'
        yield '<sam>'  # should include namespace and schema
        for x in self.children:
            yield from x.serialize_xml()
        yield '</sam>'


class Flow:
    def __init__(self, thing=None):
        self.flow = []
        if thing:
            self.append(thing)

    def __str__(self):
        return "[{0}]".format(''.join([str(x) for x in self.flow]))

    def append(self, thing):
        if not thing == '':
            self.flow.append(thing)

    def serialize_xml(self):
        for x in self.flow:
            try:
                yield from x.serialize_xml()
            except AttributeError:
                # Plain strings have no serialize_xml(); escape and emit them.
                yield self._escape_for_xml(x)

    def _escape_for_xml(self, s):
        t = dict(zip([ord('<'), ord('>'), ord('&')], ['&lt;', '&gt;', '&amp;']))
        return s.translate(t)


class StringSource:
    def __init__(self, string_to_parse):
        """

        :param string_to_parse: The string to parse.
        """
        self.current_line = None
        self.current_line_number = 0
        self.buf = io.StringIO(string_to_parse)

    @property
    def next_line(self):
        self.current_line = self.buf.readline()
        if self.current_line == "":
            raise EOFError("End of file")
        self.current_line_number += 1
        return self.current_line


def parse_block_attributes(attributes_string):
    try:
        attributes_list = attributes_string.split()
    except AttributeError:
        return None, None
    # "#name" declares an id; "?name" declares a condition.
    ids = [x[1:] for x in attributes_list if x[0] == '#']
    conditions = [x[1:] for x in attributes_list if x[0] == '?']
    unexpected_attributes = [x for x in attributes_list if not(x[0] in '?#')]
    if unexpected_attributes:
        raise Exception("Unexpected insert attribute(s): {0}".format(unexpected_attributes))
    return ids if ids else None, conditions if conditions else None


def parse_insert(insert_string):
    attributes_list = insert_string.split()
    insert_type = attributes_list.pop(0)
    insert_url = attributes_list.pop(0)
    insert_id = [x[1:] for x in attributes_list if x[0] == '#']
    insert_condition = [x[1:] for x in attributes_list if x[0] == '?']
    unexpected_attributes = [x for x in attributes_list if not(x[0] in '?#')]
    if unexpected_attributes:
        raise Exception("Unexpected insert attribute(s): {0}".format(unexpected_attributes))
    return insert_type, \
        insert_url, \
        insert_id if insert_id else None, \
        insert_condition if insert_condition else None


if __name__ == "__main__":
    samParser = SamParser()
    filename = sys.argv[-1]
    try:
        with open(filename, "r") as myfile:
            test = myfile.read()
    except FileNotFoundError:
        test = """samschema:(http://
        this:
            is: a test"""

    samParser.parse(StringSource(test))
    print("".join(samParser.serialize('xml')))

--------------------------------------------------------------------------------
/docsource/recipes.sam:
--------------------------------------------------------------------------------
article: SAM Recipes

    <<<(menu.sami)

    SAM is an abstract markup language, otherwise known as a meta-language. This means that it is intended, like XML,
    for defining your own markup languages. Unlike XML, however, SAM pre-defines a basic set of structures.
    SAM's predefined structures provide a simple clear syntax for common text structures, but they don't cover
    everything you might need. The following recipes are suggestions on how to handle some common markup
    problems in SAM. But remember, these involve you defining markup for yourself and require you to handle that
    markup in the application layer.

    recipe: Footnotes

        SAM does not have any predefined markup for footnotes, but it does have support for IDs and citations,
        which provide a reference mechanism for footnotes. The suggested recipe for footnotes is to create a
        footnote block with an id, and use a citation to reference that id:

        ```(sam)
            This paragraph requires a footnote.[*1]

            footnote:(*1)
                This is a footnote.

        The footnote structure can be anywhere in the document, but it makes sense to place it immediately
        after the paragraph in which the reference occurs. You can move it to the bottom of the page or the
        end of the chapter or book in the publishing process.

        Note that the footnote structure is not a child of the paragraph but a sibling at the same
        level of indentation. Paragraphs cannot have block children.

    recipe: Terminal sessions

        There is no support for SAM markup in the body of a codeblock. This works for code, since it
        requires no escaping, regardless of language. It does not work so well for terminal sessions
        if you want to mark up the prompt, input, and output. If you need this kind of terminal session
        support, the suggested recipe is to create a terminal session block and use fields for
        the sequence of prompt, input, and response. This reads quite clearly and you can easily
        use it to construct a formatted terminal session illustration when publishing.

        ```(sam)
            terminal:
                prompt: $
                input: dir
                response: Empty directory.

    recipe: Semantic lists

        Sometimes you have semantic lists. That is, the list and its items are of a specific type. For
        example:

        ```(sam)
            filmography:
                film: Rio Bravo
                film: The Shootist

        If you don't want to have to repeat the item name over and over, you can use a recordset
        with a single field.

        ```(sam)
            filmography:: film
                Rio Bravo
                The Shootist

        Notice that while these two constructions capture the same data and the same semantics,
        they are not the same structure, and are therefore not interchangeable. You have to decide
        which one you want in your markup language and provide processing for it accordingly.

        Also note that the XML representation of the two forms is different. The first outputs this:

        ```(xml)
            <filmography>
                <film>Rio Bravo</film>
                <film>The Shootist</film>
            </filmography>

        The second outputs this:

        ```(xml)


            Rio Bravo


            The Shootist



        Of course, it is easy enough to transform the second output into the first in post-processing.

    recipe: Subscripts and superscripts
        SAM does not have predefined support for superscripts and subscripts. If you need them, you
        need to define a tagging language that provides them. You would do this with annotations:

        ```(sam)
            H{2}(sub)SO{4}(sub)

    recipe: Conditional paragraphs
        You can't apply attributes to a paragraph. If you want to make paragraphs conditional,
        support fragments in your tagging language and put the conditions on the fragments.

        ```(sam)
            ~~~(?novice)
                Be very careful and ask for help if you need it.

            Push the big red button and run.

    recipe: Use a lookup file to make annotations easier
        SAM allows you to omit the annotations on a phrase after you have annotated it the first
        time in a file. When the parser sees a phrase with no annotation or citation, it looks
        back through the file and finds the last instance of that phrase and copies its annotation.

        This is a time saver. If you have a set of phrases that are commonly annotated
        the same way across a number of documents, you can create a lookup file in which those
        phrases are annotated and import it into each document. Then you will not need to
        annotate any of those phrases in the document. Just mark them up as phrases using
        curly braces and the parser will fill in the annotations.

        You need to make sure that the contents of the lookup file do not become part of the
        published document. To do this, simply create a structure in the lookup file to hold
        the list of annotations and suppress that structure in the application layer.

        ```(sam)
            dont-publish-this:
                {Enter}(key)
                {the Duke}(actor "John Wayne" (SAG))

    recipe: Complex labeled lists
        SAM provides a simple labeled list format that allows only one paragraph attached to a
        label. For anything more complex than this, construct a labeled list structured with
        blocks and fields.

        ```(sam)
            ll:
                li:
                    label: Item label
                    item:
                        The text of the item, including:

                        * paragraphs
                        * lists
                        * etc

    recipe: Single sourcing via attribute substitution
        You can provide single sourcing support via attribute chaining.
That is, you can provide both an 145 | HTTP link and a subject annotation. But if you use the trick of importing a file of 146 | annotation definitions and relying on annotation lookup to add them to the content, you 147 | can also substitute in different annotation lookup sets for different media (or different 148 | anything else). This will get easier when we add catalog support. 149 | 150 | recipe: Index markers 151 | 152 | There are a number of ways to implement index markers in SAM. The first it to avoid the 153 | use of explict index marker altogether and to generate an index based on semantic annotation. 154 | 155 | ```(sam) 156 | {The Duke}(actor "John Wayne") plays an ex-Union colonel. 157 | 158 | In this case you could generate an index entry for `John Wayne`, and a categorized entry 159 | for `Actor:John Wayne` from this annotation. 160 | 161 | 162 | Note that you don't have to generate an index entry for every instance of 163 | an annotated phrase. You can choose to generate an entry only for the first occurrence of 164 | an annotated phrase in a section or chapter, for instance. 165 | 166 | To implement explicit index markers, you can use annotations with a type `index`: 167 | 168 | ```(sam) 169 | {The Duke}(index "John Wayne; Actor:John Wayne") plays an ex-Union colonel. 170 | 171 | To implement index markers that span a passage, you can use a field on a block: 172 | 173 | ```(sam) 174 | bio: John Wayne 175 | index: John Wayne; Actor:John Wayne 176 | 177 | John Wayne was an actor known for his roles in westerns. 178 | 179 | To implement index markers that span an arbitrary passage within a block, you 180 | can use a field on a fragment: 181 | 182 | ```(sam) 183 | ~~~ 184 | index: John Wayne; Actor:John Wayne 185 | 186 | Jimmy Stewart also made a number of films with John Wayne. 187 | ... 188 | 189 | Remember that these are suggestions as to how you might implement index markers 190 | in a tagging language based on SAM. 
SAM does not provide explicit support for 191 | index markers. 192 | 193 | recipe: Code callouts 194 | 195 | There is no way to insert markup into a regular codeblock. If you want to do callouts 196 | in code, you can use lines for the code and citations for the callouts. (The presumptive 197 | semantics for citations is that what they produce is based on the thing they refer to.) 198 | 199 | ```(sam) 200 | codesample: 201 | language: python 202 | code: 203 | | print("Hello World")[*c1] 204 | | print ("Goodbye, cruel World")[*c2] 205 | callouts: 206 | callout:(*c1) This prints "Hello World". 207 | callout:(*c2) This prints "Goodbye, cruel World". 208 | 209 | recipe: Markdown style deferred links 210 | ```(sam) 211 | Look it up in {Wikipedia}[*wikipedia] 212 | 213 | link:(*wikipedia) http://wikipedia.org 214 | 215 | recipe: Complex tables 216 | 217 | Sam {grids} do not support spanning rows and columns in a table. However, you 218 | could provide for spanning rows and columns in grids at the application 219 | layer if you wanted to. This allows you to adopt a syntax for spanning 220 | rows and columns that works for your markup language. 221 | 222 | One possible way to do this as to indicate spanning rows and columns as follows. The example 223 | here is based on {https://tex.stackexchange.com/questions/368176/i-badly-need-to-generate-the-following-table}(url) which has shots of layout. 224 | 225 | ```(sam) 226 | +++ 227 | Paper Title | Performance Measure ||| Image Type 228 | _ | Evaluation Metric | Proposed Method | traditional method | _ 229 | | DC | 0.0019 | 0.0021 | 230 | | JS | 0.9975 | 0.9916 | 231 | | DSC | 0.9987 | 0.9958 | 232 | 233 | Here the column spanning in indicated by stacking all of the vertical bars that separate 234 | the columns together at the end of the spanned column while the row spanning is 235 | indicated by placing a single underscore character in the row to be spanned. 
236 | 237 | This will result in XML output where the cells to be column-spanned with contain 238 | and empty string and those to be row-spanned contain a single underscore. The 239 | application layer processing can then create the column and row spans accordingly. 240 | 241 | You could also do a complex table in fully explicit markup something like this: 242 | 243 | ```(sam) 244 | table: 245 | header: 246 | row: 247 | cell: Paper Title 248 | cell: Performance Measure 249 | cell: #hspan 250 | cell: #hspan 251 | cell: Image Type 252 | row: 253 | cell: #vspan 254 | cell: Evaluation Metric 255 | cell: Proposed Method 256 | cell: traditional method 257 | cell: #vspan 258 | body: 259 | row: 260 | cell: #empty 261 | cell: DC 262 | cell: 0.0019 263 | cell: 0.0021 264 | cell: #empty 265 | 266 | row: 267 | cell: #empty 268 | cell: JS 269 | cell: 0.9975 270 | cell: 0.9916 271 | cell: #empty 272 | 273 | row: 274 | cell: #empty 275 | cell: DSC 276 | cell: 0.9987 277 | cell: 0.9958 278 | cell: #empty 279 | 280 | Because SAM does not support arbitrary attributes, this approach uses special cell content 281 | to indicate spanning. The application layer has to interpret the `#vsapn` and `#hspan`. `#empty` is really just semantic sugar. 
282 | 283 | Another alternative, it to use nested tables, though this will produce a slightly different 284 | layout: 285 | 286 | ```(sam) 287 | table: 288 | header: 289 | row: 290 | cell: Paper Title 291 | cell: Performance Measure 292 | cell: Image Type 293 | body: 294 | row: 295 | cell: 296 | cell: 297 | table: 298 | header: 299 | cell: Evaluation Metric 300 | cell: Proposed Method 301 | cell: traditional method 302 | body: 303 | row: 304 | cell: DC 305 | cell: 0.0019 306 | cell: 0.0021 307 | 308 | row: 309 | cell: JS 310 | cell: 0.9975 311 | cell: 0.9916 312 | 313 | row: 314 | cell: DSC 315 | cell: 0.9987 316 | cell: 0.9958 317 | cell: 318 | 319 | This can be expressed more compactly like this: 320 | 321 | ```(sam) 322 | table: 323 | header: 324 | row: 325 | cell: Paper Title 326 | cell: Performance Measure 327 | cell: Image Type 328 | body: 329 | row: 330 | cell: 331 | cell: 332 | +++ 333 | Evaluation Metric | Proposed Method | traditional method 334 | DC | 0.0019 | 0.0021 335 | JS | 0.9975 | 0.9916 336 | DSC | 0.9987 | 0.9958 337 | cell: 338 | 339 | recipe: Stemming support for annotation lookup 340 | 341 | The {annotation lookup} feature allows writers to avoid having to explicitly annotate 342 | phrases every time they mention them. If the parser detect an unannotated phrase, it 343 | will work its way back through the document, including any included files, looking for 344 | the same phrase with an annotation. It then copies the annotation from the first 345 | annotated phrase it finds. The question is, how does it match phrases? The parser 346 | provides two {annotation lookup modes}, `case_sensitive` and `case_insensitive`. However, 347 | this does not account for cases where you might want various forms of the word to 348 | be annotated the same way, such as "run", "ran", and "running". To match different 349 | forms of a word, you need to use a technique called "stemming". 350 | 351 | Stemming is not universal. 
There are different stemming algorithms with different 352 | strengths and weaknesses, and different languages require different stemming 353 | algorithms. Rather than choosing one, therefore, the parser leaves it open to the 354 | user to supply their own annotation lookup mode, which may implement stemming or 355 | any other approach to annotation lookup that the user wants. 356 | 357 | You can add other {annotation lookup modes} by writing a lookup function and adding it to the dictionary `samparser.annotation_lookup_modes`. -------------------------------------------------------------------------------- /docs/parser.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | SAM Parser 5 | 6 | 7 | 8 | 9 |
10 |

SAM Parser

11 | 12 | 15 |

SAM Parser is an application that processes SAM files into an equivalent XML representation, which you can then process further into any desired form of output.

16 |

SAM Parser allows you to specify an XSLT 1.0 stylesheet to post-process the output XML into a desired output format. However, it only supports processing individual SAM files into individual output files. It does not support any kind of assembly or linking of multiple source files. This facility is not intended to meet all output needs, but it provides a simple way to create basic outputs.

17 |

SAM Parser is a Python 3 program that requires the regex and libxml libraries, which are not distributed with the default Python distribution. If you don't already have a Python 3 install, the easiest way to get one with the required libraries installed is to install Anaconda.

18 |

SAM Parser is invoked as follows:

19 |
samparser <output_mode> <options>
 20 | 
21 |

Three output modes are available:

22 |
    23 |
  • 24 |

    xml outputs a version of the document in XML. You can use an XSD schema to validate the output file and/or an XSLT 1.0 stylesheet to transform it into another format.

    25 |
  • 26 |
  • 27 |

    html outputs a version of the document in HTML with embedded semantic markup. You can supply one or more CSS stylesheets and/or one or more JavaScript files to include.

    28 |
  • 29 |
  • 30 |

    regurgitate outputs a version of the document in SAM with various references resolved and normalized.

    31 |
  • 32 |
33 |

All output modes take the following options:

34 |
    35 |
  • 36 |

    <infile> The path to the SAM file or files to be processed. (required)

    37 |
  • 38 |
  • 39 |

    [-outfile <output file>]|[-outdir <output directory> [-outputextension <output extension>]] Specifies either an output file or a directory in which to place output files and the file extension to apply to those files. (optional, defaults to the console)

    40 |
  • 41 |
  • 42 |

    -smartquotes <smartquote_rules> The path to a file containing smartquote rules.

    43 |
  • 44 |
  • 45 |

    -expandrelativepaths Causes the parser to expand relative paths in SAM insert statements when serializing output. Use this if you want paths relative to the source file to be made portable in the output file.

    46 |
  • 47 |
48 |

49 |

XML output mode takes the following options:

50 |
    51 |
  • 52 |

    [-xslt <xslt-file> [-transformedoutputfile <transformed file>]|[-transformedoutputdir <transformed output dir> [-transformedextension <transformed files extension>]]] Specifies an XSLT 1.0 stylesheet to use to transform the XML output into a final format, along with the file name or directory and extension to use for the transformed output.

    53 |
  • 54 |
  • 55 |

    [-xsd <XSD schema file>] Specifies an XSD schema file to use to validate the XML output file.

    56 |
  • 57 |
58 |

HTML output mode takes the following options:

59 |
    60 |
  • 61 |

    -css <css file location> Specifies the path to a CSS file to include in the HTML output file. (optional, repeatable)

    62 |
  • 63 |
  • 64 |

    -javascript <javascript file location> Specifies the path to a JavaScript file to include in the HTML output file. (optional, repeatable)

    65 |
  • 66 |
67 |

Regurgitate mode does not take any additional options.

68 |

Short forms of the options are available as follows:

69 | 70 | 71 | 72 | 73 | 74 | 75 | 76 | 77 | 78 | 79 | 80 | 81 | 82 | 83 | 84 | 85 | 86 | 87 | 88 | 89 | 90 | 91 | 92 | 93 | 94 | 95 | 96 | 97 | 98 | 99 | 100 | 101 | 102 | 103 | 104 | 105 | 106 | 107 | 108 | 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 |
option                   short form
-outfile                 -o
-outdir                  -od
-expandrelativepaths     -xrp
-outputextension         -oext
-smartquotes             -sq
-xslt                    -x
-xsd
-transformedoutputfile   -to
-transformedoutputdir    -tod
-transformedextension    -toext
-css
-javascript
123 |
124 |

Validating with an XML schema

125 | 126 |

Eventually, SAM is going to have its own schema language, but until that is available (and probably afterward) you can validate your document against an XML schema. Schema validation is done on the XML output format, not the input (because it is an XML schema, not a SAM schema). To invoke schema validation, use the -xsd option on the command line:

127 |
-xsd <schema.xsd>
128 | 
129 |
130 |
131 |

Expanding relative paths

132 | 133 |

The SAM parser can expand relative paths of insert statements in the source document while serializing the output. This can be useful if the location of the output file is not the same relative to the included resources as the location of the source file. To tell the parser to expand relative paths into absolute URLs, use the -expandrelativepaths option. The short form is -xrp.

134 |
-xrp
135 | 
136 |

Note that this applies to paths in SAM insert statements only. If you include paths in custom structures in your markup, they will not be expanded as the parser has no way of knowing that the value of a custom structure is a path.
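As a rough illustration of what expanding a relative path means, the sketch below resolves an insert path against the location of the source file using Python's standard `urllib.parse.urljoin`. The function name `expand_relative_path` and the example file URLs are hypothetical, and this is not the parser's own implementation.

```python
from urllib.parse import urljoin


def expand_relative_path(source_url, insert_path):
    """Resolve insert_path against the URL of the SAM source file.

    Hypothetical helper for illustration only. urljoin returns
    absolute URLs unchanged.
    """
    return urljoin(source_url, insert_path)


# An insert in /docs/src/index.sam that refers to images/circle.png
print(expand_relative_path("file:///docs/src/index.sam", "images/circle.png"))
```

Because the result no longer depends on where the output file is written, the reference keeps working when the output lands in a different directory from the source.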

137 |
138 |
139 |

Regurgitating the SAM document

140 | 141 |

The parser can also regurgitate the SAM document (that is, create a SAM serialization of the structure defined by the original SAM document). The regurgitated version may be different in small ways from the input document but will create the same structures internally and will serialize the same way as the original. Some of the differences are:

142 |
    143 |
  • 144 |

    Character entities will be replaced by Unicode characters.

    145 |
  • 146 |
  • 147 |

    Each paragraph will be on a single line.

    148 |
  • 149 |
  • 150 |

    Bold and italic decorations will be replaced with equivalent annotations.

    151 |
  • 152 |
  • 153 |

    Some non-essential character escapes may be included.

    154 |
  • 155 |
  • 156 |

    Annotation lookups will be performed and any !annotation-lookup declaration will be removed.

    157 |
  • 158 |
  • 159 |

    Smart quote processing will be performed and any !smart-quotes declaration will be removed.

    160 |
  • 161 |
162 |

To regurgitate, use the regurgitate output mode.

163 |
164 |
165 |

Smart quotes

166 | 167 |

The parser incorporates a smart quotes feature. Writers can request smart quotes processing for a document by including the smart quotes declaration at the start of the document.

168 |
!smart-quotes: on
169 | 
170 |

By default, the parser supports two values for the smart quotes declaration, on and off (the default). The built-in on setting supports the following translations:

171 |
    172 |
  • 173 |

    single quotes to curly quotes

    174 |
  • 175 |
  • 176 |

    double quotes to curly double quotes

    177 |
  • 178 |
  • 179 |

    single quotes as apostrophe to curly quotes

    180 |
  • 181 |
  • 182 |

    <space>--<space> to en-dash

    183 |
  • 184 |
  • 185 |

    --- to em-dash

    186 |
  • 187 |
188 |

Note that no smart quote algorithm is perfect. This option will miss some instances and may get some wrong. To ensure you always get the characters you want, enter the Unicode characters directly or use a character entity.

189 |

Smart quote processing is not applied to code phrases or to codeblocks or embedded markup.

190 |

Because different writers may want different smart quote rules, or different rules may be appropriate to different kinds of materials, the parser lets you specify your own sets of smart quote rules. Essentially this lets you detect any pattern in the text and define a substitution for it. You can use it for any character substitutions that you find useful, even those having nothing to do with quotes.

191 |

To define a set of smart quote substitutions, create an XML file like the sq.xml file included with the parser. This file includes two alternate sets of smart quote rules, justquotes and justdashes, which process just quotes and just dashes respectively. The dashes and quotes rules in this file are the same as those built in to the parser. Note, however, that the parser does not use this file by default.

192 |

To invoke the justquotes rule set:

193 |
    194 |
  1. 195 |

    Add the declaration !smart-quotes: justquotes to the document.

    196 |
  2. 197 |
  3. 198 |

    Use the command line parameter -sq <path-to-sam-directory>/sq.xml.

    199 |
  4. 200 |
201 |

To add a custom rule set, create your own rule set file and invoke it in the same way.

202 |

Note that the rules in each rule set are represented by regular expressions. The rules detect characters based on their surroundings. They do not detect quotations by finding the opening and closing quotes as a pair. They find them separately. This means that the order of rules in the rule file may be important. In the default rules, close quote rules are listed first. Reversing the order might result in some close quotes being detected as open quotes.
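The effect of rule order can be seen in a small sketch. The patterns below are hypothetical and much simpler than anything a real rule set would need; they are not the parser's built-in rules and do not reflect the format of sq.xml. They only show how context-based rules applied in sequence behave when close quote rules come first.

```python
import re

# Hypothetical rules for illustration; NOT the parser's built-in patterns.
# Each rule is (compiled pattern, replacement). Rules run in list order,
# with the close quote rules listed before the open quote rules.
CLOSE_FIRST = [
    (re.compile(r'(?<=\S)"'), "\u201d"),  # double quote after non-space: close
    (re.compile(r'"(?=\S)'), "\u201c"),   # double quote before non-space: open
    (re.compile(r"(?<=\S)'"), "\u2019"),  # single quote after non-space: close or apostrophe
    (re.compile(r"'(?=\S)"), "\u2018"),   # single quote before non-space: open
    (re.compile(r"---"), "\u2014"),       # three hyphens: em-dash
    (re.compile(r" -- "), " \u2013 "),    # spaced double hyphen: en-dash
]

# The same quote rules with each open rule moved ahead of its close rule.
OPEN_FIRST = [CLOSE_FIRST[1], CLOSE_FIRST[0], CLOSE_FIRST[3], CLOSE_FIRST[2]]


def apply_rules(text, rules):
    """Apply each substitution rule, in order, to the whole text."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


print(apply_rules('"Hi".', CLOSE_FIRST))
print(apply_rules('"Hi".', OPEN_FIRST))
```

With the open rules first, the closing quote in `"Hi".` is followed by a non-space character (the period), so it is wrongly treated as an opening quote. Running the close rules first avoids this for the sample text.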

203 |
204 |
205 |

HTML Output Mode

206 | 207 |

Normally SAM is serialized to XML which you can then process to produce HTML or any other output you want. However, the parser also supports outputting HTML directly. The attraction of this is that it allows you to have a semantically constrained input format that can be validated with a schema but which can still output to HTML5 directly.

208 |

SAM structures are output to HTML as follows:

209 |
    210 |
  • 211 |

    Named blocks are output as HTML <div> elements. The SAM block name is output as the class attribute of the DIV elements, allowing you to attach specific CSS styling to each type of block.

    212 |
  • 213 |
  • 214 |

    Codeblocks are output as pre elements with the language attribute output as a data-language and the class as codeblock. Code is wrapped in code tags, also with class as codeblock.

    215 |
  • 216 |
  • 217 |

    Embedded data is ignored and a warning is issued.

    218 |
  • 219 |
  • 220 |

    Paragraphs, ordered lists, and unordered lists are output as their HTML equivalents.

    221 |
  • 222 |
  • 223 |

    Labelled lists are output as definition lists.

    224 |
  • 225 |
  • 226 |

    Grids are output as tables.

    227 |
  • 228 |
  • 229 |

    Record sets are output as tables with the field names as table headers.

    230 |
  • 231 |
  • 232 |

    Inserts by URL become object elements. String inserts are resolved if the named string is available. Inserts by ID are resolved by inserting the object with the specified ID. A warning will be raised and the insert will be ignored if you try to insert a block element with an inline insert or an inline element with a block insert. All other inserts are ignored and a warning is raised.

    233 |
  • 234 |
  • 235 |

    Phrases are output as spans with the class attribute phrase.

    236 |
  • 237 |
  • 238 |

    Annotations are output as spans nested within the phrase spans they annotate. The specifically and namespace attributes of an annotation are output as data-* attributes.

    239 |
  • 240 |
  • 241 |

    Attributes are output as HTML attributes. ID attributes are output as HTML id attributes. Language-code attributes are output as HTML lang attributes. Other attributes are output as HTML data-* attributes.

    242 |
  • 243 |
  • 244 |

    An HTML head element is generated, which includes a title element if the root block of the SAM document has a title. It also includes <meta charset = "UTF-8">.

    245 |
  • 246 |
247 |

To generate HTML output, use the html output mode on the command line.

248 |

To specify a stylesheet to be imported by the resulting HTML file, use the -css option with the URL of the css file to be included (relative to the location of the HTML file). You can specify the -css option more than once.

249 |

To specify a JavaScript file to be imported by the resulting HTML file, use the -javascript option with the URL of the JavaScript file to be included (relative to the location of the HTML file). You can specify the -javascript option more than once.

250 |
251 |
252 |

Running SAM Parser on Windows

253 | 254 |

To run SAM Parser on Windows, use the samparser batch file:

255 |
samparser xml foo.sam -o foo.xml -x foo2html.xslt -to foo.html
256 | 
257 | 258 |

To run SAM Parser on Xnix or Mac, invoke Python 3 as appropriate on your system. For example:

259 |
python3 samparser.py xml foo.sam -o foo.xml -x foo2html.xslt -to foo.html
260 | 
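If you drive the parser from another Python script, the same command line can be assembled as an argument list. The sketch below uses only the options documented above (`-o`, `-x`, `-to`). The function names and parameters are hypothetical, and it assumes `python3` is on the path and `samparser.py` is in the current directory.

```python
import subprocess


def build_samparser_command(infile, outfile, xslt=None, transformed=None):
    """Build an argument list for the xml output mode.

    Hypothetical helper: it only arranges options that are documented
    for the command line (-o, -x, -to).
    """
    command = ["python3", "samparser.py", "xml", infile, "-o", outfile]
    if xslt is not None:
        command += ["-x", xslt]
        if transformed is not None:
            command += ["-to", transformed]
    return command


def run_samparser(command):
    """Run the parser and raise CalledProcessError if it exits with an error."""
    subprocess.run(command, check=True)


command = build_samparser_command("foo.sam", "foo.xml", "foo2html.xslt", "foo.html")
print(" ".join(command))
```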
261 | 
262 | 
263 | 
264 |
265 |
266 | 267 | -------------------------------------------------------------------------------- /docs/recipes.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | SAM Recipes 5 | 6 | 7 | 8 | 9 |
10 |

SAM Recipes

11 | 12 | 15 |

SAM is an abstract markup language, otherwise known as a meta-language. This means that it is intended, like XML, for defining your own markup languages. Unlike XML, however, SAM pre-defines a basic set of structures. SAM's predefined structures provide a simple clear syntax for common text structures, but they don't cover everything you might need. The following recipes are suggestions on how to handle some common markup problems in SAM. But remember, these involve you defining markup for yourself and require you to handle that markup in the application layer.

16 |
17 |

Footnotes

18 | 19 |

SAM does not have any predefined markup for footnotes, but it does have support for IDs and citations, which provide a reference mechanism for footnotes. The suggested recipe for footnotes is to create a footnote block with an id, and use a citation to reference that id:

20 |
This paragraph requires a footnote.[*1]
 21 | 
 22 | footnote:(*1)
 23 |     This is a footnote.
 24 | 
25 |

The footnote structure can be anywhere in the document, but it makes sense to place it immediately after the paragraph in which the reference occurs. You can move it to the bottom of the page or the end of the chapter or book in the publishing process.

26 |

Note that the footnote structure is not a child of the paragraph but a sibling at the same level of indentation. Paragraphs cannot have block children.

27 |
28 |
29 |

Terminal sessions

30 | 31 |

There is no support for SAM markup in the body of a codeblock. This works for code, since it requires no escaping, regardless of language. It does not work so well for terminal sessions if you want to mark up the prompt, input, and output. If you need this kind of terminal session support, the suggested recipe is to create a terminal session block and use fields for the sequence of prompt, input, and response. This reads quite clearly and you can easily use it to construct a formatted terminal session illustration when publishing.

32 |
terminal:
 33 |     prompt: $
 34 |     input: dir
 35 |     response: Empty directory.
 36 | 
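A minimal sketch of the publishing step might look like the following. It assumes the terminal block serializes to XML with one element per field, by analogy with how fields are serialized elsewhere in these recipes; the parser's real output may differ, and `render_terminal` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# Assumed XML serialization of the terminal block above; the parser's
# actual output may differ.
SAMPLE = """<terminal>
<prompt>$</prompt>
<input>dir</input>
<response>Empty directory.</response>
</terminal>"""


def render_terminal(xml_text):
    """Return the session as plain text, with prompt and input on one line."""
    lines = []
    prompt = ""
    for field in ET.fromstring(xml_text):
        text = (field.text or "").strip()
        if field.tag == "prompt":
            prompt = text
        elif field.tag == "input":
            lines.append(f"{prompt} {text}")
        elif field.tag == "response":
            lines.append(text)
    return "\n".join(lines)


print(render_terminal(SAMPLE))
```

Because prompt, input, and response are separate fields, a real stylesheet could just as easily wrap each one in its own styled element instead of producing plain text.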
37 |
38 |
39 |

Semantic lists

40 | 41 |

Sometimes you have semantic lists. That is, the list and its items are of a specific type. For example:

42 |
filmography:
 43 |     film: Rio Bravo
 44 |     film: The Shootist
 45 | 
46 |

If you don't want to have to repeat the item name over and over, you can use a recordset with a single field.

47 |
filmography:: film
 48 |     Rio Bravo
 49 |     The Shootist
 50 | 
51 |

Notice that while these two constructions capture the same data and the same semantics, they are not the same structure, and are therefore not interchangeable. You have to decide which one you want in your markup language and provide processing for it accordingly.

52 |

Also note that the XML representation of the two forms is different. The first outputs this:

53 |
<filmography>
 54 | <film>Rio Bravo</film>
 55 | <film>The Shootist</film>
 56 | </filmography>
 57 | 
58 |

The second outputs this:

59 |
<filmography>
 60 | <row>
 61 | <film>Rio Bravo</film>
 62 | </row>
 63 | <row>
 64 | <film>The Shootist</film>
 65 | </row>
 66 | </filmography>
 67 | 
68 |

Of course, it is easy enough to transform the second output into the first in post-processing.
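One way to do that post-processing is sketched below with Python's standard `xml.etree.ElementTree`. The input is the recordset output shown above; the function name `unwrap_rows` is hypothetical, and an XSLT stylesheet would do the job equally well.

```python
import xml.etree.ElementTree as ET

RECORDSET_OUTPUT = """<filmography>
<row>
<film>Rio Bravo</film>
</row>
<row>
<film>The Shootist</film>
</row>
</filmography>"""


def unwrap_rows(xml_text):
    """Replace each <row> with its child elements, in document order."""
    root = ET.fromstring(xml_text)
    flattened = ET.Element(root.tag, root.attrib)
    for child in root:
        if child.tag == "row":
            flattened.extend(list(child))
        else:
            flattened.append(child)
    return flattened


result = unwrap_rows(RECORDSET_OUTPUT)
print([film.text for film in result])
```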

69 |
70 |
71 |

Subscripts and superscripts

72 | 73 |

SAM does not have predefined support for superscripts and subscripts. If you need them, you need to define a tagging language that provides them. You would do this with annotations:

74 |
H{2}(sub)SO{4}(sub)
 75 | 
76 |
77 |
78 |

Conditional paragraphs

79 | 80 |

You can't apply attributes to a paragraph. If you want to make paragraphs conditional, support fragments in your tagging language and put the conditions on the fragments.

81 |
~~~(?novice)
 82 |     Be very careful and ask for help if you need it.
 83 | 
 84 | Push the big red button and run.
 85 | 
86 |
87 |
88 |

Use a lookup file to make annotations easier

89 | 90 |

SAM allows you to omit the annotations on a phrase after you have annotated it the first time in a file. When the parser sees a phrase with no annotation or citation, it looks back through the file and finds the last instance of that phrase and copies its annotation.

91 |

This is a time saver. If you have a set of phrases that are commonly annotated the same way across a number of documents, you can create a lookup file in which those phrases are annotated and import it into each document. Then you will not need to annotate any of those phrases in the document. Just mark them up as phrases using curly braces and the parser will fill in the annotations.

92 |

You need to make sure that the contents of the lookup file do not become part of the published document. To do this, simply create a structure in the lookup file to hold the list of annotations and suppress that structure in the application layer.

93 |
dont-publish-this:
 94 |     {Enter}(key)
 95 |     {the Duke}(actor "John Wayne" (SAG))
 96 | 
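Suppressing the structure in the application layer can be as simple as dropping its element from the XML output before publishing. The sketch below assumes the block is serialized as an element named `dont-publish-this`; the sample document and the function name `remove_structures` are hypothetical.

```python
import xml.etree.ElementTree as ET


def remove_structures(root, tag):
    """Remove every element named `tag` from the tree, at any depth."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return root


# Hypothetical XML output for a document that imported the lookup file.
doc = ET.fromstring(
    "<article><dont-publish-this><p>lookup phrases</p></dont-publish-this>"
    "<p>Press Enter.</p></article>"
)
remove_structures(doc, "dont-publish-this")
print(ET.tostring(doc, encoding="unicode"))
```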
97 |
98 |
99 |

Complex labeled lists

100 | 101 |

SAM provides a simple labeled list format that allows only one paragraph attached to a label. For anything more complex than this, construct a labeled list structure with blocks and fields.

102 |
ll:
103 |     li:
104 |         label: Item label
105 |         item:
106 |             The text of the item, including:
107 | 
108 |             * paragraphs
109 |             * lists
110 |             * etc
111 | 
112 |
113 |
114 |

Single sourcing via attribute substitution

115 | 116 |

You can provide single sourcing support via attribute chaining. That is, you can provide both an HTTP link and a subject annotation. But if you use the trick of importing a file of annotation definitions and relying on annotation lookup to add them to the content, you can also substitute in different annotation lookup sets for different media (or different anything else). This will get easier when we add catalog support.

117 |
118 |
119 |

Index markers

120 | 121 |

There are a number of ways to implement index markers in SAM. The first is to avoid the use of explicit index markers altogether and to generate an index based on semantic annotation.

122 |
{The Duke}(actor "John Wayne") plays an ex-Union colonel.
123 | 
124 |

In this case you could generate an index entry for John Wayne, and a categorized entry for Actor:John Wayne from this annotation.

125 |

Note that you don't have to generate an index entry for every instance of an annotated phrase. You can choose to generate an entry only for the first occurrence of an annotated phrase in a section or chapter, for instance.

126 |

To implement explicit index markers, you can use annotations with a type index:

127 |
{The Duke}(index "John Wayne; Actor:John Wayne") plays an ex-Union colonel.
128 | 
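The application layer then has to interpret the annotation's value. The sketch below assumes the convention used in the example, with entries separated by semicolons and an optional `Category:` prefix. That convention and the function name `parse_index_entries` are illustrative choices, not something SAM defines.

```python
def parse_index_entries(specifically):
    """Split 'John Wayne; Actor:John Wayne' into (category, term) pairs.

    Entries are separated by ';'. An optional 'Category:' prefix marks a
    categorized entry; plain entries get a category of None.
    """
    entries = []
    for raw in specifically.split(";"):
        raw = raw.strip()
        if not raw:
            continue
        category, separator, term = raw.partition(":")
        if separator:
            entries.append((category.strip(), term.strip()))
        else:
            entries.append((None, raw))
    return entries


print(parse_index_entries("John Wayne; Actor:John Wayne"))
# prints [(None, 'John Wayne'), ('Actor', 'John Wayne')]
```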
129 |

To implement index markers that span a passage, you can use a field on a block:

130 |
bio: John Wayne
131 |     index: John Wayne; Actor:John Wayne
132 | 
133 |     John Wayne was an actor known for his roles in westerns.
134 | 
135 |

To implement index markers that span an arbitrary passage within a block, you can use a field on a fragment:

136 |
~~~
137 |     index: John Wayne; Actor:John Wayne
138 | 
139 |     Jimmy Stewart also made a number of films with John Wayne.
140 |     ...
141 | 
142 |

Remember that these are suggestions as to how you might implement index markers in a tagging language based on SAM. SAM does not provide explicit support for index markers.


Code callouts


There is no way to insert markup into a regular codeblock. If you want to do callouts in code, you can use lines for the code and citations for the callouts. (The presumptive semantics for citations is that what they produce is based on the thing they refer to.)

python
print("Hello World")
print ("Goodbye, cruel World")
This prints "Hello World".
This prints "Goodbye, cruel World".

Markdown style deferred links

Look it up in {Wikipedia}[*wikipedia]

link:(*wikipedia) http://wikipedia.org
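Resolving the deferred link is the job of the application layer: the citation `[*wikipedia]` points at the block with that ID, and the type of that block decides what gets produced. A minimal sketch of this idea, assuming the application layer has already collected ID-bearing blocks into a dictionary; `blocks_by_id` and `resolve_citation` are hypothetical names used only for illustration:

```python
import html

# Hypothetical application-layer data: blocks that carry an ID, keyed by ID.
blocks_by_id = {
    "wikipedia": {"type": "link", "value": "http://wikipedia.org"},
}


def resolve_citation(phrase, ref_id, blocks):
    """Render a cited phrase based on the type of the block it refers to."""
    block = blocks.get(ref_id)
    if block is None:
        return html.escape(phrase)  # unresolved citation: emit plain text
    if block["type"] == "link":
        href = html.escape(block["value"], quote=True)
        return f'<a href="{href}">{html.escape(phrase)}</a>'
    # Other block types (figures, footnotes, ...) would be handled here.
    return html.escape(phrase)


rendered = resolve_citation("Wikipedia", "wikipedia", blocks_by_id)
```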

Complex tables

SAM grids do not support spanning rows and columns in a table. However, you could provide for spanning rows and columns in grids at the application layer if you wanted to. This allows you to adopt a syntax for spanning rows and columns that works for your markup language.

One possible way to do this is to indicate spanning rows and columns as follows. The example here is based on https://tex.stackexchange.com/questions/368176/i-badly-need-to-generate-the-following-table which shows the intended layout.

+++
    Paper Title | Performance Measure                                    ||| Image Type
         _      | Evaluation Metric | Proposed Method | traditional method |      _
                | DC                | 0.0019          | 0.0021             | 
                | JS                | 0.9975          | 0.9916             | 
                | DSC               | 0.9987          | 0.9958             | 

Here the column spanning is indicated by stacking all of the vertical bars that separate the columns together at the end of the spanned column, while the row spanning is indicated by placing a single underscore character in the row to be spanned.

This will result in XML output where the cells to be column-spanned will contain an empty string and those to be row-spanned will contain a single underscore. The application layer processing can then create the column and row spans accordingly.
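The application-layer step might look something like the following Python sketch. It assumes the grid has already been read into a list of rows of cell strings; `compute_spans` is a hypothetical function, and the dictionary keys are chosen only for this example. It is deliberately simple: it does not handle cells that span both rows and columns, and it cannot tell a genuinely empty cell from a column-spanned one (an empty first cell in a row is simply kept as an empty cell), so you would need a separate convention for those.

```python
def compute_spans(rows):
    """Turn a grid of cell strings into cells with colspan and rowspan."""
    cells = {}  # (row, col) -> cell dict, for positions that start a cell
    owner = {}  # (row, col) -> the cell dict covering that position
    for r, row in enumerate(rows):
        for c, text in enumerate(row):
            text = text.strip()
            if text == "" and c > 0:
                # Empty string: the cell to the left spans this column.
                cell = owner[(r, c - 1)]
                cell["colspan"] += 1
            elif text == "_" and r > 0:
                # Underscore: the cell above spans down into this row.
                cell = owner[(r - 1, c)]
                cell["rowspan"] += 1
            else:
                cell = {"row": r, "col": c, "text": text,
                        "colspan": 1, "rowspan": 1}
                cells[(r, c)] = cell
            owner[(r, c)] = cell
    return list(cells.values())


header_rows = [
    ["Paper Title", "Performance Measure", "", "", "Image Type"],
    ["_", "Evaluation Metric", "Proposed Method", "traditional method", "_"],
]
spans = {cell["text"]: (cell["colspan"], cell["rowspan"])
         for cell in compute_spans(header_rows)}
```

Each resulting cell can then be written out with whatever spanning attributes your output format uses.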


You could also do a complex table in fully explicit markup something like this:

table:
    header:
        row:
            cell: Paper Title
            cell: Performance Measure
            cell: #hspan
            cell: #hspan
            cell: Image Type
        row: 
            cell: #vspan
            cell: Evaluation Metric
            cell: Proposed Method
            cell: traditional method
            cell: #vspan
    body:
        row:
            cell: #empty
            cell: DC
            cell: 0.0019
            cell: 0.0021
            cell: #empty

        row:
            cell: #empty
            cell: JS
            cell: 0.9975
            cell: 0.9916
            cell: #empty

        row:
            cell: #empty
            cell: DSC
            cell: 0.9987
            cell: 0.9958
            cell: #empty

Because SAM does not support arbitrary attributes, this approach uses special cell content to indicate spanning. The application layer has to interpret the #vspan and #hspan markers. #empty is really just semantic sugar.

Another alternative is to use nested tables, though this will produce a slightly different layout:

table:
    header:
        row:
            cell: Paper Title
            cell: Performance Measure
            cell: Image Type
    body:
        row: 
            cell: 
            cell:
                table:
                    header:
                        row:
                            cell: Evaluation Metric
                            cell: Proposed Method
                            cell: traditional method
                    body:
                        row:
                            cell: DC
                            cell: 0.0019
                            cell: 0.0021

                        row:
                            cell: JS
                            cell: 0.9975
                            cell: 0.9916

                        row:
                            cell: DSC
                            cell: 0.9987
                            cell: 0.9958
            cell: 

This can be expressed more compactly like this:

table:
    header:
        row:
            cell: Paper Title
            cell: Performance Measure
            cell: Image Type
    body:
        row: 
            cell: 
            cell:
                +++
                    Evaluation Metric | Proposed Method | traditional method
                    DC                | 0.0019          | 0.0021
                    JS                | 0.9975          | 0.9916
                    DSC               | 0.9987          | 0.9958
            cell: 

Stemming support for annotation lookup

The annotation lookup feature allows writers to avoid having to explicitly annotate phrases every time they mention them. If the parser detects an unannotated phrase, it will work its way back through the document, including any included files, looking for the same phrase with an annotation. It then copies the annotation from the first annotated phrase it finds. The question is, how does it match phrases? The parser provides two annotation lookup modes, case_sensitive and case_insensitive. However, this does not account for cases where you might want various forms of a word to be annotated the same way, such as "run", "ran", and "running". To match different forms of a word, you need to use a technique called "stemming".


Stemming is not universal. There are different stemming algorithms with different strengths and weaknesses, and different languages require different stemming algorithms. Rather than choosing one, therefore, the parser leaves it open to the user to supply their own annotation lookup mode, which may implement stemming or any other approach to annotation lookup that the user wants.


You can add other annotation lookup modes by writing a lookup function and adding it to the dictionary samparser.annotation_lookup_modes.
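The sketch below shows the general shape of such a function. It makes several assumptions that you should check against the parser source before relying on it: the signature a lookup function must have is not described here, so the two-argument `stem_match(phrase, candidate)` form is a guess; the mode name `"stemmed"` is made up; and a local dictionary stands in for `samparser.annotation_lookup_modes` so that the example runs on its own. The stemmer is a deliberately naive suffix stripper for English, not a real stemming algorithm.

```python
def naive_stem(word):
    """Very rough English stemmer: strip one common suffix."""
    word = word.lower()
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    # Undo consonant doubling, e.g. "runn" -> "run".
    if len(word) >= 3 and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


def stem_match(phrase, candidate):
    """Hypothetical lookup function: True if both phrases share stems."""
    stems_a = [naive_stem(w) for w in phrase.split()]
    stems_b = [naive_stem(w) for w in candidate.split()]
    return stems_a == stems_b


# Stand-in for samparser.annotation_lookup_modes (assumption: the real
# dictionary maps a mode name to a callable; verify the expected signature).
annotation_lookup_modes = {}
annotation_lookup_modes["stemmed"] = stem_match
```

With this stemmer, "running" and "runs" both reduce to "run", but an irregular form such as "ran" does not. Handling irregular forms is one reason to plug in a proper stemming or lemmatization library for your language instead.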

281 | 282 | -------------------------------------------------------------------------------- /docsource/quickstart.sam: -------------------------------------------------------------------------------- 1 | article: SAM Quick Start 2 | 3 | <<<(menu.sami) 4 | 5 | SAM is a markup language for semantic authoring. It is not a fixed language like Markdown. It is a meta-language like XML. Markdown has a fixed set of document structures that you can use. SAM, like XML, lets you define and create your own structures to suit your own needs. Unlike XML, and like Markdown, SAM is designed to be easy to write in a plain text editor. 6 | 7 | To make it easier to write, SAM provides a set of common text structures, such as paragraphs and lists, that are shared by all SAM document types. These common structures deliberately basic. There are other document structures you will probably need that you will have to design for your self using SAM's generic structures: blocks and annotations. See the {SAM Recipes} document for some suggestions on how to create some commonly used document structures. 8 | 9 | SAM is designed for structured writing. That is, writing that obeys a specific defined structure that tells you what goes into a document and how it is expressed. A structured document isn't designed to be read directly by the reader. It must be processed with algorithms to produce readable output. But it also supports using algorithms to perform many other functions, such as guiding the writer, validating the content, linking the content, managing the content, creating new forms of content, or publishing the content to different media and for different purposes. (In this document I refer to such algorithms collectively as "the application layer".) 10 | 11 | To enable all these algorithms, the writers must obey the structure of the markup language and must embed certain items of metadata into the content for the algorithms to work with. 
To do this easily and effectively, the writer must be able to clearly see the structure as they are writing. That is {not the case with XML}(https://everypageispageone.com/2016/01/28/why-does-xml-suck/), whether you write it in a plain text editor or in a sophisticated XML editor that makes it look like a Word document. Verbose XML tags obscure structure and Word-like interfaces hide structure behind an approximation of final formatting. SAM is designed to make the structure of structured documents clear and explicit to the writer, while also making them simple and straightforward to type. 12 | 13 | Structured writing essentially divides content into a set of nested named blocks. SAM introduces blocks with a one word label followed by a colon. It shows the nesting of one block inside another by indenting the nested block under its parent. Here is an example of a structured document, a recipe, written in SAM: 14 | 15 | 16 | ```(sam) 17 | recipe: Hard Boiled Egg 18 | introduction: 19 | A hard boiled egg is simple and nutritious. It makes 20 | a quick and delicious addition to any breakfast. 21 | 22 | Hard boiled eggs can also be an ingredients in 23 | other dishes, such as a {Cobb Salad}(recipe). 24 | 25 | ingredients:: ingredient, quantity 26 | eggs, 12 27 | water, 2qt 28 | preparation: 29 | 1. Place eggs in pan and cover with water. 30 | 2. Bring water to a boil. 31 | 3. Remove from heat and cover for 12 minutes. 32 | 4. Place eggs in cold water to stop cooking. 33 | 5. Peel and serve. 
34 | prep-time: 15 minutes 35 | serves: 6 36 | wine-match: champagne and orange juice 37 | beverage-match: orange juice 38 | nutrition: 39 | serving: 1 large (50 g) 40 | calories: 78 41 | total-fat: 5 g 42 | saturated-fat: 0.7 g 43 | polyunsaturated-fat: 0.7 g 44 | monounsaturated-fat: 2 g 45 | cholesterol: 186.5 mg 46 | sodium: 62 mg 47 | potassium: 63 mg 48 | total-carbohydrate: 0.6 g 49 | dietary-fiber: 0 g 50 | sugar: 0.6 g 51 | protein: 6 g 52 | 53 | All SAM documents consist of a single root block, in this case the block labeled "recipe". That block contains a number of other blocks with labels like "ingredients" and "nutrition". Some of these block also contain other blocks. For example, the `preparation` block contain a list block (the list is one of SAM's default block types). Some blocks may contain just a single value. These blocks are called "fields". The individual fields that make up the `nutrition` block are indented are fields. 54 | 55 | This is a semantic version of a recipe. It does not tell you how the recipe should be formatted. It does not even contain the section headings that will probably be used when the recipe is printed on paper or screen. To actually make a printed recipe, the semantic version needs to be processed by an algorithm that decides which pieces of information to publish, what order to publish them in, what titles to use for the sections, how to format the text, and which media to publish to. However, the semantic version lets you do cool stuff like make a collection of recipes with less than 80 calories per serving that can be prepared in less than 20 minutes, because it makes those pieces of information clearly available to algorithms for querying and processing. It also helps to make sure the people writing recipes provide all the information we need in the order and form we want it in. 56 | 57 | The main parts of a SAM document as as follows. 
58 | 59 | 60 | section: Blocks 61 | 62 | A block is a container for other structures. In the example above, `recipe:` starts a block that contains the entire recipe document. `nutrition:` starts a block that contains the nutrition information. Blocks are delimited by indentation. Everything indented under the block tag is part of the block. Every SAM document must have a single root block that contains all the content of the document. 63 | 64 | The recipe block has a title, "Hard Boiled Egg", which is the text after the colon in `recipe:`. 65 | 66 | ```(sam) 67 | recipe: Hard Boiled Egg 68 | introduction: 69 | A hard boiled egg is simple and nutritious. It makes 70 | a quick and delicious addition to any breakfast. 71 | 72 | This is a shortcut and is exactly equivalent to this (provided there is nothing indented under the `title` tag): 73 | 74 | ```(sam) 75 | recipe: 76 | title: Hard Boiled Egg 77 | introduction: 78 | A hard boiled egg is simple and nutritious. It makes 79 | a quick and delicious addition to any breakfast. 80 | 81 | section: Fields 82 | 83 | A field is a container for a single value. Each of the lines in the `nutrition` block is a field. For example, `sugar: 0.6 g` is a field with the label `sugar` and the value `0.6 g`. A field has the same format as a block except that there is nothing indented under it. If you indent something under a field it will become a block and its value will become the title of the block. 84 | 85 | section: Paragraphs 86 | 87 | A paragraph is a string of text. Paragraphs may run over several lines. A paragraph ends with a blank line or when the block it belongs to ends (which means when next line is less indented). A paragraph is a kind of block but it does not require a label. Its block name is `p` but this is completely implicit. Paragraphs may not have children. That is, you can't indent anything under a paragraph. 
88 | 89 | section: Lists 90 | 91 | You can create ordered lists by using numbers followed by a period to start each line of the list. 92 | 93 | ```(sam) 94 | 1. Place eggs in pan and cover with water. 95 | 2. Bring water to a boil. 96 | 3. Remove from heat and cover for 12 minutes. 97 | 4. Place eggs in cold water to stop cooking. 98 | 5. Peel and serve. 99 | 100 | Unordered lists can be created the same way using asterisks. 101 | 102 | ```(sam) 103 | * Dog 104 | * Cat 105 | * Monkey 106 | 107 | Lists are blocks, as is each item in the list. The content of a list item is a paragraph. Therefore the equivalent XML structure to the SAM structure above is: 108 | 109 | ```(xml) 110 |
<ul> 111 | <li> 112 | <p>Dog</p> 113 | </li> 114 | <li> 115 | <p>Cat</p> 116 | </li> 117 | <li> 118 | <p>Monkey</p> 119 | </li> 120 | </ul>
121 | 122 | Since paragraphs cannot have children, when you create a list following a paragraph, the number or asterisk that starts the list items must be at the same indent level as the paragraph above. 123 | 124 | Lists can be nested: 125 | 126 | ```(sam) 127 | 1. Cats 128 | * Tabby 129 | * Manx 130 | * Siamese 131 | 2. Dogs 132 | * Collie 133 | * Beagle 134 | * Spaniel 135 | 136 | You can create more than one paragraph in a list item by indenting the second paragraph to match the indent of the first one: 137 | 138 | ```(sam) 139 | 1. Cats are fluffy. 140 | 141 | Dogs are silly. 142 | 143 | Cows fly over the moon. 144 | 145 | 2. Rocks are hard. 146 | 147 | Earth is round. 148 | 149 | The moon is made of green cheese. 150 | 151 | section: Phrases: 152 | A phrase is a piece of text within a paragraph that you want to apply metadata to. A phrase is marked up with curly braces. In the recipe example, the words "Cobb Salad" are marked up as a phrase: `{Cobb Salad}`. We mark up phrases so that we can add metadata to them with {annotations} and {citations}. 153 | 154 | section: Characters 155 | 156 | SAM documents are {Unicode} documents encoded in {UTF-8}. You can therefore enter any character you like as literal text. However, there are two circumstances where you may need to enter characters in a different way: 157 | 158 | * You want to enter a character, such as an open curly brace, that is a SAM markup character. 159 | 160 | * You want to enter a character that is not on your keyboard or that is hard to type. 161 | 162 | To enter a character that is a SAM markup character, precede it with a backslash: 163 | 164 | ```(sam) 165 | The curly brace \{ is not a commonly used character. 166 | 167 | Note that characters such as the curly brace are only recognized as markup in SAM if they form part of a complete SAM markup structure. Because there is no closing curly brace in the above sentence, the curly brace by itself would still be recognized as text. 
However, using the backslash character escape is probably still a good idea to prevent accidentally creating markup sequences without realizing it. 168 | 169 | To enter characters that are not on your keyboard, you can either type unicode character using whatever keyboard sequence your OS provides or you can enter an XML/HTML style character entity. For example, to enter the symbol for a British pound, you can use the `£` character entity. 170 | 171 | ```(sam) 172 | The doggy in the window costs £5.00. 173 | 174 | SAM recognizes all of the HTML character entities. (Note, however, that the recognition of HTML character entities could vary based on the SAM parser used. The current SAM Parser uses the Python html library to do character entity decoding. Running the parser with a different version of the html library might result in a different set of named entities being recognized.) 175 | 176 | You can also use XML style numeric and hexadecimal character entities: 177 | 178 | ```(sam) 179 | The doggy in the window costs £5.00. 180 | 181 | and\: 182 | 183 | ```(sam) 184 | The doggy in the window costs £5.00. 185 | 186 | Note that the use of character entities is an alternative to using backslash character escapes, so you can do: 187 | 188 | 189 | ```(sam) 190 | The curly brace { is not a commonly used character. 191 | 192 | instead of 193 | 194 | ```(sam) 195 | The curly brace \{ is not a commonly used character. 
196 | 197 | To enter a literal backslash into your text you can either do: 198 | 199 | ```(sam) 200 | \\ 201 | 202 | or 203 | 204 | ```(sam) 205 | \ 206 | 207 | or, if the backslash is not preceding a SAM markup character, you can just enter it alone: 208 | 209 | ```(sam) 210 | \ 211 | 212 | To enter a literal character escape sequence, you can escape the leading `&` either with a backslash: 213 | 214 | ```(sam) 215 | \£ 216 | 217 | or with a character escape: 218 | 219 | ```(sam) 220 | &pound; 221 | 222 | section: Annotations 223 | An annotation is a way of adding descriptive labels or metadata to individual phrases within a text. Annotations can cover anything from formatting (bold, italic) to linking to semantic annotation. Annotations are applied to phrases. In the example above, the phrase "Cobb Salad" has an annotation "recipe" (`{Cobb Salad}(recipe)`) which tells us that the phrase is the name of a recipe. There is no fixed set of annotations in SAM. It is up to you to determine what annotations you need for your content. 224 | 225 | Annotations have three parts, the second and third parts being optional. 226 | 227 | The first part is the type. It asserts that the annotated phrase contains a certain type of information or performs a certain type of role in the document. 228 | 229 | ```(sam) 230 | {John Wayne}(actor) plays an ex-Union Colonel out for revenge. 231 | 232 | This annotation asserts that the phrase "John Wayne" is the name of an actor (its type). 233 | 234 | The second part is the "specifically" attribute. Sometimes the meaning of the phrase is not clear from the words of the phrase itself. In this case you can make the meaning clear using the "specifically" attribute, which is contained in quotation marks after the type: 235 | 236 | ```(sam) 237 | {The Duke}(actor "John Wayne") plays an ex-Union Colonel out for revenge. 
238 | 239 | In this case the phrase itself is not the canonical name of the actor it refers to, so we add the specifically attribute `"John Wayne"` to clarify the meaning of the annotated phrase. 240 | 241 | The third part of the annotation is the namespace attribute. In some cases the type and specifically attributes may not be enough to uniquely identify the phrase being annotated because the same phrase, even in its canonical form, may refer to the same type of object in another context. In this case we can use the namespace attribute, which is contained in a second set of parentheses, to specify the context in which the annotated phrase should be understood. 242 | 243 | ```(sam) 244 | {The Duke}(actor "John Wayne" (SAG)) plays an ex-Union Colonel out for revenge. 245 | 246 | In this case, the names of American actors are governed by the Screen Actors Guild. It is possible there could be another actor in another jurisdiction who also uses the name "John Wayne". So here we specify that the name, as we are using it, belongs to the namespace of actors that is governed by SAG. 247 | 248 | section: Annotation chaining 249 | You cannot mark up one phrase inside another phrase; however, you can add more than one annotation to a phrase. To add additional annotations, simply follow one with another without any space between them: 250 | 251 | ```(sam) 252 | {Clint Eastwood}(actor)(director) starred in and directed {Gran Torino}(movie). 253 | 254 | section: Annotation lookup 255 | You do not have to repeat the full annotation of a phrase that occurs multiple times in a document. If you have already annotated a phrase once in a document, you can simply mark it as a phrase (by wrapping it in curly braces) and the parser will look back through the document for the last time that phrase was annotated and copy the most recent annotation to the current phrase. If no annotation of that phrase is found, the parser will raise a warning.
This is informational only; it is not an error to have an unannotated phrase. 256 | 257 | section: Links 258 | Links are a form of annotation. As a meta-language, SAM does not support links itself. It is up to you to decide if your markup language includes support for links or not. However, SAM does reserve the annotation type `link` for creating links, and has a shortcut for creating links, which looks like this: 259 | 260 | ```(sam) 261 | {Cobb Salad}(http://allrecipes.com/recipe/14415/cobb-salad/) 262 | 263 | That is, if the first thing in the annotation is recognized as a URL, it is assumed to be a link annotation pointing to that URL. Thus the example above is equivalent to: 264 | 265 | ```(sam) 266 | 267 | {Cobb Salad}(link "http://allrecipes.com/recipe/14415/cobb-salad/") 268 | 269 | section: Bold and Italic 270 | 271 | You can indicate bold and italic text with annotations, like this: 272 | 273 | ```(sam) 274 | This text {is in bold}(bold) type. 275 | 276 | This text {is in italic}(italic) type. 277 | 278 | However, SAM provides shortcuts for bold and italic, as follows: 279 | 280 | ```(sam) 281 | This text *is in bold* type. 282 | 283 | This text _is in italic_ type. 284 | 285 | These forms are equivalent, but notice that the shortcut forms don't support the other annotation attributes and that you cannot chain other annotations with the shortcuts. Thus if you wanted to annotate a book title as both italic and a title, you would have to do it like this: 286 | 287 | ```(sam) 288 | {Moby Dick}(italic)(book-title) is a long book. 289 | 290 | (But note that you should not actually mark up book titles like this as the fact that the phrase is marked up as a book-title is enough to tell the formatting algorithm to format it in italic.) 291 | 292 | Also note that the bold and italic shortcuts cannot be nested.
If you want to mark some text as both bold and italic, you have to use regular annotations: 293 | 294 | ```(sam) 295 | This information is _really_, *really*, {really}(bold)(italic) important. 296 | 297 | 298 | section: Attributes 299 | SAM does not have a general attribute mechanism like XML. XML attributes are one of the things that make it hard to write in, even in a structured editor. However, SAM does provide some basic attributes that are commonly needed for content management purposes. 300 | 301 | You can attach attributes to blocks and block-like things and to phrases. An attribute looks like an annotation, in the sense that it is contained in parentheses, but attributes as distinguished by an initial character the specifies their type. Thus a condition starts with a question mark `(?deluxe)`. 302 | 303 | To attach an attribute to a block, place it directly after the colon of the block header: 304 | 305 | ```(sam) 306 | note:(?basic) 307 | Do not immerse your Basic model Widget in water. Only higher 308 | trim levels are waterproof. 309 | 310 | 311 | The supported attribute types are: 312 | 313 | |ID| An identifier for the block or field. An ID must be unique in the document. An ID is preceded by an asterisk: `*my-id`. 314 | 315 | |Name| A name also identifies a block or field but it has wider scope. Names are used when you want to identify something across a set of documents. SAM does not check that names are unique across a set of documents (it has no way of knowing what that set might be). Resolution of names is up to the processing software. A name is preceded by a `#` symbol: `#my-name`. 316 | 317 | |Condition| Specifying a condition token is a way of telling the publishing software to include this block only when a certain condition is true. How the publishing software determines if a condition is true is entirely up to the application layer. A condition attribute is preceded by a `?`: `?my-condition`. 
318 | 319 | |Language| You can specify the language of a block using a language attribute. The language attribute should be a {W3C language tag}(https://www.w3.org/International/articles/language-tags/). It is preceded by a `!`: `!en-CA`. 320 | 321 | You can apply multiple attributes to a block: 322 | 323 | ```(sam) 324 | note:(*note.waterproof)(?basic) 325 | Do not immerse your Basic model Widget in water. Only higher 326 | trim levels are waterproof. 327 | 328 | You can only apply one each of the {id}, {name}, and {language} {attributes}, but you can apply as many condition attributes to a {block} as you need. 329 | 330 | Note that if you wish to add other forms of management domain metadata to your blocks you can do so using fields within the block. SAM's set of management domain annotations are just a convenience feature. You can add any management domain metadata you like using regular fields. As a semantic format, there is nothing in SAM that says that all of the content of blocks or fields has to appear in print. Blocks and fields can contain any kind of data or metadata you like. It is up to the publishing algorithms to decide which content to publish and which to use for management purposes. 331 | 332 | 333 | section: Citations 334 | Citations are used to refer to another resource. The expectation is that on output the citation markup will be replaced by text that creates a reference to that resource. (Note that this is different from an annotation which might lead to a link being created on a phrase, but will not change the content of the phrase.) 335 | 336 | Citations come in two forms. Citations of internal names/ids, and citations of external resources. Citations can be applied to: 337 | 338 | * an arbitrary point in a paragraph 339 | 340 | * a phrase 341 | 342 | * a blockquote 343 | 344 | To cite a resource that has an id within the current SAM document, reference the id like this: 345 | 346 | ```(sam) 347 | Moby Dick is about a big fish[*moby].
See [*whale]. 348 | 349 | fig:(*whale) 350 | >>(image whale.png) 351 | 352 | footnote:(*moby) 353 | Actually, Moby Dick is a whale, not a fish. 354 | 355 | (Note that `fig` and `footnote` are not part of SAM itself. They are block types in a particular language defined using SAM. Whether or not your SAM-based tagging language supports `fig` or `footnote`, and what they mean in that language, is entirely up to you. This is just an example of how you might use citations in your tagging language.) 356 | 357 | To cite a named resource within your content set, reference the name like this: 358 | 359 | ```(sam) 360 | Moby Dick[#MobyDick] is about a big fish. 361 | 362 | Or like this: 363 | 364 | ```(sam) 365 | {Moby Dick}[#MobyDick] is about a big fish. 366 | 367 | The advantage of applying the citation to a phrase is that it allows 368 | you to turn the citation into a link in online content. 369 | 370 | This example is using a name citation as a means of referencing a bibliographical entry that could be used by a publishing algorithm to build a citation in the desired format on output. This assumes that there is a bibliographic entry with the named `MobyDick` somewhere in your content set. 371 | 372 | To cite an external work without referring to another resource in your content set, use a standard reference syntax. 373 | 374 | ```(sam) 375 | Moby Dick[Melville, 1851] is about a big fish. 376 | 377 | 378 | SAM does not attempt to decode the format of this style of citation. It just delivers it as a string to the application layer. 379 | 380 | The application layer is responsible for all processing of references. 
Since the syntax does not make any distinction between the types of references being made (`*foo` could be the name of any block, such as a graphic, a footnote, or a bibliographic entry) the presumed semantics of a tagging language that uses citations is that the treatment of the citation is based on the type of object that the citation references, not the type of the citation. Thus in the example above, `[*whale]` is a figure reference because the block with the ID `*whale` is a figure and `[*moby]` is a footnote reference because the block with the ID `*moby` is a footnote. 381 | 382 | You can chain {citations} the same way you chain {annotations} and you can also chain citations and annotations together. 383 | 384 | 385 | section: Record sets 386 | 387 | A record set is a kind of table. But because SAM is designed for semantic 388 | authoring, record sets are designed more like a database table than 389 | a formatted table. The content in a record set could be presented in a 390 | table, but it could also be presented in other ways or queried like a 391 | database table. 392 | 393 | A record set consists of records, one per line. A record is a 394 | set of field values separated by commas. Each record in a record set has the 395 | same set of fields of the same type, like a record in a database table. The names of 396 | the fields are specified in the record set header. In the recipe example, the ingredients are specified using a record set, with each ingredient forming a record. 397 | 398 | ```(sam) 399 | ingredients:: ingredient, quantity 400 | eggs, 12 401 | water, 2qt 402 | 403 | The record set label is `ingredients::` and it is marked with two colons instead of one. The field names follow the two colons, separated by commas. Each line that follows contains a record of an ingredient, with the fields separated by commas. 
If the above were written using regular blocks and fields rather than a record set, it would look like this: 404 | 405 | ```(sam) 406 | ingredients: 407 | row: 408 | ingredient: eggs 409 | quantity: 12 410 | row: 411 | ingredient: water 412 | quantity: 2qt 413 | 414 | You could use record sets to create a simple table layout: 415 | 416 | ```(sam) 417 | table:: cell, cell 418 | eggs, 12 419 | water, 2qt 420 | 421 | This is equivalent to: 422 | 423 | ```(sam) 424 | table: 425 | row: 426 | cell: eggs 427 | cell: 12 428 | row: 429 | cell: water 430 | cell: 2qt 431 | 432 | However, here the semantics of the content has been lost. Record sets exist to 433 | allow you to retain and express the semantics of tabular data. 434 | 435 | section: Grids 436 | 437 | There is another way to do simple table layouts, using grids. A grid looks like this: 438 | 439 | 440 | 441 | ```(sam) 442 | +++ 443 | eggs | 12 444 | water | 2qt 445 | 446 | This is equivalent to a record set in the form: 447 | 448 | ```(sam) 449 | grid:: cell, cell 450 | eggs, 12 451 | water, 2qt 452 | 453 | Which is equivalent to blocks and fields like this: 454 | 455 | ```(sam) 456 | grid: 457 | row: 458 | cell: eggs 459 | cell: 12 460 | row: 461 | cell: water 462 | cell: 2qt 463 | 464 | However, using grids can make it easier to read small table-like structures in the SAM source. 465 | 466 | There are no advanced table features like table heads or row and column spanning in grids. SAM is intended for semantic authoring rather than complex layout effects. For suggestions on how to handle complex tables in SAM, see the {SAM Recipes} document. 467 | 468 | section: Code Blocks 469 | A code block is a piece of computer code or perhaps the text of a terminal display. Code blocks are usually presented literally as written, often in a fixed space font and with line breaks in the same place as in the source. 
    Most code blocks are in a specific programming or data structure language which the processing routine may need to know to do things like color coding the syntax. In SAM, a codeblock is introduced with three back ticks (````````).

    ```(sam)
        ```(python)
            for i in range(1,10):
                print("Hello World " + str(i))

    The code block must be indented under the codeblock header (the three backticks). The indentation of the code in the code block will be calculated based on the least indented line. Thus in the example above, the line `for i in range(1,10):` will be treated as having an indent of 0, and the line `print("Hello World " + str(i))` will be treated as having an indent of 4.

    The language of the codeblock is given in the annotation immediately following the three back ticks of the header.

    The content of a code block is not processed as SAM markup. This means that you do not have to escape any of the characters in your code sample. It also means that you can quote SAM markup itself without it being parsed as SAM markup. (Every example in this document uses exactly this technique.)

section: Inline code

    Inline code is contained between backticks:

    ```(sam)
        Python uses the `print()` function to print output.

    Note that this is not a generic monospaced type decoration. It is intended specifically for code. Text in a code decoration is not parsed the way ordinary SAM markup is parsed. Character escapes are not recognized. The text is presented verbatim. Thus if you want to insert the code for a character entity, you can do it like this:

    ```(sam)
        In XML use `&amp;` to enter a & character.

    The only character escaping that is done in inline code is for the back tick symbol. To enter a literal back tick into inline code, use two back ticks in a row.

    ```(sam)
        Sam creates inline code like this: ```"```.

    This will render as:

    """
        Sam creates inline code like this: ```"```.

    Inline code can be annotated in the same way as a codeblock, so if you want to specify the language of a piece of inline code you can do this:

    ```(sam)
        Python uses the `print()`(python) function to print output.

section: Blockquotes

    A block quote is a quotation from another document that is set apart from the main text. In SAM, a block quote is introduced by three quotation marks in a row. You can use either double or single quotation marks. The body of the block quote is indented under the block quote header:

    ```(sam)
        As Lewis Carroll observed:

        """[Lewis Carroll, Through the Looking-Glass]
            Why, sometimes I've believed as many as six
            impossible things before breakfast.

    The content of a block quote is regular SAM markup and is processed just like any other SAM markup.

section: Lines

    A line is a piece of text with a fixed line ending. Poetry, for example, is a set of lines. Normally, SAM runs the lines of a paragraph together and leaves it up to the publishing software to determine line breaks at publishing time. To preserve the line breaks in your text, use lines. In SAM, lines are created by preceding each line with a pipe character followed by a space (the space is important!):

    ```(sam)
        Jabberwocky is a nonsense poem by Lewis Carroll.

        """[Lewis Carroll, Through the Looking-Glass, and What Alice Found There (1871)]

            | 'Twas brillig, and the slithy toves
            | Did gyre and gimble in the wabe;
            | All mimsy were the borogoves,
            | And the mome raths outgrabe.

section: Inserts

    An insert is an instruction to the {application layer} to insert a resource into the document output.

    Inserts may be created at the {block} level or inside a paragraph or field value. At the block level, an insert is placed on a line by itself and is indicated by three greater-than signs:

    ```(sam)
        >>>(image foo.png)

    Inside a paragraph or field value, an insert is indicated by a single greater-than sign followed by the identification of the resource in parentheses:

    ```(sam)
        My favorite flavor of ice cream is >($favorite-flavor).

    The resource to be inserted may be identified either by type and URL, as in the block example above, or by reference to an id, name, fragment, variable, or key (see below for information on variables and keys), as in the inline example, which inserts the value of a variable.

    When an insert is indicated by URL, the type of the insert must be indicated (as it is with the word `image` in the example above). SAM reserves the following insert type names: image, video, audio, feed, app, and object. You can also add your own insert type names.

    Note that SAM does not process inserts. That is entirely up to the application layer. SAM just provides standardized syntax for specifying an insert.

    You can also assign names, conditions, or ids to an insert.

    ```(sam)
        >>>(image fancy.png)(?model=deluxe)

    Remember that it is up to the application layer to implement such conditions.

section: Includes

    You can include one SAM file in another. The included file must be a complete SAM file. Its structure is included in the structure of the including file at the indent level of the include statement.

    ```(sam)
        <<<(foo.sam)

    Unlike inserts, which are simply parsed by the parser and passed on to the application layer for processing, includes are executed by the parser. The result of the include is presented to the application layer as part of the parsed document.
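
    As a small sketch of what "at the indent level of the include statement" means, consider an include nested under a block. The block title `Whales` and the file name `whales.sam` are made up for illustration and are not defined by SAM:

    ```(sam)
        section: Whales
            <<<(whales.sam)

    Under the rule described above, the structure parsed from `whales.sam` would be expected to appear inside the `section` block, as if its content had been written there at that indent.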

    The ID uniqueness constraint that applies to an individual SAM document also applies to included files. The IDs must be unique across the entire document parsed from the source file and any included files.

    Includes cannot be made conditional, since conditions are processed by the application layer, not the parser. If you want a conditional include in your tagging language, add a field for this purpose to your tagging language.

    ```(sam)
        my-include:(?bar) foo.sam

    Naturally, this include must be processed by the application layer.

    For similar reasons, an include cannot have a {name}, {id}, or {language} attribute, since it does not produce an artifact in the output. If you need to apply any of these things to the included content, you can wrap the include statement in another structure such as a {block} or a {fragment} and apply the attributes to that. This will result in the included content being wrapped in that block or fragment in the output where the application layer can deal with it appropriately.

section: Variables

    You can define a variable using the form:

    ```(sam)
        $foo=bar

    The value of a variable may contain annotations and/or citations.

    To insert the value of a variable, use an insert (block level or inline) with the variable name:

    ```(sam)
        >($title)

    Note that the SAM parser does not resolve variables. The variable definitions and variable insert instructions are passed through to the application layer and must be resolved by the processing application. This allows the processing application to determine the scope within which variable names will be resolved.
--------------------------------------------------------------------------------