├── .idea
│   └── .gitignore
├── hello_world.py
├── hello_world_tangler.py
├── tangledown_kernel
│   ├── kernel.json
│   └── tangledown_kernel.py
├── tangleup-roundtrip-test.sh
├── hello_world.md
├── check_internal_anchor_links.py
├── .gitignore
├── bootstrap_tangledown.py
├── tangledown.py
└── README.md
/.idea/.gitignore:
--------------------------------------------------------------------------------
1 | # Default ignored files
2 | /shelf/
3 | /workspace.xml
4 |
--------------------------------------------------------------------------------
/hello_world.py:
--------------------------------------------------------------------------------
1 | def hello_world():
2 |     print("Hello, world, from Tangledown!")
3 | if __name__ == "__main__":
4 |     hello_world()
5 |
--------------------------------------------------------------------------------
/hello_world_tangler.py:
--------------------------------------------------------------------------------
1 | from tangledown import get_lines, accumulate_lines, tangle_all
2 | tangle_all(*accumulate_lines(*get_lines("hello_world.md")))
3 |
--------------------------------------------------------------------------------
/tangledown_kernel/kernel.json:
--------------------------------------------------------------------------------
1 | {"argv":["python","-m","tangledown_kernel", "-f", "{connection_file}"],
2 | "display_name":"Tangledown"
3 | }
4 |
5 |
6 |
--------------------------------------------------------------------------------
/tangleup-roundtrip-test.sh:
--------------------------------------------------------------------------------
1 | python tangleup_experiment.py
2 | python tangledown.py asr_tangleup_test.md
3 | pushd examples/asr/asr
4 | lein test
5 | lein run
6 | popd
7 |
--------------------------------------------------------------------------------
/hello_world.md:
--------------------------------------------------------------------------------
1 | # TANGLE HELLO_WORLD
2 |
3 |
4 | In JupyterLab, pair this markdown file with a Jupytext notebook, open the notebook, then
5 |
6 |
7 | ## RUN ALL CELLS IN THIS NOTEBOOK
8 |
9 |
10 | A noweb tag named "hello world def" defines a function named "hello_world".
11 |
12 |
13 |
14 |
15 |     def hello_world():
16 |         print("Hello, world, from Tangledown!")
17 |
18 |
19 |
20 |
21 | A tangle tag that refers to the noweb tag named "hello world def" and writes a Python file, "hello_world.py". That file can be run as a script from the command line. It can also be imported as a module, and the importing code (example below) can call the function "hello_world", which is defined in "hello world def".
22 |
23 |
24 |
25 |
26 |
27 |     if __name__ == "__main__":
28 |         hello_world()
29 |
30 |
31 |
32 |
33 | Tangle the code out of this here Markdown file using tangledown as a module.
34 |
35 | ```python
36 | from tangledown import get_lines, accumulate_lines, tangle_all
37 | tangle_all(*accumulate_lines(*get_lines("hello_world.md")))
38 | ```
39 |
40 | Call the "hello_world" function imported from the "hello_world" module.
41 |
42 | ```python
43 | import hello_world
44 | hello_world.hello_world()
45 | ```
46 |
47 | Isn't that cool?
48 |
49 |
50 | Well, hell, let's bootstrap tangledown itself from "README.md." This is how you bootstrap a compiler.
51 |
52 | ```python
53 | tangle_all(*accumulate_lines(*get_lines("README.md")))
54 | ```
55 |
56 | Do it again to make sure it all worked!
57 |
58 | ```python
59 | tangle_all(*accumulate_lines(*get_lines("README.md")))
60 | ```
61 |
62 | Hot Dayyum! Here is a deeper test that everything is ok:
63 |
64 | ```python
65 | from tangledown import test_re_matching
66 | test_re_matching(*get_lines("README.md"))
67 | ```
68 |
69 | ```python
70 |
71 | ```
72 |
--------------------------------------------------------------------------------
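The noweb-and-tangle idea that hello_world.md describes can be sketched in a few self-contained lines. This is an illustration only, not Tangledown's own code: the tag syntax `<block name="..."></block>`, the names `BLOCK_RE`, `expand_once`, and `expand_all`, and the pass limit are all assumptions made for the example.

```python
import re
from typing import Dict, List

# ASSUMPTION: a reference to a named chunk is written on one line as
# <block name="..."></block>. This syntax is hypothetical.
BLOCK_RE = re.compile(r'^(\s*)<block name="([^"]+)"></block>\s*$')


def expand_once(nowebs: Dict[str, List[str]], lines: List[str]) -> List[str]:
    """Replace each block reference with the lines of the named chunk,
    prefixing every inserted line with the reference's own indentation."""
    out: List[str] = []
    for line in lines:
        m = BLOCK_RE.match(line)
        if m:
            indent, name = m.group(1), m.group(2)
            out.extend(indent + body_line for body_line in nowebs[name])
        else:
            out.append(line)
    return out


def expand_all(nowebs: Dict[str, List[str]], lines: List[str],
               max_passes: int = 100) -> List[str]:
    """Expand repeatedly, so chunks may themselves refer to other chunks.
    The pass limit (an arbitrary choice) guards against cyclic references."""
    for _ in range(max_passes):
        if not any(BLOCK_RE.match(line) for line in lines):
            return lines
        lines = expand_once(nowebs, lines)
    raise RecursionError("block references did not settle; is there a cycle?")


nowebs = {
    "hello world def": [
        'def hello_world():\n',
        '    print("Hello, world, from Tangledown!")\n',
    ],
}
tangle_lines = [
    '<block name="hello world def"></block>\n',
    'if __name__ == "__main__":\n',
    '    hello_world()\n',
]
tangled = "".join(expand_all(nowebs, tangle_lines))
print(tangled)
```

Expanding in repeated passes keeps each pass simple: a pass only substitutes the references it sees, and the loop ends once no references remain.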
/tangledown_kernel/tangledown_kernel.py:
--------------------------------------------------------------------------------
1 | from ipykernel.ipkernel import IPythonKernel
2 | from pprint import pprint
3 | import sys # for version_info
4 | from pathlib import Path
5 | from tangledown import \
6 |     accumulate_lines, \
7 |     get_lines, \
8 |     expand_tangles
9 |
10 |
11 | class TangledownKernel(IPythonKernel):
12 |     current_victim_filepath = ""
13 |     with open(Path.home() / '.tangledown/current_victim_file.txt') as v:
14 |         fp = v.read()
15 |     tracer_, nowebs, tangles_ = accumulate_lines(*get_lines(fp))
16 |     implementation = 'Tangledown'
17 |     implementation_version = '1.0'
18 |     language = 'no-op'
19 |     language_version = '0.1'
20 |     language_info = {  # for syntax coloring
21 |         "name": "python",
22 |         "version": sys.version.split()[0],
23 |         "mimetype": "text/x-python",
24 |         "codemirror_mode": {"name": "ipython", "version": sys.version_info[0]},
25 |         "pygments_lexer": "ipython%d" % 3,
26 |         "nbconvert_exporter": "python",
27 |         "file_extension": ".py",
28 |     }
29 |     banner = "Tangledown kernel - expanding 'block' tags"
30 |
31 |
32 |     async def do_execute(self, code, silent, store_history=True, user_expressions=None,
33 |                          allow_stdin=False):
34 |         if not silent:
35 |             cleaned_lines = [line + '\n' for line in code.split('\n')]
36 |             # HERE'S THE BEEF!
37 |             expanded_code = expand_tangles(None, [cleaned_lines], self.nowebs)
38 |             reply_content = await super().do_execute(
39 |                 expanded_code, silent, store_history, user_expressions)
40 |             stream_content = {
41 |                 'name': 'stdout',
42 |                 'text': reply_content,
43 |             }
44 |             self.send_response(self.iopub_socket, 'stream', stream_content)
45 |         return {'status': 'ok',
46 |                 # The base class increments the execution count
47 |                 'execution_count': self.execution_count,
48 |                 'payload': [],
49 |                 'user_expressions': {},
50 |                 }
51 | if __name__ == '__main__':
52 |     from ipykernel.kernelapp import IPKernelApp
53 |     IPKernelApp.launch_instance(kernel_class=TangledownKernel)
54 |
55 |
56 |
--------------------------------------------------------------------------------
/check_internal_anchor_links.py:
--------------------------------------------------------------------------------
1 | from typing import List, Dict, Set
2 | import re, sys
3 | from pprint import pprint
4 |
5 | filename = "literate-copperhead-003.md"
6 |
7 | anchor_re = r'.*'
8 | ref_re = r'\[(.*?)\]\(#(.*?)\)'
9 |
10 |
11 | def find_anchors(filename: str) -> Set[str]:
12 |     anchor_list = []
13 |     with open(filename) as f:
14 |         lines = f.readlines()
15 |         index: int = 0
16 |         for line_no, line in enumerate(lines):
17 |             mm: List = re.findall(anchor_re, line)
18 |             if mm:
19 |                 index += len(mm)
20 |                 for m in mm:
21 |                     anchor_list.append(m)
22 |     anchor_set = set(anchor_list)
23 |     listlen = len(anchor_list)
24 |     dupes_exist = (not (listlen == len(anchor_set)))
25 |     print(f'THERE ARE '
26 |           f'{"NO" if not dupes_exist else ""}'
27 |           f' DUPLICATES AMONGST {listlen} ANCHORS')
28 |     if dupes_exist:
29 |         dupes_list = []
30 |         dupes = set()
31 |         for a in anchor_list:
32 |             if a in dupes:
33 |                 dupes_list.append(a)
34 |             else:
35 |                 dupes.add(a)
36 |         print(f'THE DUPES ARE:')
37 |         pprint(dupes_list)
38 |     return anchor_set
39 |
40 |
41 | def match_refs(filename: str, anchors: Set):
42 |     with open(filename) as f:
43 |         lines = f.readlines()
44 |         index = 0; matchcount = 0; failcount = 0
45 |         for line_no, line in enumerate(lines):
46 |             mm: list = re.findall(ref_re, line)
47 |             if mm:
48 |                 index += len(mm)
49 |                 for m in mm:
50 |                     ref = m[1]
51 |                     if ref in anchors:
52 |                         matchcount += 1
53 |                     else:
54 |                         failcount += 1
55 |                         print(f'FAILED TO FIND MATCHING ANCHOR: '
56 |                               f'index: {index}, line_no: {line_no + 1}, ref: {ref} '
57 |                               f'matchcount: {matchcount}, failcount: {failcount}.')
58 |         if index == matchcount:
59 |             print(f'SUCCESS IN FINDING ALL MATCHING ANCHORS: ')
60 |             print(f'number of refs: {index}, matchcount: {matchcount}.')
61 |         else:
62 |             print(f'FAILED TO FIND {failcount} ANCHORS WITH '
63 |                   f'{index} TRIALS AND {matchcount} SUCCESSES.')
64 |
65 |
66 | if __name__ == "__main__":
67 |     filename = sys.argv[1]
68 |     match_refs(filename, find_anchors(filename))
69 |
--------------------------------------------------------------------------------
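The anchor check above can be condensed into a small, self-contained sketch. The reference pattern has the same shape as `ref_re` in the listing; the anchor pattern is an assumption (anchors written as `<a id="...">`), and the names `ANCHOR_RE`, `REF_RE`, and `check_links` are hypothetical, chosen only for this example.

```python
import re
from typing import List, Set, Tuple

# ASSUMPTION: anchors are written as <a id="...">. This syntax is
# hypothetical and stands in for whatever the checked document uses.
ANCHOR_RE = re.compile(r'<a\s+id="([^"]+)"')
# Same shape as ref_re in the listing: [link text](#target)
REF_RE = re.compile(r'\[(.*?)\]\(#(.*?)\)')


def check_links(text: str) -> Tuple[Set[str], List[Tuple[int, str]]]:
    """Return the set of anchors and a list of (line number, target)
    pairs for internal references that match no anchor."""
    lines = text.splitlines()
    anchors = {a for line in lines for a in ANCHOR_RE.findall(line)}
    broken = [(line_no, target)
              for line_no, line in enumerate(lines, start=1)
              for _text, target in REF_RE.findall(line)
              if target not in anchors]
    return anchors, broken


sample = (
    '<a id="intro"></a>\n'
    '# Intro\n'
    'See [the intro](#intro) and [the missing part](#nowhere).\n'
)
anchors, broken = check_links(sample)
print("anchors:", anchors)
print("broken references:", broken)
```

Collecting all anchors before scanning references means a reference may point at an anchor that appears later in the file, which the two-function structure of the script also allows.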
/.gitignore:
--------------------------------------------------------------------------------
1 | # This
2 |
3 | tangledown-venv
4 |
5 | # Mac
6 |
7 | .DS_Store
8 |
9 | # Notebooks
10 |
11 | *.ipynb
12 |
13 | # Clojure
14 |
15 | target
16 | classes
17 | checkouts
18 | profiles.clj
19 | pom.xml
20 | pom.xml.asc
21 | pom.properties
22 | *.jar
23 | *.class
24 | .lein-*
25 | .nrepl-port
26 | .prepl-port
27 | .hgignore
28 | .hg/
29 |
30 | # Python
31 |
32 | # Byte-compiled / optimized / DLL files
33 | __pycache__/
34 | *.py[cod]
35 | *$py.class
36 |
37 | # C extensions
38 | *.so
39 |
40 | # Distribution / packaging
41 | .Python
42 | build/
43 | develop-eggs/
44 | dist/
45 | downloads/
46 | eggs/
47 | .eggs/
48 | lib/
49 | lib64/
50 | parts/
51 | sdist/
52 | var/
53 | wheels/
54 | share/python-wheels/
55 | *.egg-info/
56 | .installed.cfg
57 | *.egg
58 | MANIFEST
59 |
60 | # PyInstaller
61 | # Usually these files are written by a python script from a template
62 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
63 | *.manifest
64 | *.spec
65 |
66 | # Installer logs
67 | pip-log.txt
68 | pip-delete-this-directory.txt
69 |
70 | # Unit test / coverage reports
71 | htmlcov/
72 | .tox/
73 | .nox/
74 | .coverage
75 | .coverage.*
76 | .cache
77 | nosetests.xml
78 | coverage.xml
79 | *.cover
80 | *.py,cover
81 | .hypothesis/
82 | .pytest_cache/
83 | cover/
84 |
85 | # Translations
86 | *.mo
87 | *.pot
88 |
89 | # Django stuff:
90 | *.log
91 | local_settings.py
92 | db.sqlite3
93 | db.sqlite3-journal
94 |
95 | # Flask stuff:
96 | instance/
97 | .webassets-cache
98 |
99 | # Scrapy stuff:
100 | .scrapy
101 |
102 | # Sphinx documentation
103 | docs/_build/
104 |
105 | # PyBuilder
106 | .pybuilder/
107 | target/
108 |
109 | # Jupyter Notebook
110 | .ipynb_checkpoints
111 |
112 | # IPython
113 | profile_default/
114 | ipython_config.py
115 |
116 | # pyenv
117 | # For a library or package, you might want to ignore these files since the code is
118 | # intended to run in multiple environments; otherwise, check them in:
119 | # .python-version
120 |
121 | # pipenv
122 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
123 | # However, in case of collaboration, if having platform-specific dependencies or dependencies
124 | # having no cross-platform support, pipenv may install dependencies that don't work, or not
125 | # install all needed dependencies.
126 | Pipfile.lock
127 |
128 | # poetry
129 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
130 | # This is especially recommended for binary packages to ensure reproducibility, and is more
131 | # commonly ignored for libraries.
132 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
133 | poetry.lock
134 |
135 | # pdm
136 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
137 | #pdm.lock
138 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
139 | # in version control.
140 | # https://pdm.fming.dev/#use-with-ide
141 | .pdm.toml
142 |
143 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
144 | __pypackages__/
145 |
146 | # Celery stuff
147 | celerybeat-schedule
148 | celerybeat.pid
149 |
150 | # SageMath parsed files
151 | *.sage.py
152 |
153 | # Environments
154 | .env
155 | .venv
156 | env/
157 | venv/
158 | ENV/
159 | env.bak/
160 | venv.bak/
161 |
162 | # Spyder project settings
163 | .spyderproject
164 | .spyproject
165 |
166 | # Rope project settings
167 | .ropeproject
168 |
169 | # mkdocs documentation
170 | /site
171 |
172 | # mypy
173 | .mypy_cache/
174 | .dmypy.json
175 | dmypy.json
176 |
177 | # Pyre type checker
178 | .pyre/
179 |
180 | # pytype static type analyzer
181 | .pytype/
182 |
183 | # Cython debug symbols
184 | cython_debug/
185 |
186 | # PyCharm
187 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can
188 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
189 | # and can be added to the global gitignore or merged into this file. For a more nuclear
190 | # option (not recommended) you can uncomment the following to ignore the entire idea folder.
191 | .idea/
--------------------------------------------------------------------------------
/bootstrap_tangledown.py:
--------------------------------------------------------------------------------
1 | from typing import List, Dict, Tuple, Match
2 |
3 | NowebName = str
4 | FileName = str
5 | TangleFileName = FileName
6 | LineNumber = int
7 | Line = str
8 | Lines = List[Line]
9 | Liness = List[Lines]
10 | LinesTuple = Tuple[LineNumber, Lines]
11 |
12 | Nowebs = Dict[NowebName, Lines]
13 | Tangles = Dict[TangleFileName, Liness]
14 |
15 |
16 | import re
17 | import sys
18 | from pathlib import Path
19 |
20 | def get_aFile() -> str:
21 |     """Get a file name from the command-line arguments
22 |     or 'README.md' as a default."""
23 |     print({f'len(sys.argv)': len(sys.argv), f'sys.argv': sys.argv})
24 |
25 |
26 |     aFile = 'README.md' # default
27 |     if len(sys.argv) > 1:
28 |         file_names = [p for p in sys.argv
29 |                       if (p[0] != '-') # option
30 |                       and (p[-3:] != '.py')
31 |                       and (p[-5:] != '.json')]
32 |         if file_names:
33 |             # Use the first argument that survived the filter;
34 |             # sys.argv[1] may be an option such as '-f'.
35 |             aFile = file_names[0]
36 |     return aFile
35 |
36 | raw_line_re: re = re.compile(r'')
37 | def get_lines(fn: FileName) -> Lines:
38 |     """Get lines from a file named fn. Replace
39 |     'raw' fenceposts with blank lines. Write full path to
40 |     a secret place for the Tangledown kernel to pick it up.
41 |     Return tuple of file path (for TangleUp's Tracer) and
42 |     lines."""
43 |     def save_aFile_path_for_kernel(fn: FileName) -> FileName:
44 |         xpath: Path = Path.cwd() / Path(fn).name
45 |         victim_file_name = str(xpath.absolute())
46 |         safepath: Path = Path.home() / '.tangledown/current_victim_file.txt'
47 |         Path(safepath).parents[0].mkdir(parents=True, exist_ok=True)
48 |         print(f"SAVING {victim_file_name} in secret place {str(safepath)}")
49 |         with open(safepath, "w") as t:
50 |             t.write(victim_file_name)
51 |         return xpath
52 |
53 |
54 |     xpath = save_aFile_path_for_kernel(fn)
55 |     with open(fn) as f:
56 |         in_lines: Lines = f.readlines ()
57 |     out_lines: Lines = []
58 |     for in_line in in_lines:
59 |         out_lines.append(
60 |             in_line if not raw_line_re.match(in_line) else "\n")
61 |     return xpath, out_lines
62 |
63 |
64 | noweb_start_re = re.compile (r'^$')
65 | noweb_end_re = re.compile (r'^$')
66 |
67 | tangle_start_re = re.compile (r'^$')
68 | tangle_end_re = re.compile (r'^$')
69 |
70 |
71 | block_start_re = re.compile (r'^(\s*)')
72 | block_end_re = re.compile (r'^(\s)*')
73 |
74 |
75 |
76 | def test_re_matching(fp: Path, lines: Lines) -> None:
77 |     for line in lines:
78 |         noweb_start_match = noweb_start_re.match (line)
79 |         tangle_start_match = tangle_start_re.match (line)
80 |         block_start_match = block_start_re.match (line)
81 |
82 |         noweb_end_match = noweb_end_re.match (line)
83 |         tangle_end_match = tangle_end_re.match (line)
84 |         block_end_match = block_end_re.match (line)
85 |
86 |         if (noweb_start_match):
87 |             print ('NOWEB: ', noweb_start_match.group (0))
88 |             print ('name of the block: ', noweb_start_match.group (1))
89 |         elif (noweb_end_match):
90 |             print ('NOWEB END: ', noweb_end_match.group (0))
91 |         elif (tangle_start_match):
92 |             print ('TANGLE: ', tangle_start_match.group (0))
93 |             print ('name of the file: ', tangle_start_match.group (1))
94 |         elif (tangle_end_match):
95 |             print ('TANGLE END: ', tangle_end_match.group (0))
96 |         elif (block_start_match):
97 |             print ('BLOCK: ', block_start_match.group (0))
98 |             print ('name of the block: ', block_start_match.group (1))
99 |             if (block_end_match):
100 |                 print ('BLOCK END SAME LINE: ', block_end_match.group (0))
101 |             else:
102 |                 print ('BLOCK NO END')
103 |         elif (block_end_match):
104 |             print ('BLOCK END ANOTHER LINE: ', block_end_match.group (0))
105 |         else:
106 |             pass
107 |
108 |
109 | from dataclasses import dataclass, field
110 | from typing import Union ## TODO
111 | @dataclass
112 | class Tracer:
113 |     trace: List[Dict] = field(default_factory=list)
114 |     line_no = 0
115 |     current_betweens: Lines = field(default_factory=list)
116 |     fp: Path = None
117 |     # First Pass
118 |     def add_markdown(self, i, between: Line):
119 |         self.line_no += 1
120 |         self.current_betweens.append((self.line_no, between))
121 |
122 |
123 |     def _end_betweens(self, i):
124 |         if self.current_betweens:
125 |             self.trace.append({"ending_line_number": self.line_no, "i": i,
126 |                                "language": "markdown", "kind": 'between',
127 |                                "text": self.current_betweens})
128 |         self.current_betweens = []
129 |
130 |
131 |     def add_noweb(self, i, language, id_, key, noweb_lines):
132 |         self._end_betweens(i)
133 |         self.line_no = i
134 |         self.trace.append({"ending_line_number": self.line_no, "i": i,
135 |                            "language": language, "id_": id_,
136 |                            "kind": 'noweb', key: noweb_lines})
137 |
138 |
139 |     def add_tangle(self, i, language, id_, key, tangle_liness):
140 |         self._end_betweens(i)
141 |         self.line_no = i
142 |         self.trace.append({"ending_line_number": self.line_no, "i": i,
143 |                            "language": language, "id_": id_,
144 |                            "kind": 'tangle', key: tangle_liness})
145 |
146 |
147 |     def dump(self):
148 |         pr = self.fp.parent
149 |         fn = self.fp.name
150 |         fn2 = fn.translate(str.maketrans('.', '_'))
151 |         # Store the trace in the dir where the input md file is:
152 |         vr = f'tangledown_trace_{fn2}'
153 |         np = pr / (vr + ".py")
154 |         with open(np, "w") as fs:
155 |             print(f'{vr} = (', file=fs)
156 |             pprint(self.trace, stream=fs)
157 |             print(')', file=fs)
158 |
159 |
160 |     # Second Pass
161 |     def add_expandedn_noweb(self, i, language, id_, key, noweb_lines):
162 |         self._end_betweens(i)
163 |         self.line_no = i
164 |         self.trace.append({"ending_line_number": self.line_no, "i": i,
165 |                            "language": language, "id_": id_,
166 |                            "kind": 'expanded_noweb', key: noweb_lines})
167 |
168 |
169 |     def add_expanded_tangle(self, i, language, id_, key, tangle_liness):
170 |         self._end_betweens(i)
171 |         self.line_no = i
172 |         self.trace.append({"ending_line_number": self.line_no, "i": i,
173 |                            "language": language, "id_": id_,
174 |                            "kind": 'expanded_tangle', key: tangle_liness})
175 |
176 |
177 |
178 |
179 | triple_backtick_re = re.compile (r'^`[`]`((\w+)?\s*(id=([0-9a-fA-F-]+))?)')
180 | blank_line_re = re.compile (r'^\s*$')
181 |
182 | def first_non_blank_line_is_triple_backtick (
183 |         i: LineNumber, lines: Lines) -> Match[Line]:
184 |     while (blank_line_re.match (lines[i])):
185 |         i = i + 1
186 |     yes = triple_backtick_re.match (lines[i])
187 |     language = "python" # default
188 |     id_ = None # default
189 |     if yes:
190 |         language = yes.groups()[1] or language
191 |         id_ = yes.groups()[3] ## can be 'None'
192 |     return i, yes, language, id_
193 |
194 |
195 | def accumulate_contents (
196 |         lines: Lines, i: LineNumber, end_re: re) -> LinesTuple:
197 |     r"""Harvest contents of a noweb or tangle tag. The start
198 |     taglet was consumed by caller. Consume the end taglet."""
199 |     i, yes, language, id_ = first_non_blank_line_is_triple_backtick(i, lines)
200 |     snip = 0 if yes else 4
201 |     contents_lines: Lines = []
202 |     for j in range (i, len(lines)):
203 |         if (end_re.match(lines[j])):
204 |             return j + 1, language, id_, contents_lines # the only return
205 |         if not triple_backtick_re.match (lines[j]):
206 |             contents_lines.append (lines[j][snip:])
207 |
208 |
209 | def anchor_is_tilde(path_str: str) -> bool:
210 |     result = (path_str[0:2] == "~/") and (Path(path_str).anchor == '')
211 |     return result
212 |
213 | def normalize_file_path(tangle_file_attribute: str) -> Path:
214 |     result: Path = Path(tangle_file_attribute)
215 |     if (anchor_is_tilde(tangle_file_attribute)):
216 |         result = (Path.home() / tangle_file_attribute[2:])
217 |     return result.absolute()
218 |
219 |
220 | from pprint import pprint
221 | def accumulate_lines(fp: Path, lines: Lines) -> Tuple[Tracer, Nowebs, Tangles]:
222 |     tracer = Tracer()
223 |     tracer.fp = fp
224 |     nowebs: Nowebs = {}
225 |     tangles: Tangles = {}
226 |     i = 0
227 |     while i < len(lines):
228 |         noweb_start_match = noweb_start_re.match (lines[i])
229 |         tangle_start_match = tangle_start_re.match (lines[i])
230 |         if (noweb_start_match):
231 |             key: NowebName = noweb_start_match.group(1)
232 |             (i, language, id_, nowebs[key]) = \
233 |                 accumulate_contents(lines, i + 1, noweb_end_re)
234 |             tracer.add_noweb(i, language, id_, key, nowebs[key])
235 |         elif (tangle_start_match):
236 |             key: TangleFileName = \
237 |                 str(normalize_file_path(tangle_start_match.group(1)))
238 |             if not (key in tangles):
239 |                 tangles[key]: Liness = []
240 |             (i, language, id_, things) = accumulate_contents(lines, i + 1, tangle_end_re)
241 |             tangles[key] += [things]
242 |             tracer.add_tangle(i, language, id_, key, tangles[key])
243 |         else:
244 |             tracer.add_markdown(i, lines[i])
245 |             i += 1
246 |     return tracer, nowebs, tangles
247 |
248 |
249 | def there_is_a_block_tag (lines: Lines) -> bool:
250 |     for line in lines:
251 |         block_start_match = block_start_re.match (line)
252 |         if (block_start_match):
253 |             return True
254 |     return False
255 |
256 |
257 | def eat_block_tag (i: LineNumber, lines: Lines) -> LineNumber:
258 |     for j in range (i, len(lines)):
259 |         end_match = block_end_re.match (lines[j])
260 |         # DUDE! Check leading whitespace against block_start_re
261 |         if (end_match):
262 |             return j + 1
263 |         else: # DUDE!
264 |             pass
265 |
266 |
267 | def expand_blocks (nowebs: Nowebs, lines: Lines,
268 |                    language: str = "python") -> Lines:
269 |     out_lines = []
270 |     block_key: NowebName = ""
271 |     for i in range (len (lines)):
272 |         block_start_match = block_start_re.match (lines[i])
273 |         if (block_start_match):
274 |             leading_whitespace: str = block_start_match.group (1)
275 |             block_key: NowebName = block_start_match.group (2)
276 |             block_lines: Lines = nowebs [block_key] # DUDE!
277 |             i: LineNumber = eat_block_tag (i, lines)
278 |             for block_line in block_lines:
279 |                 out_lines.append (leading_whitespace + block_line)
280 |         else:
281 |             out_lines.append (lines[i])
282 |     return out_lines
283 |
284 |
285 | def expand_tangles(liness: Liness, nowebs: Nowebs) -> str:
286 |     contents: Lines = []
287 |     for lines in liness:
288 |         while there_is_a_block_tag (lines):
289 |             lines = expand_blocks (nowebs, lines)
290 |         contents += lines
291 |     return ''.join(contents)
292 |
293 |
294 |
295 | def tangle_all(tracer: Tracer, nowebs: Nowebs, tangles: Tangles) -> None:
296 |     for filename, liness in tangles.items ():
297 |         Path(filename).parents[0].mkdir(parents=True, exist_ok=True)
298 |         contents: str = expand_tangles(liness, nowebs)
299 |         with open (filename, 'w') as outfile:
300 |             print(f"WRITING FILE: {filename}")
301 |             outfile.write (contents)
302 |     tracer.dump()
303 |
304 | if __name__ == "__main__":
305 |     fn, lines = get_lines(get_aFile())
306 |     # test_re_matching(lines)
307 |     tracer, nowebs, tangles = accumulate_lines(fn, lines)
308 |     tangle_all(tracer, nowebs, tangles)
309 |
310 |
311 |
--------------------------------------------------------------------------------
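The fence handling in first_non_blank_line_is_triple_backtick can be tried in isolation. The sketch below reuses the pattern from `triple_backtick_re` as it appears in the listing; the wrapper name `parse_fence`, its return shape, and the `FENCE` helper are assumptions made for illustration.

```python
import re
from typing import Optional, Tuple

# Same pattern as triple_backtick_re in the listing. The middle backtick is
# written as [`] so the pattern text never holds three backticks in a row.
FENCE_RE = re.compile(r'^`[`]`((\w+)?\s*(id=([0-9a-fA-F-]+))?)')


def parse_fence(line: str, default_language: str = "python"
                ) -> Optional[Tuple[str, Optional[str]]]:
    """Return (language, id) for an opening fence line, or None if the
    line is not a fence. The language falls back to the default, as the
    listing does; the id stays None when no id= attribute is present."""
    m = FENCE_RE.match(line)
    if m is None:
        return None
    language = m.group(2) or default_language
    return language, m.group(4)


# Build the fence from single backticks so this example can itself sit
# inside a fenced block.
FENCE = "`" * 3

print(parse_fence(FENCE + "clojure id=1f-2a"))
print(parse_fence(FENCE))
print(parse_fence("plain text"))
```

Groups 2 and 4 here correspond to `yes.groups()[1]` and `yes.groups()[3]` in the listing, since `groups()` is zero-indexed while `group(n)` counts from one.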
/tangledown.py:
--------------------------------------------------------------------------------
1 | from typing import List, Dict, Tuple, Match
2 |
3 | NowebName = str
4 | FileName = str
5 | TangleFileName = FileName
6 | LineNumber = int
7 | Line = str
8 | Lines = List[Line]
9 | Liness = List[Lines]
10 | LinesTuple = Tuple[LineNumber, Lines]
11 |
12 | Nowebs = Dict[NowebName, Lines]
13 | Tangles = Dict[TangleFileName, Liness]
14 |
15 |
16 | import re
17 | import sys
18 | from pathlib import Path
19 |
20 | def get_aFile() -> str:
21 |     """Get a file name from the command-line arguments
22 |     or 'README.md' as a default."""
23 |     print({f'len(sys.argv)': len(sys.argv), f'sys.argv': sys.argv})
24 |
25 |
26 |     aFile = 'README.md' # default
27 |     if len(sys.argv) > 1:
28 |         file_names = [p for p in sys.argv
29 |                       if (p[0] != '-') # option
30 |                       and (p[-3:] != '.py')
31 |                       and (p[-5:] != '.json')]
32 |         if file_names:
33 |             # Use the first argument that survived the filter;
34 |             # sys.argv[1] may be an option such as '-f'.
35 |             aFile = file_names[0]
36 |     return aFile
35 |
36 | raw_line_re: re = re.compile(r'')
37 | def get_lines(fn: FileName) -> Lines:
38 |     """Get lines from a file named fn. Replace
39 |     'raw' fenceposts with blank lines. Write full path to
40 |     a secret place for the Tangledown kernel to pick it up.
41 |     Return tuple of file path (for TangleUp's Tracer) and
42 |     lines."""
43 |     def save_aFile_path_for_kernel(fn: FileName) -> FileName:
44 |         xpath: Path = Path.cwd() / Path(fn).name
45 |         victim_file_name = str(xpath.absolute())
46 |         safepath: Path = Path.home() / '.tangledown/current_victim_file.txt'
47 |         Path(safepath).parents[0].mkdir(parents=True, exist_ok=True)
48 |         print(f"SAVING {victim_file_name} in secret place {str(safepath)}")
49 |         with open(safepath, "w") as t:
50 |             t.write(victim_file_name)
51 |         return xpath
52 |
53 |
54 |     xpath = save_aFile_path_for_kernel(fn)
55 |     with open(fn) as f:
56 |         in_lines: Lines = f.readlines ()
57 |     out_lines: Lines = []
58 |     for in_line in in_lines:
59 |         out_lines.append(
60 |             in_line if not raw_line_re.match(in_line) else "\n")
61 |     return xpath, out_lines
62 |
63 |
64 | noweb_start_re = re.compile (r'^$')
65 | noweb_end_re = re.compile (r'^$')
66 |
67 | tangle_start_re = re.compile (r'^$')
68 | tangle_end_re = re.compile (r'^$')
69 |
70 |
71 | block_start_re = re.compile (r'^(\s*)')
72 | block_end_re = re.compile (r'^(\s)*')
73 |
74 |
75 |
76 | def test_re_matching(fp: Path, lines: Lines) -> None:
77 |     for line in lines:
78 |         noweb_start_match = noweb_start_re.match (line)
79 |         tangle_start_match = tangle_start_re.match (line)
80 |         block_start_match = block_start_re.match (line)
81 |
82 |         noweb_end_match = noweb_end_re.match (line)
83 |         tangle_end_match = tangle_end_re.match (line)
84 |         block_end_match = block_end_re.match (line)
85 |
86 |         if (noweb_start_match):
87 |             print ('NOWEB: ', noweb_start_match.group (0))
88 |             print ('name of the block: ', noweb_start_match.group (1))
89 |         elif (noweb_end_match):
90 |             print ('NOWEB END: ', noweb_end_match.group (0))
91 |         elif (tangle_start_match):
92 |             print ('TANGLE: ', tangle_start_match.group (0))
93 |             print ('name of the file: ', tangle_start_match.group (1))
94 |         elif (tangle_end_match):
95 |             print ('TANGLE END: ', tangle_end_match.group (0))
96 |         elif (block_start_match):
97 |             print ('BLOCK: ', block_start_match.group (0))
98 |             print ('name of the block: ', block_start_match.group (1))
99 |             if (block_end_match):
100 |                 print ('BLOCK END SAME LINE: ', block_end_match.group (0))
101 |             else:
102 |                 print ('BLOCK NO END')
103 |         elif (block_end_match):
104 |             print ('BLOCK END ANOTHER LINE: ', block_end_match.group (0))
105 |         else:
106 |             pass
107 |
108 |
109 | from dataclasses import dataclass, field
110 | from typing import Union ## TODO
111 | @dataclass
112 | class Tracer:
113 |     trace: List[Dict] = field(default_factory=list)
114 |     line_no = 0
115 |     current_betweens: Lines = field(default_factory=list)
116 |     fp: Path = None
117 |     # First Pass
118 |     def add_markdown(self, i, between: Line):
119 |         self.line_no += 1
120 |         self.current_betweens.append((self.line_no, between))
121 |
122 |
123 |     def add_raw(self, i, between: Line):
124 |         self.line_no += 1
125 |         self.current_betweens.append((self.line_no, between))
126 |
127 |
128 |     def _end_betweens(self, i):
129 |         if self.current_betweens:
130 |             self.trace.append({"ending_line_number": self.line_no, "i": i,
131 |                                "language": "markdown", "kind": 'between',
132 |                                "text": self.current_betweens})
133 |         self.current_betweens = []
134 |
135 |
136 |     def add_noweb(self, i, language, id_, key, noweb_lines):
137 |         self._end_betweens(i)
138 |         self.line_no = i
139 |         self.trace.append({"ending_line_number": self.line_no, "i": i,
140 |                            "language": language, "id_": id_,
141 |                            "kind": 'noweb', key: noweb_lines})
142 |
143 |
144 |     def add_tangle(self, i, language, id_, key, tangle_liness):
145 |         self._end_betweens(i)
146 |         self.line_no = i
147 |         self.trace.append({"ending_line_number": self.line_no, "i": i,
148 |                            "language": language, "id_": id_,
149 |                            "kind": 'tangle', key: tangle_liness})
150 |
151 |
152 |     def dump(self):
153 |         pr = self.fp.parent
154 |         fn = self.fp.name
155 |         fn2 = fn.translate(str.maketrans('.', '_'))
156 |         # Store the trace in the dir where the input md file is:
157 |         vr = f'tangledown_trace_{fn2}'
158 |         np = pr / (vr + ".py")
159 |         with open(np, "w") as fs:
160 |             print(f'sequential_structure = (', file=fs)
161 |             pprint(self.trace, stream=fs)
162 |             print(')', file=fs)
163 |
164 |
165 |     # Second Pass
166 |     def add_expandedn_noweb(self, i, language, id_, key, noweb_lines):
167 |         self._end_betweens(i)
168 |         self.line_no = i
169 |         self.trace.append({"ending_line_number": self.line_no, "i": i,
170 |                            "language": language, "id_": id_,
171 |                            "kind": 'expanded_noweb', key: noweb_lines})
172 |
173 |
174 |     def add_expanded_tangle(self, i, language, id_, key, tangle_liness):
175 |         self._end_betweens(i)
176 |         self.line_no = i
177 |         self.trace.append({"ending_line_number": self.line_no, "i": i,
178 |                            "language": language, "id_": id_,
179 |                            "kind": 'expanded_tangle', key: tangle_liness})
180 |
181 |
182 |
183 |
184 | triple_backtick_re = re.compile (r'^`[`]`((\w+)?\s*(id=([0-9a-fA-F-]+))?)')
185 | blank_line_re = re.compile (r'^\s*$')
186 |
187 | def first_non_blank_line_is_triple_backtick (
188 |         i: LineNumber, lines: Lines) -> Match[Line]:
189 |     while (blank_line_re.match (lines[i])):
190 |         i = i + 1
191 |     yes = triple_backtick_re.match (lines[i])
192 |     language = "python" # default
193 |     id_ = None # default
194 |     if yes:
195 |         language = yes.groups()[1] or language
196 |         id_ = yes.groups()[3] ## can be 'None'
197 |     return i, yes, language, id_
198 |
199 |
200 | def accumulate_contents (
201 |         lines: Lines, i: LineNumber, end_re: re) -> LinesTuple:
202 |     r"""Harvest contents of a noweb or tangle tag. The start
203 |     taglet was consumed by caller. Consume the end taglet."""
204 |     i, yes, language, id_ = first_non_blank_line_is_triple_backtick(i, lines)
205 |     snip = 0 if yes else 4
206 |     contents_lines: Lines = []
207 |     for j in range (i, len(lines)):
208 |         if (end_re.match(lines[j])):
209 |             return j + 1, language, id_, contents_lines # the only return
210 |         if not triple_backtick_re.match (lines[j]):
211 |             contents_lines.append (lines[j][snip:])
212 |
213 |
214 | def anchor_is_tilde(path_str: str) -> bool:
215 |     result = (path_str[0:2] == "~/") and (Path(path_str).anchor == '')
216 |     return result
217 |
218 | def normalize_file_path(tangle_file_attribute: str) -> Path:
219 |     result: Path = Path(tangle_file_attribute)
220 |     if (anchor_is_tilde(tangle_file_attribute)):
221 |         result = (Path.home() / tangle_file_attribute[2:])
222 |     return result.absolute()
223 |
224 |
225 | raw_start_re = re.compile("")
226 | raw_end_re = re.compile("")
227 | from pprint import pprint
228 | def accumulate_lines(fp: Path, lines: Lines) -> Tuple[Tracer, Nowebs, Tangles]:
229 |     tracer = Tracer()
230 |     tracer.fp = fp
231 |     nowebs: Nowebs = {}
232 |     tangles: Tangles = {}
233 |     i = 0
234 |     while i < len(lines):
235 |         noweb_start_match = noweb_start_re.match (lines[i])
236 |         tangle_start_match = tangle_start_re.match (lines[i])
237 |         if noweb_start_match:
238 |             in_between = False
239 |             key: NowebName = noweb_start_match.group(1)
240 |             (i, language, id_, nowebs[key]) = \
241 |                 accumulate_contents(lines, i + 1, noweb_end_re)
242 |             tracer.add_noweb(i, language, id_, key, nowebs[key])
243 |
244 |
245 |         elif tangle_start_match:
246 |             in_between = False
247 |             key: TangleFileName = \
248 |                 str(normalize_file_path(tangle_start_match.group(1)))
249 |             if not (key in tangles):
250 |                 tangles[key]: Liness = []
251 |             (i, language, id_, things) = accumulate_contents(lines, i + 1, tangle_end_re)
252 |             tangles[key] += [things]
253 |             tracer.add_tangle(i, language, id_, key, tangles[key])
254 |
255 |
256 |         elif raw_start_re.match (lines[i]):
257 |             pass
258 |
259 |
260 |         else:
261 |             in_between = True
262 |             tracer.add_markdown(i, lines[i])
263 |             i += 1
264 |
265 |
266 |     if in_between: # Close out final markdown.
267 |         tracer._end_betweens(i)
268 |     return tracer, nowebs, tangles
269 |
270 |
271 | def there_is_a_block_tag (lines: Lines) -> bool:
272 |     for line in lines:
273 |         block_start_match = block_start_re.match (line)
274 |         if (block_start_match):
275 |             return True
276 |     return False
277 |
278 |
279 | def eat_block_tag (i: LineNumber, lines: Lines) -> LineNumber:
280 |     for j in range (i, len(lines)):
281 |         end_match = block_end_re.match (lines[j])
282 |         # DUDE! Check leading whitespace against block_start_re
283 |         if (end_match):
284 |             return j + 1
285 |         else:  # DUDE!
286 |             pass
287 |
288 |
289 | def expand_blocks (tracer: Tracer, nowebs: Nowebs, lines: Lines,
290 |                    language: str = "python") -> Lines:
291 |     out_lines = []
292 |     block_key: NowebName = ""
293 |     for i in range (len (lines)):
294 |         block_start_match = block_start_re.match (lines[i])
295 |         if (block_start_match):
296 |             leading_whitespace: str = block_start_match.group (1)
297 |             block_key: NowebName = block_start_match.group (2)
298 |             block_lines: Lines = nowebs [block_key]  # DUDE!
299 |             i: LineNumber = eat_block_tag (i, lines)
300 |             for block_line in block_lines:
301 |                 out_lines.append (leading_whitespace + block_line)
302 |         else:
303 |             out_lines.append (lines[i])
304 |     return out_lines
305 |
306 |
307 | def expand_tangles(tracer: Tracer, liness: Liness, nowebs: Nowebs) -> str:
308 |     contents: Lines = []
309 |     for lines in liness:
310 |         while there_is_a_block_tag (lines):
311 |             lines = expand_blocks (tracer, nowebs, lines)
312 |         contents += lines
313 |     return ''.join(contents)
314 |
315 |
316 |
317 | def tangle_all(tracer: Tracer, nowebs: Nowebs, tangles: Tangles) -> None:
318 |     for filename, liness in tangles.items ():
319 |         Path(filename).parents[0].mkdir(parents=True, exist_ok=True)
320 |         contents: str = expand_tangles(tracer, liness, nowebs)
321 |         with open (filename, 'w') as outfile:
322 |             print(f"WRITING FILE: {filename}")
323 |             outfile.write (contents)
324 |     tracer.dump()
325 |
326 | if __name__ == "__main__":
327 |     fn, lines = get_lines(get_aFile())
328 |     # test_re_matching(lines)
329 |     tracer, nowebs, tangles = accumulate_lines(fn, lines)
330 |     tangle_all(tracer, nowebs, tangles)
331 |
332 |
333 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Tangledown: One-Step Literate Markdown
2 |
3 |
4 | #### Brian Beckman
5 | #### Friday, 23 Sep 2022
6 | #### v0.0.8
7 |
8 |
9 | # OVERVIEW
10 |
11 |
12 | ## WRITING MATTERS
13 |
14 |
15 | Leslie Lamport, Turing-Award Winner, 2013, said, approximately:
16 |
17 |
18 | > Writing is Nature's Way of showing you how sloppy your thinking is. Coding is Nature's Way of showing you how sloppy your writing is. Testing is Nature's Way of showing you how sloppy your coding is.
19 |
20 |
21 | > If you can't write, you can't think. If you're not writing, you only think you're thinking.
22 |
23 |
24 | In here, we will show you how to combine thinking, writing, coding, and testing in a natural way. Your code will be the central character in a narrative, a story crafted to help your readers understand _both_ what you're doing _and_ how you're doing it. Your code will be tested because you (and your readers) can run it, right here and now, inside this [Jupytext](#oh-my-jupytext) \[sic\] notebook. Your story and your code will _never_ get out of sync because you will be working with both of them all the time.
25 |
26 |
27 | ### NARRATIVE ORDER
28 |
29 |
30 | Narrative order is the natural order for a story, but it's not the natural order for interpreters and compilers, even for Jupyter kernels. Tangledown lets you write in narrative order, then, later, tangle the code out into executable order, where the definitions of the parts precede the story. That executable order is backwards and inside-out from the reader's point of view! TangleUp lets you maintain your code by rebuilding your story in narrative order from sources, in executable order, that you may have changed on disk (_TangleUp_ is abandoned. Turned out to be too difficult).
31 |
32 |
33 | Without something like this, you're condemned to explaining _how_ your code works before you can say much or anything about _what_ your code is doing. Indulge us in a little _theory of writing_, will you?
34 |
35 |
36 | ## CREATIVE WRITING 101
37 |
38 |
39 | You're writing a murder mystery.
40 |
41 |
42 | **METHOD 1**: Start with a data sheet: all the characters and their relationships. Francis stands to inherit, but Evelyn has a life-insurance policy on Victor. Bobbie is strong enough to swing an axe. Alice has poisonous plants in her garden. Charlie has a gun collection. Danielle is a chef and owns sharp knives. Lay out their schedules and whereabouts for several weeks. Finally, write down the murder, the solution, and all the loose ends your romantic detective might try.
43 |
44 |
45 | **METHOD 2**: There's a murder. Your romantic detective asks "Who knew whom? Who benefitted? Who could have been at the scene of the crime? Who had opportunity? What was the murder weapon? Who could have done it?" Your detective pursues all the characters and their happenstances. In a final triumph of deductive logic, the detective identifies the killer despite compelling and ultimately distracting evidence to the contrary.
46 |
47 |
48 | If your objective is to _engage_ the audience, to motivate them to unravel the mystery and learn the twists and turns along the way, which is the better method? If your objective is to have them spend several hours wading through reference material, trying to guess where this is all going, which is the better method? If your objective is to organize your own thoughts prior to weaving the narrative, how do you start?
49 |
50 |
51 | Now, you're writing about some software.
52 |
53 |
54 | **METHOD 1**: Present all the functions and interfaces, cross dependencies, asynchronous versus synchronous, global and local state variables, possibilities for side effects. Finally, present unit tests and a main program.
55 |
56 |
57 | **METHOD 2**: Explain the program's rationale and concept of operation, the solution it delivers, its modes and methods. Present the unit tests and main program that fulfill all that. Finally, present all the functions, interfaces, and procedures, all the bits and bobs that could affect and effect the solution.
58 |
59 |
60 | If your objective is to _engage_ your audience, to have them understand the software deeply, as if they wrote it themselves, which is the better method? If your objective is to have them spend unbounded time wading through reference material trying to guess what you mean to do, which is the better method?
61 |
62 |
63 | ## SOFTWARE AS DOCUMENTATION
64 |
65 |
66 | Phaedrus says:
67 |
68 |
69 | > I give good, long, descriptive names to functions and parameters to make my code readable. I use Doxygen and Sphinx to automate document production. _I'm a professional!_
70 |
71 |
72 | And Socrates says:
73 |
74 |
75 | > That's nice, but you only document the pieces, and say nothing about how the pieces fit together. It's like giving me a jigsaw puzzle without the box top. It's almost sadistic.
76 |
77 | > You condemn me to reverse-engineering your software: to running it in a debugger or to tracing logs and printouts.
78 |
79 |
80 | ## LITERATE PROGRAMMING
81 |
82 |
83 | [Literate Programming](https://en.wikipedia.org/wiki/Literate_programming) is the best known way to save your _audience_ the work of reverse-engineering your code, to give them the box top with the jigsaw puzzle.
84 |
85 |
86 | Who is your audience?
87 |
88 | - yourself, first, down the line, trying to remember "why the heck did I do _that_?!?"
89 |
90 | - other programmers, eventually, when they take over maintaining and extending your code
91 |
92 |
93 | ## IMPERATIVES
94 |
95 |
96 | > First, write a paper about your code, explaining, at least to yourself, what you want to do and how you plan to do it. Flesh out your actual code inside the paper. RUN your code inside the paper, capturing printouts and charts and diagrams and what not, so others, including, first, your future self, can see the code at work. Iterate the process, rewriting prose and code as you invent and simplify, in a loop.
97 |
98 |
99 | ## THAT'S JUST JUPYTER, RIGHT?
100 |
101 |
102 | > Ok, that's just ordinary Jupyter-notebook practice, right? Code inside your documentation, right? Doxygen and Sphinx inside-out, right?
103 |
104 |
105 | Notebooks solve the ***inside-out problem***, but ordinary programming is _both_ inside out _and_ upside-down from literate programming. Literate Programming solves the upside-down problem.
106 |
107 |
108 | With ordinary notebook practice, you write everything in ***executable order***, because a Jupyter notebook is just an interface to an execution kernel. Notebooks inherit the sequential constraints of the underlying interpreter and compiler. Executable order usually forces us to define all details before using them. With Literate Programming, you write in ***narrative order***, much more understandable to humans.
109 |
110 |
111 | Executable order is usually the reverse of narrative order. Humans want to understand the big picture *first*, then the details. They want to see the box-top of the jigsaw puzzle _before_ looking at all the pieces. Executable order is ***upside-down*** to the human's point-of-view.
112 |
113 |
114 | We've all had the experience of reading code and notebooks backwards so we don't get overwhelmed by details before understanding the big picture. That observation leads us to another imperative.
115 |
116 |
117 | > Write about your code in narrative order. Don't be tyrannized by your programming language into defining everything before you can talk about anything. Use tools to rearrange your code in executable order.
118 |
119 |
120 | [Donald Knuth](http://amturing.acm.org/award_winners/knuth_1013846.cfm) invented Literate Programming so that he could both write _about_ [MetaFont](https://en.wikipedia.org/wiki/Metafont) and [TeX](https://texfaq.org/FAQ-lit) and _implement_ them in the same place. These are two of the most important computer programs ever written. Their longevity and quality testify to the viability of Literate Programming.
121 |
122 |
123 | ## TANGLEDOWN IS HERE, INSIDE THIS README.md
124 |
125 |
126 | ***Tangledown*** is the tool that rearranges code from any Markdown file into executable order. This here document, README.md, the one you're reading right now, is the Literate Program for Tangledown.
127 |
128 |
129 | Because our documentation language is Markdown, the language of this document is _Literate Markdown_. This here README.md, which you're reading right now, contains all the source for the Literate-Markdown tool, `tangledown.py`, with all its documentation, all presented in narrative order, like a story.
130 |
131 |
132 | `tangledown.py` ***tangles*** code out of any Markdown document, not just this here README.md that you're reading right now. The verb "tangle" is traditional in Literate Programming. You might think it should be "untangle," because a Literate Markdown document is all tangled up from executable order. But Knuth prefers the human's point of view. The Markdown document contains the code in the _correct_, narrative order. To address untangling or detangling, we have [TangleUp](#tangleup-intro).
133 |
134 |
135 | You can _also_ run Tangledown inside a Jupyter notebook, specifically one that is linked to this here document, README.md, the one you're reading right now. See [this section](#oh-my-jupytext) for more.
136 |
137 |
138 | We should mention that Tangledown is similar to [org-babel](http://orgmode.org/worg/org-contrib/babel/) in Emacs (or [spacemacs](http://spacemacs.org/) for VIM users). Those are polished, best-of-breed Literate-Programming tools for the Down direction. You have to learn some Emacs to use them, and that's a barrier for many people. Markdown is good enough for Github, and thus for most of us right now.
139 |
140 |
141 | ## TANGLEUP INTRO
142 |
143 |
144 | Tangledown, as a distribution format, is a complete solution to Literate Programming. You get a single Markdown file and all the tested source for your project is included. Run Tangledown and the project is sprayed out on disk, ready for running, further testing, and deploying.
145 |
146 |
147 | As a development format, it's not quite enough. With only Tangledown, when you modify the source tree in executable order, your narrative is _instantly_ out of date. We can't have that. See [Section TangleUp](#tangleup) for more.
148 |
149 |
150 | ## TANGLING UP EXISTING CODE
151 |
152 |
153 | TangleUp can generate unique names, as GUIDs, say, for new source files and blocks. You should be able to TangleUp an existing source tree into a new, fresh, non-pre-existing Markdown file and then round-trip TangleUp and TangleDown.
154 |
155 |
156 | # TANGLEDOWN DESIGN AND IMPLEMENTATION
157 |
158 |
159 | Let's do Tangledown, first, and TangleUp [later](#tangleup).
160 |
161 |
162 | # OH MY! JUPYTEXT
163 |
164 |
165 | ***Jupytext*** \[sic\] automatically syncs a Markdown file with a Jupyter notebook. Read about it [here](https://github.com/mwouts/jupytext). It works well in ***JupyterLab***. Read about that [here](https://github.com/jupyterlab/jupyterlab). Specifically, it lets you open this here Markdown file, README.md, that you're reading right now, as a Jupyter notebook, and you can evaluate some cells in the notebook.
166 |
167 |
168 | Here's how I installed everything on an Apple Silicon (M1) Mac Book Pro, with Python 3.9:
169 |
170 |
171 | ```
172 | pip install jupyter
173 | pip install jupyterlab
174 | pip install jupytext
175 | ```
176 |
177 |
178 | Here is how I run it:
179 |
180 |
181 | ```
182 | jupyter lab
183 | ```
184 |
185 |
186 | or
187 |
188 |
189 | ```
190 | PYTHONPATH=".:$HOME/Documents/GitHub/tangledow:$HOME/Library/Jupyter/kernels/tangledown_kernel" jupyter lab ~
191 | ```
192 |
193 |
194 | when I want the [Tangledown Kernel](#section-tangledown-kernel), and I almost always want the Tangledown kernel.
195 |
196 |
197 | In JupyterLab
198 |
199 |
200 | - open README.md
201 | - `View->Activate Command Palette`
202 | - check `Pair Notebook with Markdown`
203 | - right-click `README.md` and say `Open With -> Jupytext Notebook`
204 | - edit one of the two, `README.md` or `README.ipynb` ...
205 |
206 |
207 | Jupytext will update the other.
208 |
209 |
210 | > ***IMPORTANT***: To see the updates in the notebook when you modify the Markdown, you must `File->Reload Notebook from Disk`, and to see updates in the Markdown when you modify the notebook, you must `File->Reload Markdown File from Disk`. Jupytext forces you to reload changed files manually. I'll apologize here, on behalf of the Jupytext team.
211 |
212 |
213 | If you're reading or modifying `README.ipynb`, or if you `Open With -> Jupytext Notebook` on `README.md` (my preference), you may see tiny, unrendered Markdown cells above and below all your tagged nowebs and tangles. ***DON'T DELETE THE TINY CELLS***. Renderers of Markdown simply ignore the tags, but Jupytext makes tiny, invisible cells out of them!
214 |
215 |
216 | Unless you're running the optional, new [Tangledown Kernel](#section-tangledown-kernel), don't RUN cells with embedded `block` tags in Jupyter; you'll just get syntax errors from Python.
217 |
218 |
219 | # LET ME TELL YOU A STORY
220 |
221 |
222 | This here README.md, the one you're reading right now, should tell the story of Tangledown. We'll use Tangledown to _create_ Tangledown. That's just like bootstrapping a compiler. We'll use Tangledown to tangle Tangledown itself out of this here document named README.md that you're reading right now.
223 |
224 |
225 | The first part of the story is that I just started writing the story. The plan and outline were in my head (I didn't explicitly do [Method 1](#creative-writing)). Then I filled in the code, moved everything around when I needed to, and kept rewriting until it all worked the way I wanted it to work. Actually, I'm still doing this now. Tangledown and TangleUp are living stories!
226 |
227 |
228 | ## DISCLAIMER
229 |
230 |
231 | This is a useful toy, but it has zero error handling. We currently talk only about the happy path. I try to be rude ("[DUDE!](#dude)") every place where I sense trouble, but I'm only sure I haven't been rude enough. Read this with a sense of humor. You're in on the story with me, and it's supposed to be fun!
232 |
233 |
234 | I also didn't try it on Windows, but I did try it on WSL, the Windows Subsystem for Linux. Works great on WSL!
235 |
236 |
237 | ## HOW TO RUN TANGLEDOWN
238 |
239 |
240 | One way: run `python3 tangledown.py README.md` or just `python tangledown.py` at the command line. That command should _overwrite_ tangledown.py. The code for tangledown.py is inside this here README.md that you're reading right now. The name of the file to overwrite, namely `tangledown.py`, is embedded inside this here README.md itself, in the `file` attribute of a `<tangle ...>` tag. Read about `tangle` tags [below](#section-tangle-tags)!
241 |
242 |
243 | If you said `python3 tangledown.py MY-FOO.md`, then you would tangle
244 | code out of `MY-FOO.md`. You'll do that once you start writing your own code in
245 | Tangledown. You will love it! We have some big examples that we'll write about elsewhere. Those examples include embedded code and microcode for exotic hardware, all written in Python!
246 |
247 |
248 | Tangledown is both a script and a module. You can run Tangledown in a [Jupytext](#oh-my-jupytext) cell after importing some stuff from the module. The next cell illustrates the typical bootstrapping joke of tangling Tangledown itself out of this here README.md that you're reading right now, after this Markdown file has been linked to a Jupytext notebook.
249 |
250 | ```python
251 | from tangledown import get_lines, accumulate_lines, tangle_all
252 | tangle_all(*accumulate_lines(*get_lines("README.md")))
253 | ```
254 |
255 | After you have tangled at least once, as above, and if you switch the notebook kernel to the new, optional [Tangledown Kernel](#section-tangledown-kernel), you can evaluate the source code for the whole program in the [later cell I'm linking right here](#tangle-listing-tangle-all). ***How Cool is That?***
256 |
257 |
258 | You'll also need to re-tangle and restart the Tangledown Kernel when you add new nowebs to your files. Sorry about that. This is still just a toy.
259 |
260 |
261 | Because Tangledown is a Python module, you can also run Tangledown from inside a standalone Python program, say in PyCharm or VS Code or whatever;
262 | `hello_world_tangler.py` in this repository is an example.
263 |
264 |
265 | Once again, Jupytext lets you RUN code from a Markdown
266 | document in a JupyterLab notebook with just the ordinary Python3 kernel. If you open `hello_world.md` as a Jupytext
267 | notebook in JupyterLab then you
268 | can run Tangledown in Jupyter cells. Right-click on the name `hello_world.md` in the JupyterLab GUI and choose
269 |
270 |
271 | `Open With ...` $\longrightarrow$ `Jupytext Notebook`
272 |
273 |
274 | Then run cells! This is close to the high bar set by org-babel!
275 |
276 |
277 | ## HOW IT WORKS: Markdown Ignores Mysterious Tags
278 |
279 |
280 | How can we rearrange code cells in a notebook or Markdown file from human-understandable, narrative order to executable order?
281 |
282 |
283 | Exploit the fact that most Markdown renderers, like Jupytext's, Github's, and [PyCharm's](https://www.jetbrains.com/pycharm/), ignore HTML / XML _tags_ (that is, stuff inside angle brackets) that they don't recognize. Let's enclose blocks of real, live code with `noweb` tags, like this:
284 |
285 |
286 |
287 |
288 |     class TestSomething ():
289 |         def test_something (self):
290 |             assert (3 == 2+1)
291 |
292 |
293 |
294 |
295 | ### TAG CELLS CAN BE RAW OR MARKDOWN, NOT CODE
296 |
297 |
298 | The markdown above renders as follows. You can see the `noweb` one-liner raw cells above and below the code in Jupytext. If they were Markdown cells, they'd be tiny and invisible. That's 100% OK, and may be more to your liking! Try changing the cells from RAW (press "R") to Markdown (press "M") and back, then closing them (Shift-Enter) and opening them (Enter). Don't mark the tag cells CODE (don't press "Y"). Tangledown won't work because Jupytext will surround them with triple-backticks.
299 |
300 |
301 |
302 |
303 |
304 | ```python
305 | class TestSomething ():
306 |     def test_something (self):
307 |         assert (42 == 6 * 7)
308 | ```
309 |
310 |
311 |
312 |
313 |
314 | What are the `<noweb ...>` and `</noweb>` tags? We explain them immediately below.
315 |
316 |
317 | ## THREE TAGS: noweb, block, and tangle
318 |
319 |
320 | ### `noweb` tags
321 |
322 |
323 | Markdown ignores `<noweb ...>` and `</noweb>` tags, but `tangledown.py` _doesn't_. `tangledown.py` sucks up the ***contents*** of the `noweb` tags and sticks them into a dictionary for later lookup when processing `block` tags.
324 |
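Here is a minimal sketch of that first pass, just to make the dictionary idea concrete. The two patterns and the function `collect_nowebs` are hypothetical names made up for this illustration; `tangledown.py` defines its own regexes and does more bookkeeping than this.

```python
import re
from typing import Dict, List, Optional

# Hypothetical, simplified patterns for this sketch only.
start_re = re.compile(r'^<noweb name="([^"]+)">$')
end_re = re.compile(r'^</noweb>$')

def collect_nowebs(lines: List[str]) -> Dict[str, List[str]]:
    """Map each noweb name to the lines between its opener and closer."""
    nowebs: Dict[str, List[str]] = {}
    name: Optional[str] = None
    for line in lines:
        stripped = line.rstrip("\n")
        if name is None:
            match = start_re.match(stripped)
            if match:                      # opener: start a new dictionary entry
                name = match.group(1)
                nowebs[name] = []
        elif end_re.match(stripped):       # closer: stop collecting
            name = None
        elif not stripped.startswith("```"):
            nowebs[name].append(line)      # keep contents, drop the code fences
    return nowebs

doc = [
    'Some prose.\n',
    '<noweb name="foo">\n',
    '\n',
    '```python\n',
    'x = 1\n',
    '```\n',
    '\n',
    '</noweb>\n',
    'More prose.\n',
]
nowebs = collect_nowebs(doc)
print(nowebs)
```

The prose lines are ignored, the fence lines are dropped, and everything else between the fenceposts lands under the key `"foo"`.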
325 |
326 | #### CONTENTS OF A TAG
327 |
328 |
329 | The contents of a `noweb` tag are between the opening `<noweb ...>` and closing `</noweb>` fenceposts. Markdown renders code contents with syntax coloring and indentation. That's why we want code cells to be CODE cells and not RAW cells.
330 |
331 |
332 | The term _contents_ is ordinary jargon from HTML, XML, SGML, etc., and applies to any opening `<tag ...>` and closing `</tag>` pair.
333 |
334 |
335 | #### ATTRIBUTES OF A TAG
336 |
337 |
338 | The Tangledown dictionary key for contents of a `noweb` tag is the string value of the `name` attribute. For example, in `<noweb name="foo">`, `name` is an _attribute_, and its string value is `"foo"`.
339 |
340 |
341 | > Noweb names must be unique in a document. TangleUp ensures that when it writes a new Markdown file from existing source, or you may do it by hand.
342 |
343 |
344 | **NOTE**: the `name` attribute of a `noweb` opener must be on the same line, like this:
345 |
346 |
347 |
348 |
349 |
350 | Ditto for our other attributes, as in the following. [Don't separate attributes with commas!]( https://www.w3schools.com/html/html_attributes.asp)
351 |
352 |
353 |
354 |
355 |
356 | This single-line rule is a limitation of the regular expressions that detect `noweb` tags. Remember, [Tangledown is a toy](#disclaimer), a useful toy, but it's limited.
357 |
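To see why the attribute has to share a line with the opener, here is a small sketch of a line-at-a-time matcher. The pattern `hypothetical_noweb_start_re` and the helper `noweb_name` are assumptions for illustration, not the regex that `tangledown.py` really uses.

```python
import re
from typing import Optional

# Hypothetical pattern for illustration; tangledown.py defines its own regexes.
hypothetical_noweb_start_re = re.compile(r'^<noweb name="([^"]+)">$')

def noweb_name(line: str) -> Optional[str]:
    """Return the name if `line` is a complete, left-justified opener, else None."""
    match = hypothetical_noweb_start_re.match(line.rstrip("\n"))
    return match.group(1) if match else None

print(noweb_name('<noweb name="foo">\n'))  # opener and attribute on one line
print(noweb_name('<noweb\n'))              # first half of a split opener
print(noweb_name('  name="foo">\n'))       # second half of a split opener
```

Only the first call finds a name. A matcher that looks at one line at a time never sees the two halves of a split opener together.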
358 |
359 | #### FENCEPOST CELLS
360 |
361 |
362 | You can create the fencepost cells, `<noweb ...>` and `</noweb>`, either in the plain-text Markdown file, or you can create them in the synchronized Jupytext notebook.
363 |
364 |
365 | If you create fencepost cells in plain-text Markdown as opposed to the Jupytext notebook, leave a blank line after the opening `<noweb ...>` and a blank line before the closing `</noweb>`. If you don't, the Markdown renderer won't color and indent the contents. Tangledown will still work, but the Markdown renderer will format your code like text without syntax coloring and indentation.
366 |
367 |
368 | If you write fencepost cells in Markdown cells in the notebook or as blank-surrounded tags in the plain-text Markdown, the fenceposts appear as tiny, invisible Markdown cells because the renderer treats them as empty markdown cells. That's the fundamental operating principle of Tangledown: Markdown ignores tags it doesn't recognize! ***DON'T DELETE THE TINY, INVISIBLE CELLS***, but you can open (Enter) and close (Shift-Enter) them.
369 |
370 |
371 | If you create `noweb` and `tangle` tags in the notebook and you want them _visible_, mark them _RAW_ by pressing "R" with the cell highlighted but not editable. Don't mark them CODE (don't press "Y"). Tangledown will break because Jupytext will surround them with triple-backticks.
372 |
373 |
374 | ### `block` tags
375 |
376 |
377 | Later, in the [second pass](#second-pass), Tangledown blows the contents of `noweb` tags back out wherever it sees `block` tags with matching `name` attributes. That's how you can define code anywhere in your document and use it in any other place, later or earlier, more than once if you like.
378 |
379 |
380 | `block` tags can and should appear in the contents of `noweb` tags and in the contents of `tangle` tags, too. That's how you structure your narrative!
381 |
382 |
383 | Tangledown discards the contents of `block` tags. Only the `name` attribute of a `block` tag matters.
384 |
385 |
386 | #### WRITE IN ANY-OLD-ORDER YOU LIKE
387 |
388 |
389 | You don't have to write the noweb _before_ you write a matching `block` tag. You can refer to a `noweb` tag before it exists in time and space, more than once if you like. You can define things and name things and use things in any order that makes your thinking and your story more clear. This is literature, after all.
390 |
391 |
392 | ### `tangle` tags
393 |
394 |
395 | A `tangle` tag sprays its block-expanded contents to a file on disk. What file? The file named in the `file` attribute of the `tangle` tag. ***Expanding*** contents of a `tangle` tag means replacing every `block` tag with the contents of its matching `noweb` tag, recursively, until everything bottoms out in valid Python.
396 |
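The recursive expansion just described can be sketched in a few lines. This is a simplified illustration, not the code in `tangledown.py`: the pattern `block_re` and the function `expand` are hypothetical, the block tag is assumed to open and close on one line, and there is no protection against nowebs that refer to each other in a cycle.

```python
import re
from typing import Dict, List

# Hypothetical single-line block pattern; tangledown.py's own regexes may differ.
block_re = re.compile(r'^(\s*)<block name="([^"]+)"></block>\s*$')

def expand(lines: List[str], nowebs: Dict[str, List[str]]) -> List[str]:
    """Replace each block line with the lines of its noweb, recursively."""
    out: List[str] = []
    for line in lines:
        match = block_re.match(line)
        if match:
            indent, name = match.group(1), match.group(2)
            # Expand the noweb's own blocks first, then re-indent the result
            # by the whitespace that preceded the block tag.
            out += [indent + inner for inner in expand(nowebs[name], nowebs)]
        else:
            out.append(line)
    return out

nowebs = {
    "greet": ['print("hello")\n'],
    "body": ['<block name="greet"></block>\n', 'print("done")\n'],
}
tangle = ['def main():\n', '    <block name="body"></block>\n']
expanded = "".join(expand(tangle, nowebs))
print(expanded)
```

Note how the leading whitespace of the `block` tag carries over to every expanded line. That's what keeps the result valid Python.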
397 |
398 | The same rules about blank lines hold for `tangle` tags as they do for `noweb` tags: if you want Markdown to render the contents like code, surround the contents with blank lines or mark the tag cells _RAW_. The following Markdown
399 |
400 |
401 |
402 |
403 |
404 |
405 |     if __name__ == '__main__':
406 |         TestSomething().test_something()
407 |
408 |
409 |
410 |
411 | renders like this
412 |
413 |
414 |
415 |
416 | ```python
417 | import unittest
418 |
419 |
420 |
421 | if __name__ == '__main__':
422 |     TestSomething().test_something()
423 | ```
424 |
425 |
426 |
427 |
428 | See the tiny, invisible Markdown cells above and below the code? Play around with opening and closing them with Enter and Shift-Enter, respectively, and marking them RAW (Press "R") and Markdown ("M"). Don't mark them CODE ("Y").
429 |
430 |
431 | You can evaluate the cell with the new, optional [Tangledown Kernel](#section-tangledown-kernel). If you evaluate the code cell in the Python Kernel, you'll get a syntax error because the `block` tag is not valid Python. The syntax error is harmless to Tangledown.
432 |
433 |
434 | This code tangles to the file `/dev/null`. That's a nifty trick for temporary `tangle` blocks. You can talk about them, validate them by executing their cells in the [Tangledown Kernel](#section-tangledown-kernel), and throw them away.
435 |
436 |
437 | [TangleUp](#tangleup) knows where Tangledown puts all the blocks and tangles. That's how, when you change code on disk, TangleUp can put it all back in the single file of Literate Markdown.
438 |
439 |
440 | # HUMAN! READ THE `block` TAGS!
441 |
442 |
443 | Markdown renders `block` tags verbatim inside nowebs or tangles. This is good for humans, who will think
444 |
445 |
446 | > AHA!, this `block` refers to some code in a `noweb` tag somewhere else in this Markdown document. I can read all the details of that code later, when it will make more sense. I can look at the picture on the box before the pieces of the jigsaw puzzle.
447 |
448 | > Thank you, kindly, author! Without you, I'd be awash in details. I'd get tired and cranky before understanding the big picture!
449 |
450 |
451 | See, I'll prove it to you. Below is the code for all of `tangledown.py` itself. You can understand this without understanding the _implementations_ of the sub-pieces, just getting a rough idea of _what_ they do from the names of the `block` tags. **READ THE NAMES IN THE BLOCK TAGS**. Later, if you want to, you can read all the details in the `noweb` tags named by the `block` tags.
452 |
453 |
454 | # TANGLE ALL
455 |
456 |
457 | If you're running the new, optional [Tangledown Kernel](#section-tangledown-kernel), you can evaluate this next cell and run Tangledown on Tangledown itself, right here in a Jupyter notebook. ***How Cool is That?***
458 |
459 |
460 |
461 |
462 |
463 | ```python
464 |
465 |
466 |
467 |
468 |
469 |
470 |
471 |
472 |
473 |
474 |
475 | def tangle_all(tracer: Tracer, nowebs: Nowebs, tangles: Tangles) -> None:
476 |     for filename, liness in tangles.items ():
477 |         Path(filename).parents[0].mkdir(parents=True, exist_ok=True)
478 |         contents: str = expand_tangles(tracer, liness, nowebs)
479 |         with open (filename, 'w') as outfile:
480 |             print(f"WRITING FILE: {filename}")
481 |             outfile.write (contents)
482 |     tracer.dump()
483 |
484 | if __name__ == "__main__":
485 |     fn, lines = get_lines(get_aFile())
486 |     # test_re_matching(lines)
487 |     tracer, nowebs, tangles = accumulate_lines(fn, lines)
488 |     tangle_all(tracer, nowebs, tangles)
489 | ```
490 |
491 |
492 |
493 |
494 |
495 | The whole program is in the function `tangle_all`. We get hybrid vigor by testing `__name__` against `"__main__"`: `tangledown.py` is both a script and module.
496 |
497 |
498 | All we do in `tangle_all` is loop over all the line lists in the tangles (`for filename, liness in tangles.items()`) and [expand them](#expand-tangles) to replace blocks with nowebs. Yes, "liness" has an extra "s". Remember [Smeagol](https://lotr.fandom.com/wiki/Gollum)? Pronounce it like "my preciouses, my lineses!"
499 |
500 |
501 | The code will create the subdirectories needed. For example, if you tangle to file `foo/bar/baz/qux.py`, the code creates the directory chain `./foo/bar/baz/` if it doesn't exist.
502 |
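Here's a tiny standalone sketch of that directory-creating step, using the same `Path(...).parents[0].mkdir(parents=True, exist_ok=True)` call that appears in `tangle_all`. The temporary root directory and the file contents are made up for the demonstration so that nothing gets sprayed into your working directory.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    target = Path(root) / "foo" / "bar" / "baz" / "qux.py"
    # parents[0] is the containing directory, the same thing as target.parent.
    same_dir = target.parents[0] == target.parent
    # parents=True creates the whole chain; exist_ok=True makes a re-run harmless.
    target.parents[0].mkdir(parents=True, exist_ok=True)
    target.parents[0].mkdir(parents=True, exist_ok=True)  # second call is a no-op
    target.write_text("print('tangled')\n")
    created = target.parents[0].is_dir() and target.is_file()

print(same_dir, created)
```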
503 |
504 | ## TYPES
505 |
506 |
507 | Let us now explain the implementation. The first block in the tangle above is _types_. What is the noweb of _types_? It's here.
508 |
509 |
510 | A `Line` is a string, Python base type `str`. `Lines` is the type of a list of lines. `Liness` is the type of a list of lists of lines, in a pluralizing shorthand borrowed from Haskell practice. Pronounce `liness` the way [Smeagol](https://lotr.fandom.com/wiki/Gollum) would do: "my preciouses, my lineses!"
511 |
512 |
513 | A noweb name is a string, and a tangle file name is a string. A line number is an `int`, a Python base type.
514 |
515 |
516 | Nowebs are dictionaries from noweb names to lines.
517 |
518 |
519 | Tangles are dictionaries from file names to Liness --- lists of lists of lines. Tangledown accumulates output for `tangle` files mentioned more than once. If you tangle to `qux.py` in one place and then also tangle to `qux.py` somewhere else, the second tangle won't overwrite the first, but append to it. That's why tangles are lists of lists of lines, one list of lines for each mentioning of a tangle file. Read more about that in [expand-tangles](#expand-tangles).
520 |
521 |
522 |
523 |
524 |
525 | ```python
526 | from typing import List, Dict, Tuple, Match
527 |
528 | NowebName = str
529 | FileName = str
530 | TangleFileName = FileName
531 | LineNumber = int
532 | Line = str
533 | Lines = List[Line]
534 | Liness = List[Lines]
535 | LinesTuple = Tuple[LineNumber, Lines]
536 |
537 | Nowebs = Dict[NowebName, Lines]
538 | Tangles = Dict[TangleFileName, Liness]
539 | ```
540 |
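To make the append-don't-overwrite behavior of tangles concrete, here is a minimal sketch. The helper `add_tangle` is hypothetical and exists only for this illustration; the type aliases repeat the ones above so the snippet runs on its own.

```python
from typing import Dict, List

Lines = List[str]
Liness = List[Lines]
Tangles = Dict[str, Liness]

def add_tangle(tangles: Tangles, file_name: str, lines: Lines) -> None:
    """Append one list of lines per mention of `file_name` (hypothetical helper)."""
    tangles.setdefault(file_name, []).append(lines)

tangles: Tangles = {}
add_tangle(tangles, "qux.py", ["import os\n"])           # first mention of qux.py
add_tangle(tangles, "qux.py", ["print(os.getcwd())\n"])  # second mention appends

# Flattening the list of lists gives the eventual file contents, in order.
contents = "".join(line for lines in tangles["qux.py"] for line in lines)
print(contents)
```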
541 |
542 |
543 |
544 |
545 | We'll implement all the noweb blocks, like `accumulate_contents` and `eatBlockTag`, later. You can read about them, or not, after you've gotten more of the big picture.
546 |
547 |
548 | ## DEBUGGING AND REFACTORING
549 |
550 |
551 | The Tangledown Kernel doesn't support the Jupytext debugger, yet. Sorry about that. Tangle the code out to disk and debug it with pudb or whatever, then tangle it back up into your Literate Markdown file via [TangleUp](#tangleup).
552 |
553 |
554 | [Tangledown is still a toy](#disclaimer). Ditto refactoring. PyCharm is great for that, but you'll have to do it on tangled files and detangle (paste) back into the Markdown.
555 |
556 |
557 | ## EXPAND TANGLES
558 |
559 |
560 | We separated out the inner loop over Liness \[sic\] into another function, `expand_tangles`, so that the [Tangledown Kernel](#section-tangledown-kernel) can import it and apply it to `block` tags. `tangle_all` calls `expand_tangles`; `expand_tangles` calls `expand_blocks`. Read about `expand_blocks` [here](#expand-blocks).
561 |
562 | ```python
563 | from graphviz import Digraph
564 | g = Digraph(graph_attr={'size': '8,5'}, node_attr={'fontname': 'courier'})
565 | g.attr(rankdir='LR')
566 | g.edge('tangle_all', 'expand_tangles')
567 | g.edge('expand_tangles', 'expand_blocks')
568 | g
569 | ```
570 |
571 |
572 |
573 |
574 |
575 | ```python
576 | def expand_tangles(tracer: Tracer, liness: Liness, nowebs: Nowebs) -> str:
577 |     contents: Lines = []
578 |     for lines in liness:
579 |         while there_is_a_block_tag (lines):
580 |             lines = expand_blocks (tracer, nowebs, lines)
581 |         contents += lines
582 |     return ''.join(contents)
583 | ```
584 |
585 |
586 |
587 |
588 |
589 | ## Tangledown Tangles Itself?
590 |
591 |
592 | Tangledown has two kinds of regular expressions (regexes) for matching tags in a Markdown file:
593 |
594 | - regexes for `noweb` and `tangle` tags that appear on lines by themselves, left-justified
595 |
596 | - regexes that match `block` tags that may be indented, and match their closing tags, which may appear on the same line as the opening `block` tag or on lines by themselves.
597 |
598 |
599 | Both kinds of regex are ___safe___: they do not match themselves. That means it's safe to run
600 | `tangledown.py` on this here `README.md`, which contains tangled source for `tangledown.py`.
601 |
602 |
603 | The two regexes in noweb `left_justified_regexes` match `noweb` and `tangle` tags that appear on lines by themselves, left-justified.
604 |
605 |
606 | > They also won't match `noweb` and `tangle` tags that are indented. That lets us _talk about_ `noweb` and `tangle` tags without processing them: just put the examples you're talking about in an _indented_ Markdown code cell instead of in a triple-backticked Markdown code cell.
607 |
608 |
609 | The names in the attributes of `noweb` and `tangle` tags must start with a letter, and they can contain letters, numbers, hyphens, underscores, whitespace, and dots.
610 |
611 |
612 | The names of `noweb` tags must be globally unique within the Markdown file. Multiple `tangle` tags may refer to the same output file, in which case Tangledown appends the contents of the second and subsequent `tangle` tags to a list of lists of lines, a `Liness`.
613 |
614 |
615 | ### LEFT-JUSTIFIED REGEXES
616 |
617 |
618 | There is a `.*` at the end to catch attributes beyond `name`. A bit of future-proofing.
619 |
620 |
621 |
622 |
623 |
624 | ```python
625 | noweb_start_re = re.compile (r'^$')
626 | noweb_end_re = re.compile (r'^$')
627 |
628 | tangle_start_re = re.compile (r'^$')
629 | tangle_end_re = re.compile (r'^$')
630 | ```
631 |
632 |
633 |
634 |
635 |
636 | ### ANYWHERE REGEXES
637 |
638 |
639 | The regexes in this noweb, `anywhere_regexes`, match `block` tags that may be indented, preserving indentation. The `block_end_re` regex also preserves indentation. Indentation is critical for Python, Haskell, and other languages.
640 |
641 |
642 | I converted the 'o' in 'block' to a harmless regex group `[o]` so that `block_end_re` won't match itself. That makes it safe to run this code on this here document itself.
643 |
644 |
645 |
646 |
647 |
648 | ```python
649 | block_start_re = re.compile (r'^(\s*)')
650 | block_end_re = re.compile (r'^(\s)*')
651 | ```
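The exact patterns depend on the tag syntax. As a minimal sketch, assuming tags written as `<noweb name="...">`, `<tangle file="...">`, and `<block name="...">`, regexes with the properties described in this section might look like the following. The pattern details, including the permissive file-name group and the `.*` before the end tag, are assumptions for illustration and may differ from the ones in `tangledown.py`.

```python
import re

# Assumed name rule: starts with a letter; then letters, digits, '_', '-', '.', whitespace.
NAME = r'[a-zA-Z][\w\s\-.]*'

# Left-justified only: '^' directly followed by the tag, so indented tags do not match.
noweb_start_re = re.compile(r'^<noweb name="(' + NAME + r')".*>$')
noweb_end_re = re.compile(r'^</noweb>$')
tangle_start_re = re.compile(r'^<tangle file="(.+?)".*>$')   # permissive: allows paths
tangle_end_re = re.compile(r'^</tangle>$')

# Anywhere: group 1 captures the indentation so it can be preserved.
block_start_re = re.compile(r'^(\s*)<block name="(' + NAME + r')".*?>')
# '[o]' keeps the pattern text from containing the literal end tag it matches.
block_end_re = re.compile(r'^(\s)*.*</bl[o]ck>')
```

None of these lines matches its own pattern: the left-justified patterns need the tag at column zero, and the end-tag pattern never spells out the literal closing tag.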
652 |
653 |
654 |
655 |
656 |
657 | ## Test the Regular Expressions
658 |
659 |
660 | ### OPENERS
661 |
662 |
663 | The code in noweb `openers` has two `block` tags that refer to the nowebs of the regexes defined above, namely `left_justified_regexes` and `anywhere_regexes`. After Tangledown substitutes the contents of the nowebs for the blocks, the code becomes valid Python and you can call `test_re_matching` in the [Tangledown Kernel](#section-tangledown-kernel) or at the command line. When you call it, it proves that we can recognize all the various kinds of tags. We leave the regexes themselves as global pseudo-constants so that they're both easy to test and to use in the body of the code ([Demeter weeps](https://en.wikipedia.org/wiki/Law_of_Demeter) because of globals).
664 |
665 |
666 | The code in `hello_world.ipynb` (after you have Paired a Notebook with the Markdown File `hello_world.md`) runs this test as its last act to check that `tangledown.py` was correctly tangled from this here `README.md`. That code works in the ordinary Python kernel and in the [Tangledown Kernel](#section-tangledown-kernel).
667 |
668 |
669 | Notice the special treatment for block ends, which are usually on the same lines as their block opener tags, but not necessarily so. That lets you put (useless) contents in `block` tags.
670 |
671 |
672 |
673 |
674 |
675 | ```python
676 | import re
677 | import sys
678 | from pathlib import Path
679 |
680 |
681 |
682 |
683 |
684 | def test_re_matching(fp: Path, lines: Lines) -> None:
685 |     for line in lines:
686 |         noweb_start_match = noweb_start_re.match (line)
687 |         tangle_start_match = tangle_start_re.match (line)
688 |         block_start_match = block_start_re.match (line)
689 |
690 |         noweb_end_match = noweb_end_re.match (line)
691 |         tangle_end_match = tangle_end_re.match (line)
692 |         block_end_match = block_end_re.match (line)
693 |
694 |         if (noweb_start_match):
695 |             print ('NOWEB: ', noweb_start_match.group (0))
696 |             print ('name of the block: ', noweb_start_match.group (1))
697 |         elif (noweb_end_match):
698 |             print ('NOWEB END: ', noweb_end_match.group (0))
699 |         elif (tangle_start_match):
700 |             print ('TANGLE: ', tangle_start_match.group (0))
701 |             print ('name of the file: ', tangle_start_match.group (1))
702 |         elif (tangle_end_match):
703 |             print ('TANGLE END: ', tangle_end_match.group (0))
704 |         elif (block_start_match):
705 |             print ('BLOCK: ', block_start_match.group (0))
706 |             print ('name of the block: ', block_start_match.group (2))  # group 1 is the indentation
707 |             if (block_end_match):
708 |                 print ('BLOCK END SAME LINE: ', block_end_match.group (0))
709 |             else:
710 |                 print ('BLOCK NO END')
711 |         elif (block_end_match):
712 |             print ('BLOCK END ANOTHER LINE: ', block_end_match.group (0))
713 |         else:
714 |             pass
715 | ```
716 |
717 |
718 |
719 |
720 |
721 | # TANGLEDOWN: Two Passes
722 |
723 |
724 | Tangledown passes once over the file to collect contents of `noweb` and `tangle` tags, and again over the `tangle` tags to expand `block` tags. In the second pass, Tangledown substitutes noweb contents for corresponding `block` tags until there are no more `block` tags, creating valid Python.
725 |
726 |
727 | ## First Pass: Saving Noweb and Tangle Blocks
728 |
729 |
730 | In the first pass over the file, we'll just save the contents of noweb and tangle into dictionaries, without expanding nested `block` tags.
731 |
732 |
733 | ### GET A FILE NAME
734 |
735 |
736 | `tangledown.py` is both a script and a module. As a script, you run it from the command line, so it gets its input file name from command-line arguments. When you import it as a module from another Python program, you probably want to pass the file name as an argument to a function, specifically to `get_lines`.
737 |
738 |
739 | Let's write two functions,
740 |
741 | - `get_aFile`, which parses command-line arguments and produces a file name; the default file name is `README.md`
742 |
743 | - `get_lines`, which
744 |
745 | - gets lines, without processing `noweb`, `tangle`, or `block` tags, from its argument, `aFilename`
746 |
747 | - replaces `#raw` and `#endraw` fenceposts with blank lines
748 |
749 | - writes out the full file path to a secret place where the [Tangledown Kernel](#section-tangledown-kernel) can pick it up
750 |
751 |
752 | `get_aFile` can parse command-line arguments that come from either `python` on the command line or from a `Jupytext` notebook, which has a few kinds of command-line arguments we must ignore, namely command-line arguments that end in `.py` or in `.json`.
753 |
754 |
755 | ### GET LINES
756 |
757 |
758 | This method for getting a file name from the argument list will eat all options. It works for the Tangledown Kernel and for tangling down from a script or a notebook, but it's not future-proofed. Tangledown is still a toy.
759 |
760 |
761 |
762 |
763 |
764 | ```python
765 | print({f'len(sys.argv)': len(sys.argv), f'sys.argv': sys.argv})
766 | ```
767 |
768 |
769 |
770 |
771 |
772 |
773 |
774 |
775 |
776 | ```python
777 | def get_aFile() -> str:
778 |     """Get a file name from the command-line arguments
779 |     or 'README.md' as a default."""
780 |
781 |     aFile = 'README.md' # default
782 |     if len(sys.argv) > 1:
783 |         file_names = [p for p in sys.argv
784 |                       if (p[0] != '-') # option
785 |                       and (p[-3:] != '.py')
786 |                       and (p[-5:] != '.json')]
787 |         if file_names:
788 |             aFile = file_names[0]  # first argument that survived the filter
789 |     return aFile
790 |
791 | raw_line_re: re.Pattern = re.compile(r'<!-- #(end)?raw -->')  # assumed Jupytext raw-cell fenceposts
792 | def get_lines(fn: FileName) -> Tuple[Path, Lines]:
793 |     """Get lines from a file named fn. Replace
794 |     'raw' fenceposts with blank lines. Write full path to
795 |     a secret place for the Tangledown kernel to pick it up.
796 |     Return tuple of file path (for TangleUp's Tracer) and
797 |     lines."""
798 |
799 |     xpath = save_aFile_path_for_kernel(fn)
800 |     with open(fn) as f:
801 |         in_lines: Lines = f.readlines ()
802 |     out_lines: Lines = []
803 |     for in_line in in_lines:
804 |         out_lines.append(
805 |             in_line if not raw_line_re.match(in_line) else "\n")
806 |     return xpath, out_lines
807 | ```
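The argument filtering is easier to test when it takes the argument list as a parameter instead of reading `sys.argv` directly. The following is a minimal sketch of that idea; `pick_file_name` is a hypothetical helper, not part of `tangledown.py`.

```python
from typing import List


def pick_file_name(argv: List[str], default: str = "README.md") -> str:
    """Return the first argument that looks like an input file (hypothetical helper)."""
    candidates = [
        p for p in argv[1:]                    # argv[0] is the program itself
        if p                                   # skip empty strings
        and not p.startswith("-")              # skip options such as -f
        and not p.endswith((".py", ".json"))   # skip scripts and connection files
    ]
    return candidates[0] if candidates else default
```

Called as `pick_file_name(sys.argv)`, it falls back to `README.md` when a notebook kernel supplies only options and a `.json` connection file.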
808 |
809 |
810 |
811 |
812 |
813 | ### NORMALIZE FILE PATH
814 |
815 |
816 | We must normalize file names so that, for example, "foo.txt" and "./foo.txt" indicate the same file and so that `~/` denotes the home directory on Mac and Linux. I didn't test this on Windows.
817 |
818 |
819 |
820 |
821 |
822 | ```python
823 | def anchor_is_tilde(path_str: str) -> bool:
824 |     result = (path_str[0:2] == "~/") and (Path(path_str).anchor == '')
825 |     return result
826 |
827 | def normalize_file_path(tangle_file_attribute: str) -> Path:
828 |     result: Path = Path(tangle_file_attribute)
829 |     if (anchor_is_tilde(tangle_file_attribute)):
830 |         result = (Path.home() / tangle_file_attribute[2:])
831 |     return result.absolute()
832 | ```
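A quick check of the two normalization properties, as a sketch that swaps in the standard library's `Path.expanduser` for the hand-rolled tilde test above. The helper name `normalize` is illustrative only, and like the original it was not checked on Windows.

```python
from pathlib import Path


def normalize(path_str: str) -> Path:
    """Expand a leading ~ and make the path absolute (illustrative helper)."""
    return Path(path_str).expanduser().absolute()


# pathlib drops a leading './' when it builds the path, so these agree.
same = normalize("foo.txt") == normalize("./foo.txt")
# A leading '~/' resolves against the home directory.
in_home = normalize("~/foo.txt") == Path.home() / "foo.txt"
```

Comparing `Path` objects rather than strings is what lets two spellings of the same tangle file share one dictionary key once they are normalized.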
833 |
834 |
835 |
836 |
837 |
838 | ### SAVE A FILE PATH FOR THE KERNEL
839 |
840 |
841 | `save_aFile_path_for_kernel` returns the full path of its input file after saving that path in a special place where the [Tangledown Kernel](#section-tangledown-kernel) can find it.
842 |
843 |
844 |
845 |
846 |
847 | ```python
848 | def save_aFile_path_for_kernel(fn: FileName) -> FileName:
849 |     xpath: Path = Path.cwd() / Path(fn).name
850 |     victim_file_name = str(xpath.absolute())
851 |     safepath: Path = Path.home() / '.tangledown/current_victim_file.txt'
852 |     Path(safepath).parents[0].mkdir(parents=True, exist_ok=True)
853 |     print(f"SAVING {victim_file_name} in secret place {str(safepath)}")
854 |     with open(safepath, "w") as t:
855 |         t.write(victim_file_name)
856 |     return xpath
857 | ```
858 |
859 |
860 |
861 |
862 |
863 | ### OH NO! THERE ARE TWO WAYS
864 |
865 |
866 | Turns out there are two ways to write code blocks in Markdown:
867 |
868 | 1. indented by four spaces, useful for quoted Markdown and quoted triple-backtick blocks
869 |
870 | 2. surrounded by triple backticks and _not_ indented.
871 |
872 |
873 | Tangledown must handle both ways.
874 |
875 |
876 | We use the trick of a harmless regex group --- regex stuff inside square brackets --- around one of the backticks in the regex that recognizes triple backticks. This regex is safe to run on itself. See `triple_backtick_re` in the code immediately below.
877 |
878 |
879 | The function `first_non_blank_line_is_triple_backtick`, in noweb `oh-no-there-are-two-ways`, recognizes code blocks bracketed by triple backticks. The contents of this `noweb` tag are triple-backticked, themselves. Kind of a funny self-toast joke, no?
880 |
881 |
882 | Remember the _use-mention_ dichotomy from Philosophy class? No problem if you don't.
883 |
884 |
885 | When we're _talking about_ `noweb` and `tangle` tags, but don't want to process them, we indent the tags and the code blocks. Tangledown won't process indented `noweb` and `tangle` tags because the regexes in noweb `left_justified_regexes` won't match them.
886 |
887 |
888 | We can also talk about triple-backticked blocks by indenting them. Tangledown won't mess with indented triple-backticked blocks, because the regex needs them left-justified. Markdown also won't get confused, so we can quote whole Markdown files by indenting them. Yes, your Literate Markdown can _also_, recursively, tangle out more Markdown files. How cool is that? Will the recursive jokes never end?
889 |
890 |
891 | [TangleUp](#tangleup) has a heuristic for placing language and id information on triple-backtick fence openers. Our function will retrieve those if present.
892 |
893 |
894 | We see, below, why the code tracks line numbers. We might do all this in some super-bitchin', sophomoric list comprehension, but this is more obvious-at-a-glance. That's a good thing.
895 |
896 |
897 | ### FIRST NON-BLANK LINE IS TRIPLE BACKTICK
898 |
899 |
900 | Match lines with left-justified triple-backtick. Pass through lines with indented triple-backtick.
901 |
902 |
903 | We must trace `raw` fenceposts, but not copy them to
904 |
905 |
906 |
907 |
908 |
909 | ```python
910 | triple_backtick_re = re.compile (r'^`[`]`((\w+)?\s*(id=([0-9a-fA-F-]+))?)')
911 | blank_line_re = re.compile (r'^\s*$')
912 |
913 | def first_non_blank_line_is_triple_backtick (
914 |         i: LineNumber, lines: Lines) -> Match[Line]:
915 |     while (blank_line_re.match (lines[i])):
916 |         i = i + 1
917 |     yes = triple_backtick_re.match (lines[i])
918 |     language = "python" # default
919 |     id_ = None # default
920 |     if yes:
921 |         language = yes.groups()[1] or language
922 |         id_ = yes.groups()[3] ## can be 'None'
923 |     return i, yes, language, id_
924 | ```
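To see what the fence regex captures, here is a small sketch that applies the same pattern to a few sample lines. The wrapper `fence_info` and the sample id are hypothetical; only `triple_backtick_re` is taken from the code above.

```python
import re

triple_backtick_re = re.compile(r'^`[`]`((\w+)?\s*(id=([0-9a-fA-F-]+))?)')

FENCE = "`" * 3  # built at run time so this example holds no literal fence


def fence_info(line: str):
    """Return (is_fence, language, id_) for one line, defaulting to python."""
    m = triple_backtick_re.match(line)
    if not m:
        return False, "python", None
    # groups()[1] is the language word, groups()[3] the hex id; either may be None.
    return True, m.groups()[1] or "python", m.groups()[3]


with_both = fence_info(FENCE + "clojure id=1a2b-3c\n")
bare = fence_info(FENCE + "\n")
indented = fence_info("    " + FENCE + "python\n")
```

The indented sample is not treated as a fence, which is the property that lets quoted, indented code blocks pass through untouched.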
925 |
926 |
927 |
928 |
929 |
930 | ### ACCUMULATE CONTENTS
931 |
932 |
933 | Tangledown is a funny little compiler. It converts Literate Markdown to Python or other languages (Tangledown supports Clojure and Markdown, too). We could go nuts and write it in highfalutin' style, and then it would be much bigger, more elaborate, and easier to explain to a Haskell programmer. It might also be less of a toy. However, we want this toy Tangledown for now to be:
934 |
935 | - very short
936 |
937 | - independent of rich libraries like beautiful soup and parser combinators
938 |
939 | - completely obvious to anyone
940 |
941 |
942 | We'll just use iteration and array indices, but in a tasteful way so our functional friends won't puke. This is Python, after all, not Haskell! We can just _get it done_, with grace, panache, and aplomb.
943 |
944 |
945 | The function `accumulate_contents` accumulates the contents of left-justified `noweb` or `tangle` tags. The function starts at line `i` of the input, then figures out whether a tag's first non-blank line is triple backtick, in which case it _won't_ snip four spaces from the beginning of every line, and finally keeps going until it sees the closing fencepost, the closing `noweb` or `tangle` tag. It returns a tuple of the line index _after_ the closing fencepost, the language and id found on the triple-backtick opener, and the contents, possibly de-dented. The function manipulates line numbers to skip over triple backticks.
946 |
947 |
948 |
949 |
950 |
951 | ```python
952 | def accumulate_contents (
953 |         lines: Lines, i: LineNumber, end_re: re) -> LinesTuple:
954 |     r"""Harvest contents of a noweb or tangle tag. The start
955 |     taglet was consumed by caller. Consume the end taglet."""
956 |     i, yes, language, id_ = first_non_blank_line_is_triple_backtick(i, lines)
957 |     snip = 0 if yes else 4
958 |     contents_lines: Lines = []
959 |     for j in range (i, len(lines)):
960 |         if (end_re.match(lines[j])):
961 |             return j + 1, language, id_, contents_lines # the only return
962 |         if not triple_backtick_re.match (lines[j]):
963 |             contents_lines.append (lines[j][snip:])
964 | ```
965 |
966 |
967 |
968 |
969 |
970 | ### NEW ACCUMULATE LINES
971 |
972 |
973 | The old `accumulate_lines` has reached the end of its life. It ignores raw cells, except for some hacks for raw noweb and tangle tags. The `new_accumulate_lines` must parse several kinds of line-sequences explicitly. Let's be careful not to call line-sequences _blocks_ so we don't confuse line-sequences with block tags.
974 |
975 | 1. Regular
976 |
977 |
978 | ### ACCUMULATE LINES
979 |
980 |
981 | The function `accumulate_lines` calls `accumulate_contents` to suck up the contents of all the left-justified `noweb` tags and `tangle` tags out of a file, but doesn't expand any `block` tags that it finds. It just builds up dictionaries, `noweb_blocks` and `tangle_files`, keyed by `name` or `file` attributes it finds inside `noweb` or `tangle` tags.
982 |
983 |
984 |
985 |
986 |
987 | ```python
988 |
989 | raw_start_re = re.compile("<!-- #raw -->")    # assumed Jupytext raw-cell opener
990 | raw_end_re = re.compile("<!-- #endraw -->")   # assumed Jupytext raw-cell closer
991 | from pprint import pprint
992 | def accumulate_lines(fp: Path, lines: Lines) -> Tuple[Tracer, Nowebs, Tangles]:
993 |     tracer = Tracer()
994 |     tracer.fp = fp
995 |     nowebs: Nowebs = {}
996 |     tangles: Tangles = {}
997 |     i = 0
998 |     while i < len(lines):
999 |         noweb_start_match = noweb_start_re.match (lines[i])
1000 |         tangle_start_match = tangle_start_re.match (lines[i])
1001 |         if noweb_start_match:
1002 |
1003 |         elif tangle_start_match:
1004 |
1005 |         elif raw_start_re.match (lines[i]):
1006 |
1007 |         else:
1008 |
1009 |     if in_between: # Close out final markdown.
1010 |         tracer._end_betweens(i)
1011 |     return tracer, nowebs, tangles
1012 | ```
1013 |
1014 |
1015 |
1016 |
1017 |
1018 | #### ACCUMULATE LINES: HANDLE RAW
1019 |
1020 |
1021 |
1022 |
1023 |
1024 | ```python
1025 | pass
1026 | ```
1027 |
1028 |
1029 |
1030 |
1031 |
1032 | #### ACCUMULATE LINES: HANDLE MARKDOWN
1033 |
1034 |
1035 |
1036 |
1037 |
1038 | ```python
1039 | in_between = True
1040 | tracer.add_markdown(i, lines[i])
1041 | i += 1
1042 | ```
1043 |
1044 |
1045 |
1046 |
1047 |
1048 | #### ACCUMULATE LINES: HANDLE NOWEB
1049 |
1050 |
1051 |
1052 |
1053 |
1054 | ```python
1055 | in_between = False
1056 | key: NowebName = noweb_start_match.group(1)
1057 | (i, language, id_, nowebs[key]) = \
1058 | accumulate_contents(lines, i + 1, noweb_end_re)
1059 | tracer.add_noweb(i, language, id_, key, nowebs[key])
1060 | ```
1061 |
1062 |
1063 |
1064 |
1065 |
1066 | #### ACCUMULATE LINES: HANDLE TANGLE
1067 |
1068 |
1069 |
1070 |
1071 |
1072 | ```python
1073 | in_between = False
1074 | key: TangleFileName = \
1075 | str(normalize_file_path(tangle_start_match.group(1)))
1076 | if not (key in tangles):
1077 |     tangles[key]: Liness = []
1078 | (i, language, id_, things) = accumulate_contents(lines, i + 1, tangle_end_re)
1079 | tangles[key] += [things]
1080 | tracer.add_tangle(i, language, id_, key, tangles[key])
1081 | ```
1082 |
1083 |
1084 |
1085 |
1086 |
1087 | ## DUDE!
1088 |
1089 |
1090 | There is a lot that can go wrong. We can have all kinds of malformed contents:
1091 |
1092 | - too many or not enough triple-backtick lines
1093 | - indentation errors
1094 | - broken tags
1095 | - mismatched fenceposts
1096 | - dangling tags
1097 | - misspelled names
1098 | - syntax errors
1099 | - infinite loops (cycles, hangs)
1100 | - much, much more
1101 |
1102 |
1103 | We'll get to error handling someday, maybe. Tangledown is [just a little toy at the moment](#disclaimer), but I thought it interesting to write about. If it's ever distributed to hostile users, then we will handle all the bad cases. But not now. Let's get the happy case right.
1104 |
1105 |
1106 | ## Second Pass: Expanding Blocks
1107 |
1108 |
1109 | Iterate over all the `noweb` or `tangle` tag contents and expand the
1110 | `block` tags we find in there, recursively. That means keep going until there are no more `block` tags, because nowebs are allowed (encouraged!) to refer to other nowebs via `block` tags. If there are cycles, this will hang.
1111 |
1112 |
1113 | ### DUDE! HANG?
1114 |
1115 |
1116 | We're doing the happy cases first, and will get to cycle detection someday, maybe.
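
For when that day comes, here is one way cycle detection could look: a depth-first search over the references between nowebs. This is a sketch under stated assumptions, not part of Tangledown. The reference pattern `block_ref_re` and the function `find_cycle` are hypothetical, and the tag syntax is assumed to be `<block name="...">`.

```python
import re
from typing import Dict, List, Optional

# Hypothetical block-reference syntax, assumed for this sketch only.
block_ref_re = re.compile(r'^\s*<block name="([^"]+)"')


def find_cycle(nowebs: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of noweb names as a list, or None if there is none."""
    WHITE, GREY, BLACK = 0, 1, 2      # unvisited, on the current path, finished
    color = {name: WHITE for name in nowebs}
    path: List[str] = []

    def visit(name: str) -> Optional[List[str]]:
        color[name] = GREY
        path.append(name)
        for line in nowebs[name]:
            m = block_ref_re.match(line)
            if not m or m.group(1) not in nowebs:
                continue              # not a reference, or a dangling one
            ref = m.group(1)
            if color[ref] == GREY:    # back edge: ref is already on the path
                return path[path.index(ref):] + [ref]
            if color[ref] == WHITE:
                found = visit(ref)
                if found:
                    return found
        color[name] = BLACK
        path.pop()
        return None

    for name in nowebs:
        if color[name] == WHITE:
            found = visit(name)
            if found:
                return found
    return None
```

Running such a check once after the first pass would let the second pass refuse to start rather than hang.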
1117 |
1118 |
1119 | ### THERE IS A BLOCK TAG
1120 |
1121 |
1122 | First, we need to detect that some list of lines contains a `block` tag, left-justified or not. That means we must keep running the expander on that list.
1123 |
1124 |
1125 |
1126 |
1127 |
1128 | ```python
1129 | def there_is_a_block_tag (lines: Lines) -> bool:
1130 |     for line in lines:
1131 |         block_start_match = block_start_re.match (line)
1132 |         if (block_start_match):
1133 |             return True
1134 |     return False
1135 | ```
1136 |
1137 |
1138 |
1139 |
1140 |
1141 | ### EAT A BLOCK TAG
1142 |
1143 |
1144 | If there is a `block` tag, we must eat the tag and its meaningless contents:
1145 |
1146 |
1147 |
1148 |
1149 |
1150 | ```python
1151 | def eat_block_tag (i: LineNumber, lines: Lines) -> LineNumber:
1152 |     for j in range (i, len(lines)):
1153 |         end_match = block_end_re.match (lines[j])
1154 |         # DUDE! Check leading whitespace against block_start_re
1155 |         if (end_match):
1156 |             return j + 1
1157 |         else: # DUDE!
1158 |             pass
1159 | ```
1160 |
1161 |
1162 |
1163 |
1164 |
1165 | ### EXPAND BLOCKS
1166 |
1167 |
1168 | The following function does one round of block expansion. The caller must test whether any `block` tags remain, and keep running the expander until there are no more `block` tags. Our functional fu grandmaster might be appalled, but sometimes it's just easier to iterate than to recurse.
1169 |
1170 |
1171 |
1172 |
1173 |
1174 | ```python
1175 | def expand_blocks (tracer: Tracer, nowebs: Nowebs, lines: Lines,
1176 |                    language: str = "python") -> Lines:
1177 |     out_lines = []
1178 |     block_key: NowebName = ""
1179 |     for i in range (len (lines)):
1180 |         block_start_match = block_start_re.match (lines[i])
1181 |         if (block_start_match):
1182 |             leading_whitespace: str = block_start_match.group (1)
1183 |             block_key: NowebName = block_start_match.group (2)
1184 |             block_lines: Lines = nowebs [block_key] # DUDE!
1185 |             i: LineNumber = eat_block_tag (i, lines)
1186 |             for block_line in block_lines:
1187 |                 out_lines.append (leading_whitespace + block_line)
1188 |         else:
1189 |             out_lines.append (lines[i])
1190 |     return out_lines
1191 | ```
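A self-contained sketch of the same idea on toy data shows how indentation carries through nested expansions. It assumes one-line block tags of the form `<block name="..."></block>`; the names `block_re`, `expand_once`, and `expand_fully` are hypothetical stand-ins, not the functions above.

```python
import re
from typing import Dict, List

# Hypothetical one-line block tag, assumed for this sketch only.
block_re = re.compile(r'^(\s*)<block name="([^"]+)"></bl[o]ck>\s*$')


def expand_once(nowebs: Dict[str, List[str]], lines: List[str]) -> List[str]:
    """Replace each block tag with its noweb's lines, indented like the tag."""
    out: List[str] = []
    for line in lines:
        m = block_re.match(line)
        if m:
            indent, name = m.group(1), m.group(2)
            out.extend(indent + body for body in nowebs[name])
        else:
            out.append(line)
    return out


def expand_fully(nowebs: Dict[str, List[str]], lines: List[str]) -> List[str]:
    """Keep expanding until no block tags remain (loops forever on cycles)."""
    while any(block_re.match(line) for line in lines):
        lines = expand_once(nowebs, lines)
    return lines
```

Because each round prepends the tag's own indentation, a noweb nested two levels deep ends up indented by the sum of both tags' leading whitespace.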
1192 |
1193 |
1194 |
1195 |
1196 |
1197 | ## TRACER
1198 |
1199 |
1200 | For [TangleUp](#tangleup), we'll need to trace the entire operation of Tangledown, first and second passes. TangleUp reverses Tangledown, so we will want a best-effort reconstruction of the original Markdown file.
1201 |
1202 |
1203 | Our first approach will be a sequential list of dictionaries with all the needed information.
1204 |
1205 |
1206 |
1207 |
1208 |
1209 | ```python
1210 | from dataclasses import dataclass, field
1211 | from typing import Union ## TODO
1212 | @dataclass
1213 | class Tracer:
1214 |     trace: List[Dict] = field(default_factory=list)
1215 |     line_no = 0
1216 |     current_betweens: Lines = field(default_factory=list)
1217 |     fp: Path = None
1218 |     # First Pass
1219 |
1220 |
1221 |
1222 |
1223 |
1224 |
1225 |     # Second Pass
1226 |
1227 |
1228 | ```
1229 |
1230 |
1231 |
1232 |
1233 |
1234 | ### TRACER.ADD_RAW
1235 |
1236 |
1237 |
1238 |
1239 |
1240 | ```python
1241 | def add_raw(self, i, between: Line):
1242 |     self.line_no += 1
1243 |     self.current_betweens.append((self.line_no, between))
1244 | ```
1245 |
1246 |
1247 |
1248 |
1249 |
1250 | ### TRACER.ADD_MARKDOWN
1251 |
1252 |
1253 |
1254 |
1255 |
1256 | ```python
1257 | def add_markdown(self, i, between: Line):
1258 |     self.line_no += 1
1259 |     self.current_betweens.append((self.line_no, between))
1260 | ```
1261 |
1262 |
1263 |
1264 |
1265 |
1266 | ### TRACER._END_BETWEENS
1267 |
1268 |
1269 |
1270 |
1271 |
1272 | ```python
1273 | def _end_betweens(self, i):
1274 |     if self.current_betweens:
1275 |         self.trace.append({"ending_line_number": self.line_no, "i": i,
1276 |                            "language": "markdown", "kind": 'between',
1277 |                            "text": self.current_betweens})
1278 |     self.current_betweens = []
1279 | ```
1280 |
1281 |
1282 |
1283 |
1284 |
1285 | ### TRACER.ADD_NOWEB
1286 |
1287 |
1288 |
1289 |
1290 |
1291 | ```python
1292 | def add_noweb(self, i, language, id_, key, noweb_lines):
1293 |     self._end_betweens(i)
1294 |     self.line_no = i
1295 |     self.trace.append({"ending_line_number": self.line_no, "i": i,
1296 |                        "language": language, "id_": id_,
1297 |                        "kind": 'noweb', key: noweb_lines})
1298 | ```
1299 |
1300 |
1301 |
1302 |
1303 |
1304 | ### TRACER.ADD_TANGLE
1305 |
1306 |
1307 |
1308 |
1309 |
1310 | ```python
1311 | def add_tangle(self, i, language, id_, key, tangle_liness):
1312 |     self._end_betweens(i)
1313 |     self.line_no = i
1314 |     self.trace.append({"ending_line_number": self.line_no, "i": i,
1315 |                        "language": language, "id_": id_,
1316 |                        "kind": 'tangle', key: tangle_liness})
1317 | ```
1318 |
1319 |
1320 |
1321 |
1322 |
1323 | ### TRACER.ADD_EXPANDED_NOWEB
1324 |
1325 |
1326 |
1327 |
1328 |
1329 | ```python
1330 | def add_expanded_noweb(self, i, language, id_, key, noweb_lines):
1331 |     self._end_betweens(i)
1332 |     self.line_no = i
1333 |     self.trace.append({"ending_line_number": self.line_no, "i": i,
1334 |                        "language": language, "id_": id_,
1335 |                        "kind": 'expanded_noweb', key: noweb_lines})
1336 | ```
1337 |
1338 |
1339 |
1340 |
1341 |
1342 | ### TRACER.ADD_EXPANDED_TANGLE
1343 |
1344 |
1345 |
1346 |
1347 |
1348 | ```python
1349 | def add_expanded_tangle(self, i, language, id_, key, tangle_liness):
1350 |     self._end_betweens(i)
1351 |     self.line_no = i
1352 |     self.trace.append({"ending_line_number": self.line_no, "i": i,
1353 |                        "language": language, "id_": id_,
1354 |                        "kind": 'expanded_tangle', key: tangle_liness})
1355 | ```
1356 |
1357 |
1358 |
1359 |
1360 |
1361 | ### TRACER.DUMP
1362 |
1363 |
1364 |
1365 |
1366 |
1367 | ```python
1368 | def dump(self):
1369 |     pr = self.fp.parent
1370 |     fn = self.fp.name
1371 |     fn2 = fn.translate(str.maketrans('.', '_'))
1372 |     # Store the trace in the dir where the input md file is:
1373 |     vr = f'tangledown_trace_{fn2}'
1374 |     np = pr / (vr + ".py")
1375 |     with open(np, "w") as fs:
1376 |         print(f'sequential_structure = (', file=fs)
1377 |         pprint(self.trace, stream=fs)
1378 |         print(')', file=fs)
1379 | ```
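The naming rule in `dump` can be checked in isolation. This sketch repeats only the file-name arithmetic; the function name `trace_file_path` and the sample paths are hypothetical.

```python
from pathlib import Path


def trace_file_path(md_path: Path) -> Path:
    """Where a trace for md_path would be written, following the rule in dump."""
    # Dots become underscores so the result is a plain identifier plus '.py'.
    safe_name = md_path.name.translate(str.maketrans('.', '_'))
    return md_path.parent / f'tangledown_trace_{safe_name}.py'


readme_trace = trace_file_path(Path("/tmp/README.md"))
hello_trace = trace_file_path(Path("/tmp/hello_world.md"))
```

Since the output is a `.py` file that assigns to `sequential_structure`, it should be loadable as ordinary Python as long as every value in the trace has a literal `repr`.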
1380 |
1381 |
1382 |
1383 |
1384 |
1385 | # TANGLE IT, ALREADY!
1386 |
1387 |
1388 | Ok, you saw [at the top](#how-to-run) that the code in this here Markdown document, README.md, when run as a script, will read in all the lines in ... this here Markdown document, `README.md`. Bootstrapping!
1389 |
1390 |
1391 | But you have to run something first. For that, I tangled the code manually just
1392 | once and provide `tangledown.py` in the repository. The chicken definitely comes
1393 | before the egg.
1394 |
1395 |
1396 | But if you have the chicken (`tangledown.py`), you can import it as a module and execute the following cell, a copy of the one [at the top](#how-to-run). That should overwrite `tangledown.py` with the contents of this notebook or Markdown file. So our little bootstrapping technique will forever update the Tangledown compiler if you change it in this here README.md that you're reading right now!
1397 |
1398 | ```python
1399 | from tangledown import get_lines, accumulate_lines, tangle_all
1400 | tangle_all(*accumulate_lines(*get_lines("README.md")))
1401 | ```
1402 |
1403 | # TODO
1404 |
1405 |
1406 | - IN-PROGRESS: more examples, specifically, a test-generator in Clojure in subdirectory `examples/asr`.
1407 | - IN-PROGRESS: TangleUp
1408 | - NOT-STARTED: Have the Tangledown Kernel, when evaluating tangle-able cells, write them out one at a time. Without this feature, the only way to write out files is to tangle the entire notebook. Possibly do these as cell magics.
1409 | - NOT-STARTED: Research cell magics for `noweb` and `tangle` cells.
1410 | - NOT-STARTED: error handling (big job)
1411 | - NOT-STARTED: type annotations for the kernel
1412 | - DONE: convert relative file paths to absolute
1413 | - DONE: modern Pythonic Type Annotation (PEP 484)
1414 | - DONE: use pathlib to compare tangle file names
1415 | - DONE: somehow get the Tangledown Kernel to tangle everything automatically when it's restarted
1416 | - DONE: Support multiple instances of the Tangledown Kernel. Because it reads files with fixed names in the home directory, it has no way of processing multiple Tangledown notebooks.
1417 | - DONE: investigate [Papermill](https://papermill.readthedocs.io/en/latest/) as a solution
1418 | - DONE: find out whether pickle is a better alternative to json for dumping dictionaries for the kernel
1419 | - DONE: Jupytext kernel for `tangledown` so we can run `noweb` and `block` tags that have `block` tags in them.
1420 |
1421 |
1422 | ## DUDE!
1423 |
1424 |
1425 | Some people write "TODO" in their code so they can find all the spots where they thought they might have trouble but didn't have time to write the error-handling (prophylactic) code. I like to write "DUDE" because it sounds like "TODO" but is more RUDE (which also sounds like "DUDE") and funnier. This story is supposed to be amusing.
1426 |
1427 |
1428 | ## KNOWN BUGS
1429 |
1430 |
1431 | I must apologize once again, but this is just a toy at this point! Recall the [DISCLAIMER](#disclaimer). The following are stack-ranked from highest to lowest priority.
1432 |
1433 |
1434 | 1. FIXED: writing to "tangledown.py" and to "./tangledown.py" clobbers the file rather than appending. Use pathlib to compare filenames rather than string comparison.
1435 | 2. FIXED: tangling to files in the home directory via `~` does not work. We know one dirty way to fix it, but proper practice with pathlib is a better answer.
1436 |
1437 |
1438 | # TANGLEUP DESIGN AND IMPLEMENTATION
1439 |
1440 |
1441 | ## TANGLEUP TENETS
1442 |
1443 |
1444 | 1. Keep source tree and Literate Markdown consistent.
1445 |
1446 |
1447 | ## NON-REAL-TIME
1448 |
1449 |
1450 | We'll start with a non-real-time solution. You'll manually run `tangleup` to put modified source back into the Markdown. Later, we'll do something that can track changes on disk and update the Markdown in real time.
1451 |
1452 |
1453 | When you modify your source tree, `tangleup` puts the modified code back into the Markdown file with reminders to _detangle_ and to _write_. There are two cases:
1454 |
1455 | 1. You modified some source that corresponds to an existing noweb block in the Markdown.
1456 |
1457 | 2. You added some source that doesn't yet correspond to a noweb block in the Markdown.
1458 |
1459 |
1460 | To assist TangleUp, Tangledown records unique names for existing noweb blocks along with the tangled source. Tangledown also records robust locations for existing blocks. _Robust_ means that the boundary locations are flexible: starting and ending line and character positions in a source file are not enough because changing an early one invalidates all later ones.
1461 |
1462 |
1463 | ## NO PRE-EXISTING MARKDOWN
1464 |
1465 |
1466 | We don't need the trace file for this case.
1467 |
1468 |
1469 | Enumerate all the files in a directory tree. Pair each file name with a short, unique name for the nowebs. TODO: ignore files and directories listed in the `.gitignore`.
1470 |
1471 | ```python
1472 | %pip install gitignore-parser
1473 | ```
1474 |
1475 | ### TANGLEUP FILES LIST
1476 |
1477 |
1478 |
1479 |
1480 |
1481 | ```python
1482 |
1483 | def files_list(dir_name: str) -> List[str]:
1484 |     dir_path = Path(dir_name)
1485 |     files_result = []
1486 |     nyms_result = []
1487 |     file_count = 0
1488 |
1489 |
1490 |
1491 |     find_first_gitignore()
1492 |     recurse_a_dir(dir_path)
1493 |     assert file_count == len(nyms_collision_check)
1494 |     return list(zip(files_result, nyms_result))
1495 | ```
1496 |
1497 |
1498 |
1499 |
1500 |
1501 | #### RECURSE A DIR
1502 |
1503 |
1504 | The only complexity, here, is ignoring `.git` and files in `.gitignore`.
1505 |
1506 |
1507 |
1508 |
1509 |
1510 | ```python
1511 | def recurse_a_dir(dir_path: Path) -> None:
1512 |     for p in dir_path.glob('*'):
1513 |         q = p.absolute()
1514 |         qs = str(q)
1515 |         try: # don't skip files in dirs above .gitignore
1516 |             ok = not in_gitignore(qs)
1517 |         except ValueError as e: # one absolute and one relative?
1518 |             ok = True
1519 |         if p.name == '.git':
1520 |             ok = False
1521 |         if not ok:
1522 |             pprint(f'... IGNORING file or dir {p}')
1523 |         if ok and q.is_file():
1524 |             nonlocal file_count # Assignment requires 'nonlocal'
1525 |             file_count += 1
1526 |             nyms_result.append(gsnym(q)) # 'nonlocal' not required
1527 |             files_result.append(qs) # because not ass'gt but mutation
1528 |         elif ok and p.is_dir(): # call the method; the bare attribute is always truthy
1529 |             recurse_a_dir(p)
1530 | ```
1531 |
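The comments in `recurse_a_dir` rely on a Python scoping rule: a nested function may freely mutate an object that lives in the enclosing scope, but rebinding a name from that scope needs `nonlocal`. Here is a minimal, standalone sketch of the rule. The names `count_and_collect`, `visit`, `seen`, and `count` are made up for this illustration and are not part of TangleUp.

```python
from typing import List, Tuple

def count_and_collect(items: List[str]) -> Tuple[int, List[str]]:
    count = 0             # rebound below, so the nested function needs 'nonlocal'
    seen: List[str] = []  # only mutated below, so no declaration is needed

    def visit(item: str) -> None:
        nonlocal count
        count += 1         # assignment: rebinds the enclosing 'count'
        seen.append(item)  # mutation: same list object, no rebinding

    for item in items:
        visit(item)
    return count, seen

result = count_and_collect(['a.py', 'b.clj'])
```

Without the `nonlocal` line, `count += 1` raises `UnboundLocalError`, because Python then treats `count` as a fresh local of `visit`.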
1532 |
1533 |
1534 |
1535 |
1536 | #### UNIQUE NAMES
1537 |
1538 |
1539 | Correct for collisions, which will be really rare, so there is a negligible effect on speed.
1540 |
1541 |
1542 |
1543 |
1544 |
1545 | ```python
1546 | nyms_collision_check = set()
1547 |
1548 | def gsnym(p: Path) -> str:
1549 |     """Generate a short, unique name for a path."""
1550 |     nym = gsnym_candidate(p)
1551 |     while nym in nyms_collision_check:
1552 |         nym = gsnym_candidate(p)
1553 |     nyms_collision_check.add(nym)
1554 |     return nym
1555 |
1556 |
1557 | def gsnym_candidate(p: Path) -> str:
1558 |     """Generate a candidate short, unique name for a path."""
1559 |     return p.stem + '_' + uuid.uuid4().hex[:6].upper()
1560 | ```
1561 |
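How rare is "really rare"? Every name starts with the file stem, so only files that share a stem can collide, and then only if they draw the same six hex digits. The following rough estimate uses the standard birthday approximation. `collision_probability` is a helper written for this note, not part of TangleUp, and the file counts are arbitrary examples.

```python
import math

SUFFIX_SPACE = 16 ** 6  # six hex digits, as in gsnym_candidate

def collision_probability(n_same_stem: int) -> float:
    """Birthday-bound approximation of the chance that at least two of
    n_same_stem names with the same stem draw the same suffix."""
    pairs = n_same_stem * (n_same_stem - 1) / 2
    return 1.0 - math.exp(-pairs / SUFFIX_SPACE)

p_100 = collision_probability(100)    # a hundred files that share one stem
p_1000 = collision_probability(1000)  # a thousand files that share one stem
```

Even a hundred files with the same stem collide well under one time in a thousand, so the retry loop in `gsnym` almost never runs.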
1562 |
1563 |
1564 |
1565 |
1566 | #### IGNORE FILES IN GITIGNORE
1567 |
1568 |
1569 | Find the first `.gitignore` in a directory tree. Parse it to produce a function that tests whether a file must be ignored by TangleUp.
1570 |
1571 |
1572 |
1573 |
1574 |
1575 | ```python
1576 | in_gitignore = lambda _: False
1577 |
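1578 | def find_first_gitignore() -> Path:
1579 |     nonlocal in_gitignore  # rebind the enclosing matcher; assumes expansion inside files_list
1580 |     for p in dir_path.rglob('*'):
1581 |         if p.name == '.gitignore':
1582 |             in_gitignore = parse_gitignore(str(p.absolute()))
1583 |             return p
1584 |     return dir_path  # no .gitignore found; the default matcher stays in place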
1585 | ```
1586 |
1587 |
1588 |
1589 |
1590 |
1591 | ### TANGLEUP IMPORTS
1592 |
1593 |
1594 |
1595 |
1596 |
1597 | ```python
1598 | from pathlib import Path
1599 | from typing import List
1600 | import uuid
1601 | from gitignore_parser import parse_gitignore
1602 | from pprint import pprint
1603 | ```
1604 |
1605 |
1606 |
1607 |
1608 |
1609 | ### WRITE NOWEB TO LINES
1610 |
1611 |
1612 | Now write the contents of each Python or Clojure file to a noweb block with its ginned-up name and a corresponding tangle block. Parenthetically, this just _screams_ for the Writer monad, but we'll just do it by hand in an obvious, kindergarten way.
1613 |
1614 |
1615 | **WARNING**: The explicit '\n' newlines probably won't work on Windows.
1616 |
1617 |
1618 |
1619 |
1620 |
1621 | ```python
1622 | from typing import Tuple
1623 | from pprint import pprint
1624 |
1625 |
1626 |
1627 |
1628 | def write_noweb_to_lines(lines: List[str],
1629 |                          file_gsnym_pair: Tuple[str, str],
1630 |                          language: str) -> None:
1631 |     path = Path(file_gsnym_pair[0])
1632 |     wrap_n_blank(lines, [f'## {path.name}\n'])
1633 |     wrap_1_raw(lines, f'\n')
1634 |     with open(file_gsnym_pair[0]) as f:
1635 |         try:
1636 |             inlines = f.readlines()
1637 |         except UnicodeDecodeError as e:
1638 |             pprint(f'... SKIPPING UNDECODABLE FILE {path}')
1639 |             return
1640 |     pprint(f'DETANGLING file {path}')
1641 |     bound = []  ## Really want the monadic bind, here.
1642 |     if language == "markdown":
1643 |         indent_4(bound, inlines)
1644 |     else:
1645 |         wrap_triple_backtick(bound, inlines, language)
1646 |     wrap_n_blank(lines, bound)
1647 |     wrap_1_raw(lines, '\n')
1648 |     lines.append(BLANK_LINE)
1649 | ```
1650 |
1651 |
1652 |
1653 |
1654 |
1655 | #### WRAP ONE LINE AS RAW
1656 |
1657 |
1658 |
1659 |
1660 |
1661 | ```python
1662 | BEGIN_RAW = '\n'
1663 | END_RAW = '\n'
1664 | def wrap_1_raw(lines: List[str], s: str) -> None:
1665 |     lines.append(BEGIN_RAW)
1666 |     lines.append(s)
1667 |     lines.append(END_RAW)
1668 | ```
1669 |
1670 |
1671 |
1672 |
1673 |
1674 | #### WRAP SEVERAL LINES IN BLANK LINES
1675 |
1676 |
1677 |
1678 |
1679 |
1680 | ```python
1681 | BLANK_LINE = '\n'
1682 | def wrap_n_blank(lines: List[str], ss: List[str]) -> None:
1683 |     lines.append(BLANK_LINE)
1684 |     for s in ss:
1685 |         lines.append(s)
1686 |     lines.append(BLANK_LINE)
1687 | ```
1688 |
1689 |
1690 |
1691 |
1692 |
1693 | #### WRAP LINES IN TRIPLE BACKTICKS
1694 |
1695 |
1696 |
1697 |
1698 |
1699 | ```python
1700 | def wrap_triple_backtick(lines: List[str],
1701 |                          ss: List[str],
1702 |                          language: str) -> None:
1703 |     lines.append(f'```{language}\n')
1704 |     for s in ss:
1705 |         lines.append(s)
1706 |     lines.append(f'```\n')
1707 | ```
1708 |
1709 |
1710 |
1711 |
1712 |
1713 | #### INDENT ALL LINES FOUR SPACES
1714 |
1715 |
1716 |
1717 |
1718 |
1719 | ```python
1720 | def indent_4(lines: List[str], ss: List[str]) -> None:
1721 |     for s in ss:
1722 |         lines.append('    ' + s)  # four spaces, per the function name
1723 | ```
1724 |
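To see what these helpers emit, here is a standalone sketch. It re-declares simplified copies of three of them so that the snippet runs by itself, then applies them to a two-line source file. The sample inputs are invented for illustration.

```python
from typing import List

BLANK_LINE = '\n'

def wrap_n_blank(lines: List[str], ss: List[str]) -> None:
    lines.append(BLANK_LINE)
    lines.extend(ss)
    lines.append(BLANK_LINE)

def wrap_triple_backtick(lines: List[str], ss: List[str], language: str) -> None:
    lines.append(f'```{language}\n')
    lines.extend(ss)
    lines.append('```\n')

def indent_4(lines: List[str], ss: List[str]) -> None:
    for s in ss:
        lines.append('    ' + s)

source_lines = ['def f():\n', '    return 1\n']

# A code file becomes a fenced block surrounded by blank lines.
fenced: List[str] = []
wrap_triple_backtick(fenced, source_lines, 'python')
code_cell: List[str] = []
wrap_n_blank(code_cell, fenced)

# A Markdown file is indented instead, so its own fences cannot clash.
markdown_cell: List[str] = []
indent_4(markdown_cell, ['# Title\n'])
```

Each helper appends to the list it is given and returns nothing, which is the by-hand stand-in for the Writer monad mentioned above.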
1725 |
1726 |
1727 |
1728 |
1729 | ### WRITE TANGLE TO LINES
1730 |
1731 |
1732 |
1733 |
1734 |
1735 | ```python
1736 | def write_tangle_to_lines(lines: List[str],
1737 |                           file_gsnym_pair: Tuple[str, str],
1738 |                           language: str) -> None:
1739 |     wrap_1_raw(lines, f'\n')
1740 |     bound = []
1741 |     wrap_triple_backtick(bound,
1742 |                          [f'\n'],
1743 |                          language)
1744 |     wrap_n_blank(lines, bound)
1745 |     wrap_1_raw(lines, f'\n')
1746 | ```
1747 |
1748 |
1749 |
1750 |
1751 |
1752 | ### TANGLEUP OVERWRITE MARKDOWN
1753 |
1754 |
1755 | Test the whole megillah, the up direction. You may have to backpatch some 'language' names when you open the Markdown, but 'language' only affects syntax coloring.
1756 |
1757 |
1758 |
1759 |
1760 |
1761 | ```python
1762 |
1763 |
1764 |
1765 | def tangleup_overwrite_markdown(
1766 |         output_markdown_filename: str,
1767 |         input_directory: str,
1768 |         title: str = "Untitled") -> None:
1769 |     pprint(f'WRITING LITERATE MARKDOWN to file {output_markdown_filename}')
1770 |     file_gsnym_pairs = files_list(input_directory)
1771 |     lines: List[str] = [f'# {title}\n\n']
1772 |     for pair in file_gsnym_pairs:
1773 |         p = Path(pair[0])
1774 |         if p.suffix == '.clj':
1775 |             language = f'clojure id={uuid.uuid4()}'
1776 |         elif p.suffix == '.py':
1777 |             language = f'python id={uuid.uuid4()}'
1778 |         elif p.suffix == '.md':
1779 |             language = 'markdown'
1780 |         else:
1781 |             language = ''
1782 |         write_noweb_to_lines(lines, pair, language)
1783 |         write_tangle_to_lines(lines, pair, language)
1784 |
1785 |     # Write the accumulated lines to the output Markdown file.
1786 |     with open(output_markdown_filename, "w") as f:
1787 |         for line in lines:
1788 |             f.write(line)
1790 | ```
1791 |
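If the list of suffixes grows, the `if`/`elif` chain above could become a lookup table. The following is one possible refactoring, offered as a sketch and not as the code Tangledown ships. `LANGUAGE_BY_SUFFIX` and `fence_language` are names invented here.

```python
import uuid
from pathlib import Path

# Hypothetical lookup table; extend it as you add languages.
LANGUAGE_BY_SUFFIX = {'.clj': 'clojure', '.py': 'python'}

def fence_language(path: Path) -> str:
    """Return the 'language' string for a file, or '' if the suffix is unknown."""
    if path.suffix == '.md':
        return 'markdown'  # written with indentation, not with a fence
    name = LANGUAGE_BY_SUFFIX.get(path.suffix)
    return f'{name} id={uuid.uuid4()}' if name else ''

clojure_info = fence_language(Path('core.clj'))
unknown_info = fence_language(Path('data.bin'))
```

As in the original chain, each code fence gets a fresh `id=` value, and unknown suffixes fall back to an empty string, which only costs you syntax coloring.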
1792 |
1793 |
1794 |
1795 |
1796 | ## YES PRE-EXISTING MARKDOWN
1797 |
1798 |
1799 | ### NO CHANGES ON DISK
1800 |
1801 |
1802 | If there are no changes to the tangled files on disk, then we must merely reassemble the nowebs, tangles, and block tags from the files on disk. On its first pass, Tangledown recorded the structure of nowebs and tangles and of the Markdown that surrounds them. When detangling a file:
1803 |
1804 | 1. look for every tangle that mentions that file
1805 |
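Step 1 might look something like the sketch below. It is only a guess at the shape of the data: the `kind` values match the ones used in the FIRST SHOT script in this document, but the `file` and `name` keys, the sample cells, and the helper `tangles_for_file` are assumptions made up for illustration.

```python
from typing import Any, Dict, List

def tangles_for_file(cells: List[Dict[str, Any]],
                     file_name: str) -> List[Dict[str, Any]]:
    """Select the 'tangle' cells whose (assumed) 'file' field names file_name."""
    return [c for c in cells
            if c.get('kind') == 'tangle' and c.get('file') == file_name]

# Invented sample of a recorded structure.
cells = [
    {'kind': 'between'},
    {'kind': 'noweb', 'name': 'core_1A2B3C'},
    {'kind': 'tangle', 'file': 'src/core.clj'},
    {'kind': 'tangle', 'file': 'src/util.clj'},
]
hits = tangles_for_file(cells, 'src/core.clj')
```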
1806 |
1807 | ### YES CHANGES ON DISK
1808 |
1809 |
1810 | #### CHANGES TO EXISTING CONTENTS
1811 |
1812 |
1813 | #### NEW CONTENTS
1814 |
1815 |
1816 | #### DELETED CONTENTS
1817 |
1818 |
1819 | ### FIRST SHOT
1820 |
1821 |
1822 | **PRO TIP**: For the Tangledown Kernel, if your little scripts contain noweb tags, wrap them in a tangle to `/dev/null`, reload the kernel spec, and restart the kernel; then you can run them in the notebook.
1823 |
1824 |
1825 |
1826 |
1827 | ```python
1828 | from pprint import pprint
1829 | from tangledown_trace_foobar_md import sequential_structure as cells
1830 | pprint(cells)
1831 | fn = "tanglup_foobar.md"
1832 | line_no = 0
1833 | for cell in cells:
1834 |     if cell["kind"] == "between":
1835 |
1836 |     elif cell["kind"] == "noweb":
1837 |
1838 |     elif cell["kind"] == "tangle":
1839 |
1840 |     else:
1841 |         assert False, f"unknown kind: {cell['kind']}"
1842 | ```
1843 |
1844 |
1845 |
1846 |
1847 |
1848 |
1849 |
1850 | ```python
1851 | pass
1852 | ```
1853 |
1854 |
1855 |
1856 |
1857 |
1858 |
1859 |
1860 |
1861 |
1862 | ```python
1863 | pass
1864 | ```
1865 |
1866 |
1867 |
1868 |
1869 |
1870 |
1871 |
1872 |
1873 |
1874 | ```python
1875 | pass
1876 | ```
1877 |
1878 |
1879 |
1880 |
1881 |
1882 | ## UNIT TESTS
1883 |
1884 |
1885 | ### NO PRE-EXISTING MARKDOWN FILE
1886 |
1887 |
1888 | Run these at the console for now.
1889 |
1890 |
1891 |
1892 |
1893 |
1894 | ```python
1895 |
1896 | if __name__ == "__main__":
1897 |     tangleup_overwrite_markdown(
1898 |         "asr_tangleup_test.md",
1899 |         "./examples",
1900 |         title="This is a First Test of the Emergency Tangleup System")
1901 | ```
1902 |
1903 |
1904 |
1905 |
1906 |
1907 |
1908 |
1909 |
1910 |
1911 | ```python
1912 |
1913 | if __name__ == "__main__":
1914 |     tangleup_overwrite_markdown(
1915 |         "tangleup-test.md",
1916 |         ".",
1917 |         title="This is a Second Test of the Emergency Tangleup System")
1918 | ```
1919 |
1920 |
1921 |
1922 |
1923 |
1924 | # APPENDIX: Developer Notes
1925 |
1926 |
1927 | If you change the code in this README.md and you want to test it by running the cell in Section [Tangle It, Already!](#tangle-already), you usually must restart whatever Jupyter kernel you're running because Jupytext caches code. If things continue to not make sense, try restarting the notebook server. On rare occasions, it produces incorrect answers for more obscure reasons.
1928 |
1929 |
1930 | # APPENDIX: Tangledown Kernel
1931 |
1932 |
1933 | The Tangledown kernel is ***OPTIONAL***, but nice. Everything I talked about so far works fine without it, but the Tangledown Kernel lets you evaluate Jupytext notebook cells that have `block` tags in them. For example, you can run Tangledown on Tangledown itself in this notebook just by evaluating the cell that contains all of Tangledown, including the source for the kernel, [here](#tangle-listing-tangle-all).
1934 |
1935 |
1936 | The Tangledown Compiler writes the full path of the Markdown file corresponding to the current notebook to a fixed place in the home directory. The Tangledown Kernel reads that path and gets all the nowebs from the file it names.
1937 |
1938 |
1939 | > If you run more than one instance of the Tangledown Kernel at one time on your machine, you must ***RETANGLE THE FILE AND RESTART THE TANGLEDOWN KERNEL WHEN YOU SWITCH NOTEBOOKS*** because the name of the current file is a fixed singleton. The Tangledown Kernel has no way to dynamically know what file you're working with. Sorry about that!
1940 |
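The singleton behavior can be pictured with a small simulation. It is not Tangledown's actual code: `save_current_file` and `load_current_file` are hypothetical helpers, the notebook paths are made up, and a temporary directory stands in for the home directory. Only the file name `.tangledown/current_victim_file.txt` comes from the kernel source in this document.

```python
import tempfile
from pathlib import Path

def save_current_file(home: Path, markdown_path: str) -> Path:
    """Record which Markdown file was tangled last (one slot, overwritten)."""
    slot = home / '.tangledown' / 'current_victim_file.txt'
    slot.parent.mkdir(parents=True, exist_ok=True)
    slot.write_text(markdown_path)
    return slot

def load_current_file(home: Path) -> str:
    """Read the path back, the way a freshly started kernel would."""
    return (home / '.tangledown' / 'current_victim_file.txt').read_text()

with tempfile.TemporaryDirectory() as fake_home:  # stand-in for Path.home()
    home = Path(fake_home)
    save_current_file(home, '/work/notebook_a.md')
    save_current_file(home, '/work/notebook_b.md')  # overwrites the first
    seen_by_kernel = load_current_file(home)
```

There is only one slot, so a kernel started after the second tangle sees `notebook_b.md` no matter which notebook it is attached to. That is why you must retangle and restart when you switch.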
1941 |
1942 | ## Installing the Tangledown Kernel
1943 |
1944 |
1945 | After you tangle the code out of this here README.md at least once, you will have two new files:
1946 | - `./tangledown_kernel/tangledown_kernel.py`
1947 | - `./tangledown_kernel/kernel.json`
1948 |
1949 |
1950 | You must inform Jupyter about your new kernel. The following works for me on the Mac. It might be different on your machine:
1951 |
1952 | ```bash
1953 | jupyter kernelspec install --user tangledown_kernel
1954 | ```
1955 |
1956 | ## Running the Tangledown Kernel
1957 |
1958 |
1959 | You must put the source for the Tangledown Kernel somewhere Python can find it before you start Jupyter Lab. One way is to modify the `PYTHONPATH` environment variable. The following works for me on the Mac:
1960 |
1961 |
1962 | ```
1963 | PYTHONPATH=".:/Users/brian/Library/Jupyter/kernels/tangledown_kernel" jupyter lab
1964 | ```
1965 |
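A quick way to check the path before launching Jupyter Lab is to ask Python whether it can find the module. `can_import` is a helper made up for this note; `importlib.util.find_spec` is the standard-library call doing the work.

```python
import importlib.util

def can_import(module_name: str) -> bool:
    """True if Python can locate a top-level module on the current sys.path."""
    return importlib.util.find_spec(module_name) is not None

kernel_found = can_import('tangledown_kernel')
```

Run it in a Python started with the same `PYTHONPATH` as above. If `kernel_found` is `False`, Jupyter will not be able to start the kernel either.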
1966 |
1967 | Once the kernel is installed, there are multiple ways to run it in Jupyter Lab. When you first open a notebook, you get a menu. The default is the regular Python 3 kernel, and it works fine, but you won't be able to run cells that have `block` tags in them. If you choose the Tangledown Kernel, you can run such cells.
1968 |
1969 |
1970 | If you modify the kernel:
1971 |
1972 | 1. re-tangle the kernel source, say by running the cell in [this section](#how-to-run)
1973 | 2. re-install the kernel by running the little bash script above
1974 | 3. restart the kernel inside the notebook
1975 |
1976 |
1977 | Most of the time, you don't have to restart Jupyter Lab itself, but sometimes after a really bad bug, you might have to.
1978 |
1979 |
1980 | ## Source for the Tangledown Kernel
1981 |
1982 |
1983 | Adapted from [these official docs](https://jupyter-client.readthedocs.io/en/latest/wrapperkernels.html).
1984 |
1985 |
1986 | The kernel calls [`expand_tangles`](#expand-tangles) after reformatting the lines a little. We learned about the reformatting by experiment. We explain `expand_tangles` [here](#expand-tangles) in the [section about Tangledown itself](#tangle-listing-tangle-all). The rest of this is boilerplate from the [official kernel documentation](https://jupyter-client.readthedocs.io/en/stable/wrapperkernels.html). There is no point, by the way, in running the cell below in any kernel. It's meant only for the Jupyter Lab startup engine. You just need to tangle it out and install it, as above.
1987 |
1988 |
1989 | > **NOTE**: You will get errors if you run this cell in the notebook.
1990 |
1991 |
1992 | TODO: plumb a Tracer through here?
1993 |
1994 |
1995 |
1996 |
1997 |
1998 | ```python
1999 |
2000 | class TangledownKernel(IPythonKernel):
2001 |
2002 |     async def do_execute(self, code, silent, store_history=True, user_expressions=None,
2003 |                          allow_stdin=False):
2004 |         if not silent:
2005 |             cleaned_lines = [line + '\n' for line in code.split('\n')]
2006 |             # HERE'S THE BEEF!
2007 |             expanded_code = expand_tangles(None, [cleaned_lines], self.nowebs)
2008 |             reply_content = await super().do_execute(
2009 |                 expanded_code, silent, store_history, user_expressions)
2010 |             stream_content = {
2011 |                 'name': 'stdout',
2012 |                 'text': reply_content,
2013 |             }
2014 |             self.send_response(self.iopub_socket, 'stream', stream_content)
2015 |         return {'status': 'ok',
2016 |                 # The base class increments the execution count
2017 |                 'execution_count': self.execution_count,
2018 |                 'payload': [],
2019 |                 'user_expressions': {},
2020 |                 }
2021 | if __name__ == '__main__':
2022 |     from ipykernel.kernelapp import IPKernelApp
2023 |     IPKernelApp.launch_instance(kernel_class=TangledownKernel)
2024 | ```
2025 |
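The line-reformatting step in `do_execute` can be tried in isolation. This sketch only shows what that one list comprehension produces for a sample cell. The sample `code` string is invented, and the `splitlines` comparison is included for contrast, not because the kernel uses it.

```python
code = 'x = 1\nprint(x)\n'

# What the kernel does: every piece gets a newline, including the empty
# string that split() yields after a trailing '\n'.
cleaned_lines = [line + '\n' for line in code.split('\n')]

# For comparison only: splitlines keeps the original terminators.
kept_lines = code.splitlines(keepends=True)
```

Here `cleaned_lines` ends with one extra blank line that `kept_lines` does not have. Every element is newline-terminated either way, which is the property the comprehension guarantees.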
2026 |
2027 |
2028 |
2029 |
2030 |
2031 |
2032 |
2033 |
2034 | ```python
2035 | from ipykernel.ipkernel import IPythonKernel
2036 | from pprint import pprint
2037 | import sys # for version_info
2038 | from pathlib import Path
2039 | from tangledown import \
2040 | accumulate_lines, \
2041 | get_lines, \
2042 | expand_tangles
2043 | ```
2044 |
2045 |
2046 |
2047 |
2048 |
2049 | ## KERNEL INSTANCE VARIABLES
2050 |
2051 |
2052 | These get indented on expansion because the `block` tag is indented. You could do it the other way: indent the code here and DON'T indent the block tag, but that would be ugly, wouldn't it?
2053 |
2054 |
2055 | Notice this kernel runs Tangledown on the full file path that's stored in `current_victim_file.txt`. That file path got [written to that special place](#save-afile-path-for-kernel) when you tangled the file the first time. This may explain why you must tangle the file once and then restart the kernel whenever you switch notebooks that are running the Tangledown Kernel.
2056 |
2057 |
2058 |
2059 |
2060 |
2061 | ```python
2062 | current_victim_filepath = ""
2063 | with open(Path.home() / '.tangledown/current_victim_file.txt') as v:
2064 |     fp = v.read()
2065 | tracer_, nowebs, tangles_ = accumulate_lines(*get_lines(fp))
2066 | implementation = 'Tangledown'
2067 | implementation_version = '1.0'
2068 | language = 'no-op'
2069 | language_version = '0.1'
2070 | language_info = {  # for syntax coloring
2071 |     "name": "python",
2072 |     "version": sys.version.split()[0],
2073 |     "mimetype": "text/x-python",
2074 |     "codemirror_mode": {"name": "ipython", "version": sys.version_info[0]},
2075 |     "pygments_lexer": "ipython%d" % 3,
2076 |     "nbconvert_exporter": "python",
2077 |     "file_extension": ".py",
2078 | }
2079 | banner = "Tangledown kernel - expanding 'block' tags"
2080 | ```
2081 |
2082 |
2083 |
2084 |
2085 |
2086 | ## Kernel JSON Installation Helper
2087 |
2088 |
2089 |
2090 |
2091 |
2092 | ```json
2093 | {"argv":["python","-m","tangledown_kernel", "-f", "{connection_file}"],
2094 | "display_name":"Tangledown"
2095 | }
2096 | ```
2097 |
2098 |
2099 |
2100 |
2101 |
2102 | # APPENDIX: Experimental Playground
2103 |
2104 | ```python
2105 |
2106 | ```
2107 |
--------------------------------------------------------------------------------