├── .idea
│   └── .gitignore
├── hello_world.py
├── hello_world_tangler.py
├── tangledown_kernel
│   ├── kernel.json
│   └── tangledown_kernel.py
├── tangleup-roundtrip-test.sh
├── hello_world.md
├── check_internal_anchor_links.py
├── .gitignore
├── bootstrap_tangledown.py
├── tangledown.py
└── README.md

/.idea/.gitignore:
--------------------------------------------------------------------------------
1 | # Default ignored files
2 | /shelf/
3 | /workspace.xml
4 | 
--------------------------------------------------------------------------------
/hello_world.py:
--------------------------------------------------------------------------------
1 | def hello_world():
2 |     print("Hello, world, from Tangledown!")
3 | if __name__ == "__main__":
4 |     hello_world()
5 | 
--------------------------------------------------------------------------------
/hello_world_tangler.py:
--------------------------------------------------------------------------------
1 | from tangledown import get_lines, accumulate_lines, tangle_all
2 | tangle_all(*accumulate_lines(*get_lines("hello_world.md")))
3 | 
--------------------------------------------------------------------------------
/tangledown_kernel/kernel.json:
--------------------------------------------------------------------------------
1 | {"argv":["python","-m","tangledown_kernel", "-f", "{connection_file}"],
2 | "display_name":"Tangledown"
3 | }
4 | 
5 | 
6 | 
--------------------------------------------------------------------------------
/tangleup-roundtrip-test.sh:
--------------------------------------------------------------------------------
1 | python tangleup_experiment.py
2 | python tangledown.py asr_tangleup_test.md
3 | pushd examples/asr/asr
4 | lein test
5 | lein run
6 | popd
7 | 
--------------------------------------------------------------------------------
/hello_world.md:
--------------------------------------------------------------------------------
1 | # TANGLE HELLO_WORLD
2 | 
3 | 
4 | In JupyterLab, pair this markdown file with a
Jupytext notebook, open the notebook, then
5 | 
6 | 
7 | ## RUN ALL CELLS IN THIS NOTEBOOK
8 | 
9 | 
10 | A noweb tag named "hello world def" defining a function named "hello_world."
11 | 
12 | 
13 | <noweb name="hello world def">
14 | 
15 |     def hello_world():
16 |         print("Hello, world, from Tangledown!")
17 | 
18 | </noweb>
19 | 
20 | 
21 | A tangle tag that refers to the noweb tag named "hello world def" and writes a python file, "hello_world.py." That python file can be called as a script from the python command line. It can also be imported as a module, and the importing code (example below) can call the function, "hello_world," which is defined in "hello world def."
22 | 
23 | 
24 | <tangle file="hello_world.py">
25 | 
26 |     <block name="hello world def"/>
27 |     if __name__ == "__main__":
28 |         hello_world()
29 | 
30 | </tangle>
31 | 
32 | 
33 | Tangle the code out of this here Markdown file using tangledown as a module.
34 | 
35 | ```python
36 | from tangledown import get_lines, accumulate_lines, tangle_all
37 | tangle_all(*accumulate_lines(*get_lines("hello_world.md")))
38 | ```
39 | 
40 | Call the "hello_world" function imported from the "hello_world" module.
41 | 
42 | ```python
43 | import hello_world
44 | hello_world.hello_world()
45 | ```
46 | 
47 | Isn't that cool?
48 | 
49 | 
50 | Well, hell, let's bootstrap tangledown itself from "README.md." This is how you bootstrap a compiler.
51 | 
52 | ```python
53 | tangle_all(*accumulate_lines(*get_lines("README.md")))
54 | ```
55 | 
56 | Do it again to make sure it all worked!
57 | 
58 | ```python
59 | tangle_all(*accumulate_lines(*get_lines("README.md")))
60 | ```
61 | 
62 | Hot Dayyum!
Here is a deeper test that everything is ok: 63 | 64 | ```python 65 | from tangledown import test_re_matching 66 | test_re_matching(*get_lines("README.md")) 67 | ``` 68 | 69 | ```python 70 | 71 | ``` 72 | -------------------------------------------------------------------------------- /tangledown_kernel/tangledown_kernel.py: -------------------------------------------------------------------------------- 1 | from ipykernel.ipkernel import IPythonKernel 2 | from pprint import pprint 3 | import sys # for version_info 4 | from pathlib import Path 5 | from tangledown import \ 6 | accumulate_lines, \ 7 | get_lines, \ 8 | expand_tangles 9 | 10 | 11 | class TangledownKernel(IPythonKernel): 12 | current_victim_filepath = "" 13 | with open(Path.home() / '.tangledown/current_victim_file.txt') as v: 14 | fp = v.read() 15 | tracer_, nowebs, tangles_ = accumulate_lines(*get_lines(fp)) 16 | implementation = 'Tangledown' 17 | implementation_version = '1.0' 18 | language = 'no-op' 19 | language_version = '0.1' 20 | language_info = { # for syntax coloring 21 | "name": "python", 22 | "version": sys.version.split()[0], 23 | "mimetype": "text/x-python", 24 | "codemirror_mode": {"name": "ipython", "version": sys.version_info[0]}, 25 | "pygments_lexer": "ipython%d" % 3, 26 | "nbconvert_exporter": "python", 27 | "file_extension": ".py", 28 | } 29 | banner = "Tangledown kernel - expanding 'block' tags" 30 | 31 | 32 | async def do_execute(self, code, silent, store_history=True, user_expressions=None, 33 | allow_stdin=False): 34 | if not silent: 35 | cleaned_lines = [line + '\n' for line in code.split('\n')] 36 | # HERE'S THE BEEF! 
37 |             expanded_code = expand_tangles(None, [cleaned_lines], self.nowebs)
38 |             reply_content = await super().do_execute(
39 |                 expanded_code, silent, store_history, user_expressions)
40 |             stream_content = {
41 |                 'name': 'stdout',
42 |                 'text': reply_content,
43 |             }
44 |             self.send_response(self.iopub_socket, 'stream', stream_content)
45 |         return {'status': 'ok',
46 |                 # The base class increments the execution count
47 |                 'execution_count': self.execution_count,
48 |                 'payload': [],
49 |                 'user_expressions': {},
50 |                 }
51 | if __name__ == '__main__':
52 |     from ipykernel.kernelapp import IPKernelApp
53 |     IPKernelApp.launch_instance(kernel_class=TangledownKernel)
54 | 
55 | 
56 | 
--------------------------------------------------------------------------------
/check_internal_anchor_links.py:
--------------------------------------------------------------------------------
1 | from typing import List, Dict, Set
2 | import re, sys
3 | from pprint import pprint
4 | 
5 | filename = "literate-copperhead-003.md"
6 | 
7 | anchor_re = r'<a name="(.*?)"></a>'  # inline HTML anchors in the markdown
8 | ref_re = r'\[(.*?)\]\(#(.*?)\)'
9 | 
10 | 
11 | def find_anchors(filename: str) -> Set[str]:
12 |     anchor_list = []
13 |     with open(filename) as f:
14 |         lines = f.readlines()
15 |     index: int = 0
16 |     for line_no, line in enumerate(lines):
17 |         mm: List = re.findall(anchor_re, line)
18 |         if mm:
19 |             index += len(mm)
20 |             for m in mm:
21 |                 anchor_list.append(m)
22 |     anchor_set = set(anchor_list)
23 |     listlen = len(anchor_list)
24 |     dupes_exist = (not (listlen == len(anchor_set)))
25 |     print(f'THERE ARE '
26 |           f'{"NO" if not dupes_exist else ""}'
27 |           f' DUPLICATES AMONGST {listlen} ANCHORS')
28 |     if dupes_exist:
29 |         dupes_list = []
30 |         dupes = set()
31 |         for a in anchor_list:
32 |             if a in dupes:
33 |                 dupes_list.append(a)
34 |             else:
35 |                 dupes.add(a)
36 |         print(f'THE DUPES ARE:')
37 |         pprint(dupes_list)
38 |     return anchor_set
39 | 
40 | 
41 | def match_refs(filename: str, anchors: Set):
42 |     with open(filename) as f:
43 |         lines = f.readlines()
44 |     index =
0; matchcount = 0; failcount = 0 45 | for line_no, line in enumerate(lines): 46 | mm: list = re.findall(ref_re, line) 47 | if mm: 48 | index += len(mm) 49 | for m in mm: 50 | ref = m[1] 51 | if ref in anchors: 52 | matchcount += 1 53 | else: 54 | failcount += 1 55 | print(f'FAILED TO FIND MATCHING ANCHOR: ' 56 | f'index: {index}, line_no: {line_no + 1}, ref: {ref} ' 57 | f'matchcount: {matchcount}, failcount: {failcount}.') 58 | if index == matchcount: 59 | print(f'SUCCESS IN FINDING ALL MATCHING ANCHORS: ') 60 | print(f'number of refs: {index}, matchcount: {matchcount}.') 61 | else: 62 | print(f'FAILED TO FIND {failcount} ANCHORS WITH ' 63 | f'{index} TRIALS AND {matchcount} SUCCESSES.') 64 | 65 | 66 | if __name__ == "__main__": 67 | filename = sys.argv[1] 68 | match_refs(filename, find_anchors(filename)) 69 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # This 2 | 3 | tangledown-venv 4 | 5 | # Mac 6 | 7 | .DS_Store 8 | 9 | # Notebooks 10 | 11 | *.ipynb 12 | 13 | # Clojure 14 | 15 | target 16 | classes 17 | checkouts 18 | profiles.clj 19 | pom.xml 20 | pom.xml.asc 21 | pom.properties 22 | *.jar 23 | *.class 24 | .lein-* 25 | .nrepl-port 26 | .prepl-port 27 | .hgignore 28 | .hg/ 29 | 30 | # Python 31 | 32 | # Byte-compiled / optimized / DLL files 33 | __pycache__/ 34 | *.py[cod] 35 | *$py.class 36 | 37 | # C extensions 38 | *.so 39 | 40 | # Distribution / packaging 41 | .Python 42 | build/ 43 | develop-eggs/ 44 | dist/ 45 | downloads/ 46 | eggs/ 47 | .eggs/ 48 | lib/ 49 | lib64/ 50 | parts/ 51 | sdist/ 52 | var/ 53 | wheels/ 54 | share/python-wheels/ 55 | *.egg-info/ 56 | .installed.cfg 57 | *.egg 58 | MANIFEST 59 | 60 | # PyInstaller 61 | # Usually these files are written by a python script from a template 62 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 
63 | *.manifest 64 | *.spec 65 | 66 | # Installer logs 67 | pip-log.txt 68 | pip-delete-this-directory.txt 69 | 70 | # Unit test / coverage reports 71 | htmlcov/ 72 | .tox/ 73 | .nox/ 74 | .coverage 75 | .coverage.* 76 | .cache 77 | nosetests.xml 78 | coverage.xml 79 | *.cover 80 | *.py,cover 81 | .hypothesis/ 82 | .pytest_cache/ 83 | cover/ 84 | 85 | # Translations 86 | *.mo 87 | *.pot 88 | 89 | # Django stuff: 90 | *.log 91 | local_settings.py 92 | db.sqlite3 93 | db.sqlite3-journal 94 | 95 | # Flask stuff: 96 | instance/ 97 | .webassets-cache 98 | 99 | # Scrapy stuff: 100 | .scrapy 101 | 102 | # Sphinx documentation 103 | docs/_build/ 104 | 105 | # PyBuilder 106 | .pybuilder/ 107 | target/ 108 | 109 | # Jupyter Notebook 110 | .ipynb_checkpoints 111 | 112 | # IPython 113 | profile_default/ 114 | ipython_config.py 115 | 116 | # pyenv 117 | # For a library or package, you might want to ignore these files since the code is 118 | # intended to run in multiple environments; otherwise, check them in: 119 | # .python-version 120 | 121 | # pipenv 122 | # According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. 123 | # However, in case of collaboration, if having platform-specific dependencies or dependencies 124 | # having no cross-platform support, pipenv may install dependencies that don't work, or not 125 | # install all needed dependencies. 126 | Pipfile.lock 127 | 128 | # poetry 129 | # Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. 130 | # This is especially recommended for binary packages to ensure reproducibility, and is more 131 | # commonly ignored for libraries. 132 | # https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control 133 | poetry.lock 134 | 135 | # pdm 136 | # Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 
137 | #pdm.lock 138 | # pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it 139 | # in version control. 140 | # https://pdm.fming.dev/#use-with-ide 141 | .pdm.toml 142 | 143 | # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm 144 | __pypackages__/ 145 | 146 | # Celery stuff 147 | celerybeat-schedule 148 | celerybeat.pid 149 | 150 | # SageMath parsed files 151 | *.sage.py 152 | 153 | # Environments 154 | .env 155 | .venv 156 | env/ 157 | venv/ 158 | ENV/ 159 | env.bak/ 160 | venv.bak/ 161 | 162 | # Spyder project settings 163 | .spyderproject 164 | .spyproject 165 | 166 | # Rope project settings 167 | .ropeproject 168 | 169 | # mkdocs documentation 170 | /site 171 | 172 | # mypy 173 | .mypy_cache/ 174 | .dmypy.json 175 | dmypy.json 176 | 177 | # Pyre type checker 178 | .pyre/ 179 | 180 | # pytype static type analyzer 181 | .pytype/ 182 | 183 | # Cython debug symbols 184 | cython_debug/ 185 | 186 | # PyCharm 187 | # JetBrains specific template is maintained in a separate JetBrains.gitignore that can 188 | # be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore 189 | # and can be added to the global gitignore or merged into this file. For a more nuclear 190 | # option (not recommended) you can uncomment the following to ignore the entire idea folder. 
191 | .idea/
--------------------------------------------------------------------------------
/bootstrap_tangledown.py:
--------------------------------------------------------------------------------
1 | from typing import List, Dict, Tuple, Match
2 | 
3 | NowebName = str
4 | FileName = str
5 | TangleFileName = FileName
6 | LineNumber = int
7 | Line = str
8 | Lines = List[Line]
9 | Liness = List[Lines]
10 | LinesTuple = Tuple[LineNumber, Lines]
11 | 
12 | Nowebs = Dict[NowebName, Lines]
13 | Tangles = Dict[TangleFileName, Liness]
14 | 
15 | 
16 | import re
17 | import sys
18 | from pathlib import Path
19 | 
20 | def get_aFile() -> str:
21 |     """Get a file name from the command-line arguments
22 |     or 'README.md' as a default."""
23 |     print({f'len(sys.argv)': len(sys.argv), f'sys.argv': sys.argv})
24 | 
25 | 
26 |     aFile = 'README.md'  # default
27 |     if len(sys.argv) > 1:
28 |         file_names = [p for p in sys.argv
29 |                       if (p[0] != '-')  # option
30 |                       and (p[-3:] != '.py')
31 |                       and (p[-5:] != '.json')]
32 |         if file_names:
33 |             aFile = file_names[0]
34 |     return aFile
35 | 
36 | raw_line_re: re.Pattern = re.compile(r'</?raw>')
37 | def get_lines(fn: FileName) -> Tuple[Path, Lines]:
38 |     """Get lines from a file named fn. Replace
39 |     'raw' fenceposts with blank lines. Write full path to
40 |     a secret place for the Tangledown kernel to pick it up.
41 |     Return tuple of file path (for TangleUp's Tracer) and
42 |     lines."""
43 |     def save_aFile_path_for_kernel(fn: FileName) -> Path:
44 |         xpath: Path = Path.cwd() / Path(fn).name
45 |         victim_file_name = str(xpath.absolute())
46 |         safepath: Path = Path.home() / '.tangledown/current_victim_file.txt'
47 |         Path(safepath).parents[0].mkdir(parents=True, exist_ok=True)
48 |         print(f"SAVING {victim_file_name} in secret place {str(safepath)}")
49 |         with open(safepath, "w") as t:
50 |             t.write(victim_file_name)
51 |         return xpath
52 | 
53 | 
54 |     xpath = save_aFile_path_for_kernel(fn)
55 |     with open(fn) as f:
56 |         in_lines: Lines = f.readlines ()
57 |     out_lines: Lines = []
58 |     for in_line in in_lines:
59 |         out_lines.append(
60 |             in_line if not raw_line_re.match(in_line) else "\n")
61 |     return xpath, out_lines
62 | 
63 | 
64 | noweb_start_re = re.compile (r'^<noweb name="(.*)">$')
65 | noweb_end_re = re.compile (r'^</noweb>$')
66 | 
67 | tangle_start_re = re.compile (r'^<tangle file="(.*)">$')
68 | tangle_end_re = re.compile (r'^</tangle>$')
69 | 
70 | 
71 | block_start_re = re.compile (r'^(\s*)<block name="(.*)"/>')
72 | block_end_re = re.compile (r'^(\s)*<block name="(.*)"/>')
73 | 
74 | 
75 | 
76 | def test_re_matching(fp: Path, lines: Lines) -> None:
77 |     for line in lines:
78 |         noweb_start_match = noweb_start_re.match (line)
79 |         tangle_start_match = tangle_start_re.match (line)
80 |         block_start_match = block_start_re.match (line)
81 | 
82 |         noweb_end_match = noweb_end_re.match (line)
83 |         tangle_end_match = tangle_end_re.match (line)
84 |         block_end_match = block_end_re.match (line)
85 | 
86 |         if (noweb_start_match):
87 |             print ('NOWEB: ', noweb_start_match.group (0))
88 |             print ('name of the block: ', noweb_start_match.group (1))
89 |         elif (noweb_end_match):
90 |             print ('NOWEB END: ', noweb_end_match.group (0))
91 |         elif (tangle_start_match):
92 |             print ('TANGLE: ', tangle_start_match.group (0))
93 |             print ('name of the file: ', tangle_start_match.group (1))
94 |         elif (tangle_end_match):
95 |             print ('TANGLE END: ', tangle_end_match.group (0))
96 |         elif (block_start_match):
97 | print ('BLOCK: ', block_start_match.group (0)) 98 | print ('name of the block: ', block_start_match.group (1)) 99 | if (block_end_match): 100 | print ('BLOCK END SAME LINE: ', block_end_match.group (0)) 101 | else: 102 | print ('BLOCK NO END') 103 | elif (block_end_match): 104 | print ('BLOCK END ANOTHER LINE: ', block_end_match.group (0)) 105 | else: 106 | pass 107 | 108 | 109 | from dataclasses import dataclass, field 110 | from typing import Union ## TODO 111 | @dataclass 112 | class Tracer: 113 | trace: List[Dict] = field(default_factory=list) 114 | line_no = 0 115 | current_betweens: Lines = field(default_factory=list) 116 | fp: Path = None 117 | # First Pass 118 | def add_markdown(self, i, between: Line): 119 | self.line_no += 1 120 | self.current_betweens.append((self.line_no, between)) 121 | 122 | 123 | def _end_betweens(self, i): 124 | if self.current_betweens: 125 | self.trace.append({"ending_line_number": self.line_no, "i": i, 126 | "language": "markdown", "kind": 'between', 127 | "text": self.current_betweens}) 128 | self.current_betweens = [] 129 | 130 | 131 | def add_noweb(self, i, language, id_, key, noweb_lines): 132 | self._end_betweens(i) 133 | self.line_no = i 134 | self.trace.append({"ending_line_number": self.line_no, "i": i, 135 | "language": language, "id_": id_, 136 | "kind": 'noweb', key: noweb_lines}) 137 | 138 | 139 | def add_tangle(self, i, language, id_, key, tangle_liness): 140 | self._end_betweens(i) 141 | self.line_no = i 142 | self.trace.append({"ending_line_number": self.line_no, "i": i, 143 | "language": language, "id_": id_, 144 | "kind": 'tangle', key: tangle_liness}) 145 | 146 | 147 | def dump(self): 148 | pr = self.fp.parent 149 | fn = self.fp.name 150 | fn2 = fn.translate(str.maketrans('.', '_')) 151 | # Store the trace in the dir where the input md file is: 152 | vr = f'tangledown_trace_{fn2}' 153 | np = pr / (vr + ".py") 154 | with open(np, "w") as fs: 155 | print(f'{vr} = (', file=fs) 156 | pprint(self.trace, 
stream=fs) 157 | print(')', file=fs) 158 | 159 | 160 | # Second Pass 161 | def add_expandedn_noweb(self, i, language, id_, key, noweb_lines): 162 | self._end_betweens(i) 163 | self.line_no = i 164 | self.trace.append({"ending_line_number": self.line_no, "i": i, 165 | "language": language, "id_": id_, 166 | "kind": 'expanded_noweb', key: noweb_lines}) 167 | 168 | 169 | def add_expanded_tangle(self, i, language, id_, key, tangle_liness): 170 | self._end_betweens(i) 171 | self.line_no = i 172 | self.trace.append({"ending_line_number": self.line_no, "i": i, 173 | "language": language, "id_": id_, 174 | "kind": 'expanded_tangle', key: tangle_liness}) 175 | 176 | 177 | 178 | 179 | triple_backtick_re = re.compile (r'^`[`]`((\w+)?\s*(id=([0-9a-fA-F-]+))?)') 180 | blank_line_re = re.compile (r'^\s*$') 181 | 182 | def first_non_blank_line_is_triple_backtick ( 183 | i: LineNumber, lines: Lines) -> Match[Line]: 184 | while (blank_line_re.match (lines[i])): 185 | i = i + 1 186 | yes = triple_backtick_re.match (lines[i]) 187 | language = "python" # default 188 | id_ = None # default 189 | if yes: 190 | language = yes.groups()[1] or language 191 | id_ = yes.groups()[3] ## can be 'None' 192 | return i, yes, language, id_ 193 | 194 | 195 | def accumulate_contents ( 196 | lines: Lines, i: LineNumber, end_re: re) -> LinesTuple: 197 | r"""Harvest contents of a noweb or tangle tag. The start 198 | taglet was consumed by caller. 
Consume the end taglet.""" 199 | i, yes, language, id_ = first_non_blank_line_is_triple_backtick(i, lines) 200 | snip = 0 if yes else 4 201 | contents_lines: Lines = [] 202 | for j in range (i, len(lines)): 203 | if (end_re.match(lines[j])): 204 | return j + 1, language, id_, contents_lines # the only return 205 | if not triple_backtick_re.match (lines[j]): 206 | contents_lines.append (lines[j][snip:]) 207 | 208 | 209 | def anchor_is_tilde(path_str: str) -> bool: 210 | result = (path_str[0:2] == "~/") and (Path(path_str).anchor == '') 211 | return result 212 | 213 | def normalize_file_path(tangle_file_attribute: str) -> Path: 214 | result: Path = Path(tangle_file_attribute) 215 | if (anchor_is_tilde(tangle_file_attribute)): 216 | result = (Path.home() / tangle_file_attribute[2:]) 217 | return result.absolute() 218 | 219 | 220 | from pprint import pprint 221 | def accumulate_lines(fp: Path, lines: Lines) -> Tuple[Tracer, Nowebs, Tangles]: 222 | tracer = Tracer() 223 | tracer.fp = fp 224 | nowebs: Nowebs = {} 225 | tangles: Tangles = {} 226 | i = 0 227 | while i < len(lines): 228 | noweb_start_match = noweb_start_re.match (lines[i]) 229 | tangle_start_match = tangle_start_re.match (lines[i]) 230 | if (noweb_start_match): 231 | key: NowebName = noweb_start_match.group(1) 232 | (i, language, id_, nowebs[key]) = \ 233 | accumulate_contents(lines, i + 1, noweb_end_re) 234 | tracer.add_noweb(i, language, id_, key, nowebs[key]) 235 | elif (tangle_start_match): 236 | key: TangleFileName = \ 237 | str(normalize_file_path(tangle_start_match.group(1))) 238 | if not (key in tangles): 239 | tangles[key]: Liness = [] 240 | (i, language, id_, things) = accumulate_contents(lines, i + 1, tangle_end_re) 241 | tangles[key] += [things] 242 | tracer.add_tangle(i, language, id_, key, tangles[key]) 243 | else: 244 | tracer.add_markdown(i, lines[i]) 245 | i += 1 246 | return tracer, nowebs, tangles 247 | 248 | 249 | def there_is_a_block_tag (lines: Lines) -> bool: 250 | for line in lines: 
251 |         block_start_match = block_start_re.match (line)
252 |         if (block_start_match):
253 |             return True
254 |     return False
255 | 
256 | 
257 | def eat_block_tag (i: LineNumber, lines: Lines) -> LineNumber:
258 |     for j in range (i, len(lines)):
259 |         end_match = block_end_re.match (lines[j])
260 |         # DUDE! Check leading whitespace against block_start_re
261 |         if (end_match):
262 |             return j + 1
263 |         else:  # DUDE!
264 |             pass
265 | 
266 | 
267 | def expand_blocks (nowebs: Nowebs, lines: Lines,
268 |                    language: str = "python") -> Lines:
269 |     out_lines = []
270 |     block_key: NowebName = ""
271 |     for i in range (len (lines)):
272 |         block_start_match = block_start_re.match (lines[i])
273 |         if (block_start_match):
274 |             leading_whitespace: str = block_start_match.group (1)
275 |             block_key: NowebName = block_start_match.group (2)
276 |             block_lines: Lines = nowebs [block_key]  # DUDE!
277 |             i: LineNumber = eat_block_tag (i, lines)
278 |             for block_line in block_lines:
279 |                 out_lines.append (leading_whitespace + block_line)
280 |         else:
281 |             out_lines.append (lines[i])
282 |     return out_lines
283 | 
284 | 
285 | def expand_tangles(liness: Liness, nowebs: Nowebs) -> str:
286 |     contents: Lines = []
287 |     for lines in liness:
288 |         while there_is_a_block_tag (lines):
289 |             lines = expand_blocks (nowebs, lines)
290 |         contents += lines
291 |     return ''.join(contents)
292 | 
293 | 
294 | 
295 | def tangle_all(tracer: Tracer, nowebs: Nowebs, tangles: Tangles) -> None:
296 |     for filename, liness in tangles.items ():
297 |         Path(filename).parents[0].mkdir(parents=True, exist_ok=True)
298 |         contents: str = expand_tangles(liness, nowebs)
299 |         with open (filename, 'w') as outfile:
300 |             print(f"WRITING FILE: {filename}")
301 |             outfile.write (contents)
302 |     tracer.dump()
303 | 
304 | if __name__ == "__main__":
305 |     fn, lines = get_lines(get_aFile())
306 |     # test_re_matching(lines)
307 |     tracer, nowebs, tangles = accumulate_lines(fn, lines)
308 |     tangle_all(tracer, nowebs, tangles)
309 | 
310 | 
311 | 
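The `expand_blocks`/`eat_block_tag` pair above is the heart of tangling: a block tag standing inside tangled code is replaced by the lines of the noweb it names, re-indented by the tag's leading whitespace. Here is a minimal, self-contained sketch of that idea, assuming the self-closing `<block name="..."/>` tag syntax for illustration:

```python
import re

# Sketch of Tangledown's block expansion (simplified; the exact tag
# syntax is an assumption). A <block name="..."/> tag is replaced by
# the lines of the noweb with that name, each prefixed by the tag's
# leading whitespace so nested code keeps valid indentation.
BLOCK_RE = re.compile(r'^(\s*)<block name="(.*)"/>')

def expand_blocks_sketch(nowebs, lines):
    out_lines = []
    for line in lines:
        match = BLOCK_RE.match(line)
        if match:
            indent, name = match.group(1), match.group(2)
            out_lines.extend(indent + body for body in nowebs[name])
        else:
            out_lines.append(line)
    return out_lines

nowebs = {"hello world def": ['def hello_world():\n',
                              '    print("Hello, world, from Tangledown!")\n']}
tangle_body = ['<block name="hello world def"/>\n',
               'if __name__ == "__main__":\n',
               '    hello_world()\n']
print(''.join(expand_blocks_sketch(nowebs, tangle_body)))
```

Because the tag's leading whitespace is prepended to every expanded line, a block referenced inside an indented suite stays syntactically valid Python after expansion.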
--------------------------------------------------------------------------------
/tangledown.py:
--------------------------------------------------------------------------------
1 | from typing import List, Dict, Tuple, Match
2 | 
3 | NowebName = str
4 | FileName = str
5 | TangleFileName = FileName
6 | LineNumber = int
7 | Line = str
8 | Lines = List[Line]
9 | Liness = List[Lines]
10 | LinesTuple = Tuple[LineNumber, Lines]
11 | 
12 | Nowebs = Dict[NowebName, Lines]
13 | Tangles = Dict[TangleFileName, Liness]
14 | 
15 | 
16 | import re
17 | import sys
18 | from pathlib import Path
19 | 
20 | def get_aFile() -> str:
21 |     """Get a file name from the command-line arguments
22 |     or 'README.md' as a default."""
23 |     print({f'len(sys.argv)': len(sys.argv), f'sys.argv': sys.argv})
24 | 
25 | 
26 |     aFile = 'README.md'  # default
27 |     if len(sys.argv) > 1:
28 |         file_names = [p for p in sys.argv
29 |                       if (p[0] != '-')  # option
30 |                       and (p[-3:] != '.py')
31 |                       and (p[-5:] != '.json')]
32 |         if file_names:
33 |             aFile = file_names[0]
34 |     return aFile
35 | 
36 | raw_line_re: re.Pattern = re.compile(r'</?raw>')
37 | def get_lines(fn: FileName) -> Tuple[Path, Lines]:
38 |     """Get lines from a file named fn. Replace
39 |     'raw' fenceposts with blank lines. Write full path to
40 |     a secret place for the Tangledown kernel to pick it up.
41 |     Return tuple of file path (for TangleUp's Tracer) and
42 |     lines."""
43 |     def save_aFile_path_for_kernel(fn: FileName) -> Path:
44 |         xpath: Path = Path.cwd() / Path(fn).name
45 |         victim_file_name = str(xpath.absolute())
46 |         safepath: Path = Path.home() / '.tangledown/current_victim_file.txt'
47 |         Path(safepath).parents[0].mkdir(parents=True, exist_ok=True)
48 |         print(f"SAVING {victim_file_name} in secret place {str(safepath)}")
49 |         with open(safepath, "w") as t:
50 |             t.write(victim_file_name)
51 |         return xpath
52 | 
53 | 
54 |     xpath = save_aFile_path_for_kernel(fn)
55 |     with open(fn) as f:
56 |         in_lines: Lines = f.readlines ()
57 |     out_lines: Lines = []
58 |     for in_line in in_lines:
59 |         out_lines.append(
60 |             in_line if not raw_line_re.match(in_line) else "\n")
61 |     return xpath, out_lines
62 | 
63 | 
64 | noweb_start_re = re.compile (r'^<noweb name="(.*)">$')
65 | noweb_end_re = re.compile (r'^</noweb>$')
66 | 
67 | tangle_start_re = re.compile (r'^<tangle file="(.*)">$')
68 | tangle_end_re = re.compile (r'^</tangle>$')
69 | 
70 | 
71 | block_start_re = re.compile (r'^(\s*)<block name="(.*)"/>')
72 | block_end_re = re.compile (r'^(\s)*<block name="(.*)"/>')
73 | 
74 | 
75 | 
76 | def test_re_matching(fp: Path, lines: Lines) -> None:
77 |     for line in lines:
78 |         noweb_start_match = noweb_start_re.match (line)
79 |         tangle_start_match = tangle_start_re.match (line)
80 |         block_start_match = block_start_re.match (line)
81 | 
82 |         noweb_end_match = noweb_end_re.match (line)
83 |         tangle_end_match = tangle_end_re.match (line)
84 |         block_end_match = block_end_re.match (line)
85 | 
86 |         if (noweb_start_match):
87 |             print ('NOWEB: ', noweb_start_match.group (0))
88 |             print ('name of the block: ', noweb_start_match.group (1))
89 |         elif (noweb_end_match):
90 |             print ('NOWEB END: ', noweb_end_match.group (0))
91 |         elif (tangle_start_match):
92 |             print ('TANGLE: ', tangle_start_match.group (0))
93 |             print ('name of the file: ', tangle_start_match.group (1))
94 |         elif (tangle_end_match):
95 |             print ('TANGLE END: ', tangle_end_match.group (0))
96 |         elif (block_start_match):
97 | print ('BLOCK: ', block_start_match.group (0)) 98 | print ('name of the block: ', block_start_match.group (1)) 99 | if (block_end_match): 100 | print ('BLOCK END SAME LINE: ', block_end_match.group (0)) 101 | else: 102 | print ('BLOCK NO END') 103 | elif (block_end_match): 104 | print ('BLOCK END ANOTHER LINE: ', block_end_match.group (0)) 105 | else: 106 | pass 107 | 108 | 109 | from dataclasses import dataclass, field 110 | from typing import Union ## TODO 111 | @dataclass 112 | class Tracer: 113 | trace: List[Dict] = field(default_factory=list) 114 | line_no = 0 115 | current_betweens: Lines = field(default_factory=list) 116 | fp: Path = None 117 | # First Pass 118 | def add_markdown(self, i, between: Line): 119 | self.line_no += 1 120 | self.current_betweens.append((self.line_no, between)) 121 | 122 | 123 | def add_raw(self, i, between: Line): 124 | self.line_no += 1 125 | self.current_betweens.append((self.line_no, between)) 126 | 127 | 128 | def _end_betweens(self, i): 129 | if self.current_betweens: 130 | self.trace.append({"ending_line_number": self.line_no, "i": i, 131 | "language": "markdown", "kind": 'between', 132 | "text": self.current_betweens}) 133 | self.current_betweens = [] 134 | 135 | 136 | def add_noweb(self, i, language, id_, key, noweb_lines): 137 | self._end_betweens(i) 138 | self.line_no = i 139 | self.trace.append({"ending_line_number": self.line_no, "i": i, 140 | "language": language, "id_": id_, 141 | "kind": 'noweb', key: noweb_lines}) 142 | 143 | 144 | def add_tangle(self, i, language, id_, key, tangle_liness): 145 | self._end_betweens(i) 146 | self.line_no = i 147 | self.trace.append({"ending_line_number": self.line_no, "i": i, 148 | "language": language, "id_": id_, 149 | "kind": 'tangle', key: tangle_liness}) 150 | 151 | 152 | def dump(self): 153 | pr = self.fp.parent 154 | fn = self.fp.name 155 | fn2 = fn.translate(str.maketrans('.', '_')) 156 | # Store the trace in the dir where the input md file is: 157 | vr = 
f'tangledown_trace_{fn2}' 158 | np = pr / (vr + ".py") 159 | with open(np, "w") as fs: 160 | print(f'sequential_structure = (', file=fs) 161 | pprint(self.trace, stream=fs) 162 | print(')', file=fs) 163 | 164 | 165 | # Second Pass 166 | def add_expandedn_noweb(self, i, language, id_, key, noweb_lines): 167 | self._end_betweens(i) 168 | self.line_no = i 169 | self.trace.append({"ending_line_number": self.line_no, "i": i, 170 | "language": language, "id_": id_, 171 | "kind": 'expanded_noweb', key: noweb_lines}) 172 | 173 | 174 | def add_expanded_tangle(self, i, language, id_, key, tangle_liness): 175 | self._end_betweens(i) 176 | self.line_no = i 177 | self.trace.append({"ending_line_number": self.line_no, "i": i, 178 | "language": language, "id_": id_, 179 | "kind": 'expanded_tangle', key: tangle_liness}) 180 | 181 | 182 | 183 | 184 | triple_backtick_re = re.compile (r'^`[`]`((\w+)?\s*(id=([0-9a-fA-F-]+))?)') 185 | blank_line_re = re.compile (r'^\s*$') 186 | 187 | def first_non_blank_line_is_triple_backtick ( 188 | i: LineNumber, lines: Lines) -> Match[Line]: 189 | while (blank_line_re.match (lines[i])): 190 | i = i + 1 191 | yes = triple_backtick_re.match (lines[i]) 192 | language = "python" # default 193 | id_ = None # default 194 | if yes: 195 | language = yes.groups()[1] or language 196 | id_ = yes.groups()[3] ## can be 'None' 197 | return i, yes, language, id_ 198 | 199 | 200 | def accumulate_contents ( 201 | lines: Lines, i: LineNumber, end_re: re) -> LinesTuple: 202 | r"""Harvest contents of a noweb or tangle tag. The start 203 | taglet was consumed by caller. 
Consume the end taglet."""
204 |     i, yes, language, id_ = first_non_blank_line_is_triple_backtick(i, lines)
205 |     snip = 0 if yes else 4
206 |     contents_lines: Lines = []
207 |     for j in range (i, len(lines)):
208 |         if (end_re.match(lines[j])):
209 |             return j + 1, language, id_, contents_lines  # the only return
210 |         if not triple_backtick_re.match (lines[j]):
211 |             contents_lines.append (lines[j][snip:])
212 | 
213 | 
214 | def anchor_is_tilde(path_str: str) -> bool:
215 |     result = (path_str[0:2] == "~/") and (Path(path_str).anchor == '')
216 |     return result
217 | 
218 | def normalize_file_path(tangle_file_attribute: str) -> Path:
219 |     result: Path = Path(tangle_file_attribute)
220 |     if (anchor_is_tilde(tangle_file_attribute)):
221 |         result = (Path.home() / tangle_file_attribute[2:])
222 |     return result.absolute()
223 | 
224 | 
225 | raw_start_re = re.compile("<raw>")
226 | raw_end_re = re.compile("</raw>")
227 | from pprint import pprint
228 | def accumulate_lines(fp: Path, lines: Lines) -> Tuple[Tracer, Nowebs, Tangles]:
229 |     tracer = Tracer()
230 |     tracer.fp = fp
231 |     nowebs: Nowebs = {}
232 |     tangles: Tangles = {}
233 |     in_between = False; i = 0  # guard in_between for empty inputs
234 |     while i < len(lines):
235 |         noweb_start_match = noweb_start_re.match (lines[i])
236 |         tangle_start_match = tangle_start_re.match (lines[i])
237 |         if noweb_start_match:
238 |             in_between = False
239 |             key: NowebName = noweb_start_match.group(1)
240 |             (i, language, id_, nowebs[key]) = \
241 |                 accumulate_contents(lines, i + 1, noweb_end_re)
242 |             tracer.add_noweb(i, language, id_, key, nowebs[key])
243 | 
244 | 
245 |         elif tangle_start_match:
246 |             in_between = False
247 |             key: TangleFileName = \
248 |                 str(normalize_file_path(tangle_start_match.group(1)))
249 |             if not (key in tangles):
250 |                 tangles[key]: Liness = []
251 |             (i, language, id_, things) = accumulate_contents(lines, i + 1, tangle_end_re)
252 |             tangles[key] += [things]
253 |             tracer.add_tangle(i, language, id_, key, tangles[key])
254 | 
255 | 
256 |         elif raw_start_re.match (lines[i]):
257 |             pass
258 | 259 | 260 | else: 261 | in_between = True 262 | tracer.add_markdown(i, lines[i]) 263 | i += 1 264 | 265 | 266 | if in_between: # Close out final markdown. 267 | tracer._end_betweens(i) 268 | return tracer, nowebs, tangles 269 | 270 | 271 | def there_is_a_block_tag (lines: Lines) -> bool: 272 | for line in lines: 273 | block_start_match = block_start_re.match (line) 274 | if (block_start_match): 275 | return True 276 | return False 277 | 278 | 279 | def eat_block_tag (i: LineNumber, lines: Lines) -> LineNumber: 280 | for j in range (i, len(lines)): 281 | end_match = block_end_re.match (lines[j]) 282 | # DUDE! Check leading whitespace against block_start_re 283 | if (end_match): 284 | return j + 1 285 | else: # DUDE! 286 | pass 287 | 288 | 289 | def expand_blocks (tracer: Tracer, nowebs: Nowebs, lines: Lines, 290 | language: str = "python") -> Lines: 291 | out_lines = [] 292 | block_key: NowebName = "" 293 | for i in range (len (lines)): 294 | block_start_match = block_start_re.match (lines[i]) 295 | if (block_start_match): 296 | leading_whitespace: str = block_start_match.group (1) 297 | block_key: NowebName = block_start_match.group (2) 298 | block_lines: Lines = nowebs [block_key] # DUDE! 
299 | i: LineNumber = eat_block_tag (i, lines) 300 | for block_line in block_lines: 301 | out_lines.append (leading_whitespace + block_line) 302 | else: 303 | out_lines.append (lines[i]) 304 | return out_lines 305 | 306 | 307 | def expand_tangles(tracer: Tracer, liness: Liness, nowebs: Nowebs) -> str: 308 | contents: Lines = [] 309 | for lines in liness: 310 | while there_is_a_block_tag (lines): 311 | lines = expand_blocks (tracer, nowebs, lines) 312 | contents += lines 313 | return ''.join(contents) 314 | 315 | 316 | 317 | def tangle_all(tracer: Tracer, nowebs: Nowebs, tangles: Tangles) -> None: 318 | for filename, liness in tangles.items (): 319 | Path(filename).parents[0].mkdir(parents=True, exist_ok=True) 320 | contents: str = expand_tangles(tracer, liness, nowebs) 321 | with open (filename, 'w') as outfile: 322 | print(f"WRITING FILE: {filename}") 323 | outfile.write (contents) 324 | tracer.dump() 325 | 326 | if __name__ == "__main__": 327 | fn, lines = get_lines(get_aFile()) 328 | # test_re_matching(lines) 329 | tracer, nowebs, tangles = accumulate_lines(fn, lines) 330 | tangle_all(tracer, nowebs, tangles) 331 | 332 | 333 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Tangledown: One-Step Literate Markdown 2 | 3 | 4 | #### Brian Beckman 5 | #### Friday, 23 Sep 2022 6 | #### v0.0.8 7 | 8 | 9 | # OVERVIEW 10 | 11 | 12 | ## WRITING MATTERS 13 | 14 | 15 | Leslie Lamport, Turing-Award Winner, 2013, said, approximately: 16 | 17 | 18 | > Writing is Nature's Way of showing you how sloppy your thinking is. Coding is Nature's Way of showing you how sloppy your writing is. Testing is Nature's Way of showing you how sloppy your coding is. 19 | 20 | 21 | > If you can't write, you can't think. If you're not writing, you only think you're thinking.
22 | 23 | 24 | In here, we will show you how to combine thinking, writing, coding, and testing in a natural way. Your code will be the central character in a narrative, a story crafted to help your readers understand _both_ what you're doing _and_ how you're doing it. Your code will be tested because you (and your readers) can run it, right here and now, inside this [Jupytext](#oh-my-jupytext) \[sic\] notebook. Your story and your code will _never_ get out of sync because you will be working with both of them all the time. 25 | 26 | 27 | ### NARRATIVE ORDER 28 | 29 | 30 | Narrative order is the natural order for a story, but it's not the natural order for interpreters and compilers, even for Jupyter kernels. Tangledown lets you write in narrative order, then, later, tangle the code out into executable order, where the definitions of the parts precede the story. That executable order is backwards and inside-out from the reader's point of view! TangleUp lets you maintain your code by rebuilding your story in narrative order from sources, in executable order, that you may have changed on disk (_TangleUp_ is abandoned. Turned out to be too difficult). 31 | 32 | 33 | Without something like this, you're condemned to explaining _how_ your code works before you can say much or anything about _what_ your code is doing. Indulge us in a little _theory of writing_, will you? 34 | 35 | 36 | ## CREATIVE WRITING 101 37 | 38 | 39 | You're writing a murder mystery. 40 | 41 | 42 | **METHOD 1**: Start with a data sheet: all the characters and their relationships. Francis stands to inherit, but Evelyn has a life-insurance policy on Victor. Bobbie is strong enough to swing an axe. Alice has poisonous plants in her garden. Charlie has a gun collection. Danielle is a chef and owns sharp knives. Lay out their schedules and whereabouts for several weeks. Finally, write down the murder, the solution, and all the loose ends your romantic detective might try. 
43 | 44 | 45 | **METHOD 2**: There's a murder. Your romantic detective asks "Who knew whom? Who benefitted? Who could have been at the scene of the crime? Who had opportunity? What was the murder weapon? Who could have done it?" Your detective pursues all the characters and their happenstances. In a final triumph of deductive logic, the detective identifies the killer despite compelling and ultimately distracting evidence to the contrary. 46 | 47 | 48 | If your objective is to _engage_ the audience, to motivate them to unravel the mystery and learn the twists and turns along the way, which is the better method? If your objective is to have them spend several hours wading through reference material, trying to guess where this is all going, which is the better method? If your objective is to organize your own thoughts prior to weaving the narrative, how do you start? 49 | 50 | 51 | Now, you're writing about some software. 52 | 53 | 54 | **METHOD 1**: Present all the functions and interfaces, cross dependencies, asynchronous versus synchronous, global and local state variables, possibilities for side effects. Finally, present unit tests and a main program. 55 | 56 | 57 | **METHOD 2**: Explain the program's rationale and concept of operation, the solution it delivers, its modes and methods. Present the unit tests and main program that fulfill all that. Finally, present all the functions, interfaces, and procedures, all the bits and bobs that could affect and effect the solution. 58 | 59 | 60 | If your objective is to _engage_ your audience, to have them understand the software deeply, as if they wrote it themselves, which is the better method? If your objective is to have them spend unbounded time wading through reference material trying to guess what you mean to do, which is the better method?
61 | 62 | 63 | ## SOFTWARE AS DOCUMENTATION 64 | 65 | 66 | Phaedrus says: 67 | 68 | 69 | > I give good, long, descriptive names to functions and parameters to make my code readable. I use Doxygen and Sphinx to automate document production. _I'm a professional!_ 70 | 71 | 72 | And Socrates says: 73 | 74 | 75 | > That's nice, but you only document the pieces, and say nothing about how the pieces fit together. It's like giving me a jigsaw puzzle without the box top. It's almost sadistic. 76 | 77 | > You condemn me to reverse-engineering your software: to running it in a debugger or to tracing logs and printouts. 78 | 79 | 80 | ## LITERATE PROGRAMMING 81 | 82 | 83 | [Literate Programming](https://en.wikipedia.org/wiki/Literate_programming) is the best known way to save your _audience_ the work of reverse engineering your code, of giving them the box top with the jigsaw puzzle. 84 | 85 | 86 | Who is your audience? 87 | 88 | - yourself, first, down the line, trying to remember "why the heck did I do _that_?!?" 89 | 90 | - other programmers, eventually, when they take over maintaining and extending your code 91 | 92 | 93 | ## IMPERATIVES 94 | 95 | 96 | > First, write a paper about your code, explaining, at least to yourself, what you want to do and how you plan to do it. Flesh out your actual code inside the paper. RUN your code inside the paper, capturing printouts and charts and diagrams and what not, so others, including, first, your future self, can see the code at work. Iterate the process, rewriting prose and code as you invent and simplify, in a loop. 97 | 98 | 99 | ## THAT'S JUST JUPYTER, RIGHT? 100 | 101 | 102 | > Ok, that's just ordinary Jupyter-notebook practice, right? Code inside your documentation, right? Doxygen and Sphinx inside-out, right? 103 | 104 | 105 | Notebooks solve the ***inside-out problem***, but ordinary programming is _both_ inside out _and_ upside-down from literate programming. Literate Programming solves the upside-down problem. 
106 | 107 | 108 | With ordinary notebook practice, you write everything in ***executable order***, because a Jupyter notebook is just an interface to an execution kernel. Notebooks inherit the sequential constraints of the underlying interpreter and compiler. Executable order usually forces us to define all details before using them. With Literate Programming, you write in ***narrative order***, much more understandable to humans. 109 | 110 | 111 | Executable order is usually the reverse of narrative order. Humans want to understand the big picture *first*, then the details. They want to see the box-top of the jigsaw puzzle _before_ looking at all the pieces. Executable order is ***upside-down*** to the human's point-of-view. 112 | 113 | 114 | We've all had the experience of reading code and notebooks backwards so we don't get overwhelmed by details before understanding the big picture. That observation leads us to another imperative. 115 | 116 | 117 | > Write about your code in narrative order. Don't be tyrannized by your programming language into defining everything before you can talk about anything. Use tools to rearrange your code in executable order. 118 | 119 | 120 | [Donald Knuth](http://amturing.acm.org/award_winners/knuth_1013846.cfm) invented Literate Programming so that he could both write _about_ [MetaFont](https://en.wikipedia.org/wiki/Metafont) and [TeX](https://texfaq.org/FAQ-lit) and _implement_ them in the same place. These are two of the most important computer programs ever written. Their longevity and quality testify to the viability of Literate Programming. 121 | 122 | 123 | ## TANGLEDOWN IS HERE, INSIDE THIS README.md 124 | 125 | 126 | ***Tangledown*** is the tool that rearranges code from any Markdown file into executable order. This here document, README.md, the one you're reading right now, is the Literate Program for Tangledown.
127 | 128 | 129 | Because our documentation language is Markdown, the language of this document is _Literate Markdown_. This here README.md, which you're reading right now, contains all the source for the Literate-Markdown tool, `tangledown.py`, with all its documentation, all presented in narrative order, like a story. 130 | 131 | 132 | `tangledown.py` ***tangles*** code out of any Markdown document, not just this here README.md that you're reading right now. The verb "tangle" is traditional in Literate Programming. You might think it should be "untangle," because a Literate Markdown document is all tangled up from executable order. But Knuth prefers the human's point of view. The Markdown document contains the code in the _correct_, narrative order. To address untangling or detangling, we have [TangleUp](#tangleup-intro). 133 | 134 | 135 | You can _also_ run Tangledown inside a Jupyter notebook, specifically one that is linked to this here document, README.md, the one you're reading right now. See [this section](#oh-my-jupytext) for more. 136 | 137 | 138 | We should mention that Tangledown is similar to [org-babel](http://orgmode.org/worg/org-contrib/babel/) in Emacs (or [spacemacs](http://spacemacs.org/) for VIM users). Those are polished, best-of-breed Literate-Programming tools for the Down direction. You have to learn some Emacs to use them, and that's a barrier for many people. Markdown is good enough for Github, and thus for most of us right now. 139 | 140 | 141 | ## TANGLEUP INTRO 142 | 143 | 144 | Tangledown, as a distribution format, is a complete solution to Literate Programming. You get a single Markdown file and all the tested source for your project is included. Run Tangledown and the project is sprayed out on disk, ready for running, further testing, and deploying. 145 | 146 | 147 | As a development format, it's not quite enough. With only Tangledown, when you modify the source tree in executable order, your narrative is _instantly_ out of date.
We can't have that. See [Section TangleUp](#tangleup) for more. 148 | 149 | 150 | ## TANGLING UP EXISTING CODE 151 | 152 | 153 | TangleUp can generate unique names, as GUIDs, say, for new source files and blocks. You should be able to TangleUp an existing source tree into a new, fresh, non-pre-existing Markdown file, and then round-trip TangleUp and TangleDown. 154 | 155 | 156 | # TANGLEDOWN DESIGN AND IMPLEMENTATION 157 | 158 | 159 | Let's do Tangledown, first, and TangleUp [later](#tangleup). 160 | 161 | 162 | # OH MY! JUPYTEXT 163 | 164 | 165 | ***Jupytext*** \[sic\] automatically syncs a Markdown file with a Jupyter notebook. Read about it [here](https://github.com/mwouts/jupytext). It works well in ***JupyterLab***. Read about that [here](https://github.com/jupyterlab/jupyterlab). Specifically, it lets you open this here Markdown file, README.md, that you're reading right now, as a Jupyter notebook, and you can evaluate some cells in the notebook. 166 | 167 | 168 | Here's how I installed everything on an Apple Silicon (M1) MacBook Pro, with Python 3.9: 169 | 170 | 171 | ``` 172 | pip install jupyter 173 | pip install jupyterlab 174 | pip install jupytext 175 | ``` 176 | 177 | 178 | Here is how I run it: 179 | 180 | 181 | ``` 182 | jupyter lab 183 | ``` 184 | 185 | 186 | or 187 | 188 | 189 | ``` 190 | PYTHONPATH=".:$HOME/Documents/GitHub/tangledow:$HOME/Library/Jupyter/kernels/tangledown_kernel" jupyter lab ~ 191 | ``` 192 | 193 | 194 | when I want the [Tangledown Kernel](#section-tangledown-kernel), and I almost always want the Tangledown kernel. 195 | 196 | 197 | In JupyterLab 198 | 199 | 200 | - open README.md 201 | - `View->Activate Command Palette` 202 | - check `Pair Notebook with Markdown` 203 | - right-click `README.md` and say `Open With -> Jupytext Notebook` 204 | - edit one of the two, `README.md` or `README.ipynb` ...
205 | 206 | 207 | Jupytext will update the other. 208 | 209 | 210 | > ***IMPORTANT***: To see the updates in the notebook when you modify the Markdown, you must `File->Reload Notebook from Disk`, and to see updates in the Markdown when you modify the notebook, you must `File->Reload Markdown File from Disk`. Jupytext forces you to reload changed files manually. I'll apologize here, on behalf of the Jupytext team. 211 | 212 | 213 | If you're reading or modifying `README.ipynb`, or if you `Open With -> Jupytext Notebook` on `README.md` (my preference), you may see tiny, unrendered Markdown cells above and below all your tagged nowebs and tangles. ***DON'T DELETE THE TINY CELLS***. Renderers of Markdown simply ignore the tags, but Jupytext makes tiny, invisible cells out of them! 214 | 215 | 216 | Unless you're running the optional, new [Tangledown Kernel](#section-tangledown-kernel), don't RUN cells with embedded `block` tags in Jupyter; you'll just get syntax errors from Python. 217 | 218 | 219 | # LET ME TELL YOU A STORY 220 | 221 | 222 | This here README.md, the one you're reading right now, should tell the story of Tangledown. We'll use Tangledown to _create_ Tangledown. That's just like bootstrapping a compiler. We'll use Tangledown to tangle Tangledown itself out of this here document named README.md that you're reading right now. 223 | 224 | 225 | The first part of the story is that I just started writing the story. The plan and outline was in my head (I didn't explicitly do [Method 1](#creative-writing)). Then I filled in the code, moved everything around when I needed to, and kept rewriting until it all worked the way I wanted it to work. Actually, I'm still doing this now. Tangledown and TangleUp are living stories! 226 | 227 | 228 | ## DISCLAIMER 229 | 230 | 231 | This is a useful toy, but it has zero error handling. We currently talk only about the happy path.
I try to be rude ("[DUDE!](#dude)") every place where I sense trouble, but I'm only sure I haven't been rude enough. Read this with a sense of humor. You're in on the story with me, and it's supposed to be fun! 232 | 233 | 234 | I also didn't try it on Windows, but I did try it on WSL, the Windows Subsystem for Linux. Works great on WSL! 235 | 236 | 237 | ## HOW TO RUN TANGLEDOWN 238 | 239 | 240 | One way: run `python3 tangledown.py README.md` or just `python tangledown.py` at the command line. That command should _overwrite_ tangledown.py. The code for tangledown.py is inside this here README.md that you're reading right now. The name of the file to overwrite, namely `tangledown.py`, is embedded inside this here README.md itself, in the `file` attribute of a `<tangle>` tag. Read about `tangle` tags [below](#section-tangle-tags)! 241 | 242 | 243 | If you said `python3 tangledown.py MY-FOO.md`, then you would tangle 244 | code out of `MY-FOO.md`. You'll do that once you start writing your own code in 245 | Tangledown. You will love it! We have some big examples that we'll write about elsewhere. Those examples include embedded code and microcode for exotic hardware, all written in Python! 246 | 247 | 248 | Tangledown is both a script and a module. You can run Tangledown in a [Jupytext](#oh-my-jupytext) cell after importing some stuff from the module. The next cell illustrates the typical bootstrapping joke of tangling Tangledown itself out of this here README.md that you're reading right now, after this Markdown file has been linked to a Jupytext notebook.
249 | 250 | ```python 251 | from tangledown import get_lines, accumulate_lines, tangle_all 252 | tangle_all(*accumulate_lines(*get_lines("README.md"))) 253 | ``` 254 | 255 | After you have tangled at least once, as above, and if you switch the notebook kernel to the new, optional [Tangledown Kernel](#section-tangledown-kernel), you can evaluate the source code for the whole program in the [later cell I'm linking right here](#tangle-listing-tangle-all). ***How Cool is That?*** 256 | 257 | 258 | You'll also need to re-tangle and restart the Tangledown Kernel when you add new nowebs to your files. Sorry about that. This is still just a toy. 259 | 260 | 261 | Because Tangledown is a Python module, you can also run Tangledown from inside a standalone Python program, say in PyCharm or VS Code or whatever; 262 | `hello_world_tangler.py` in this repository is an example. 263 | 264 | 265 | Once again, Jupytext lets you RUN code from a Markdown 266 | document in a JupyterLab notebook with just the ordinary Python3 kernel. If you open `hello_world.md` as a Jupytext 267 | notebook in JupyterLab then you 268 | can run Tangledown in Jupyter cells. Right-click on the name `hello_world.md` in the JupyterLab GUI and choose 269 | 270 | 271 | `Open With ...` $\longrightarrow$ `Jupytext Notebook` 272 | 273 | 274 | Then run cells! This is close to the high bar set by org-babel! 275 | 276 | 277 | ## HOW IT WORKS: Markdown Ignores Mysterious Tags 278 | 279 | 280 | How can we rearrange code cells in a notebook or Markdown file from human-understandable, narrative order to executable order? 281 | 282 | 283 | Exploit the fact that most Markdown renderers, like Jupytext's, Github's, and [PyCharm's](https://www.jetbrains.com/pycharm/), ignore HTML / XML _tags_ (that is, stuff inside angle brackets) that they don't recognize.
Let's enclose blocks of real, live code with `noweb` tags, like this: 284 | 285 | 286 | 287 | 288 | class TestSomething (): 289 | def test_something (self): 290 | assert (3 == 2+1) 291 | 292 | 293 | 294 | 295 | ### TAG CELLS CAN BE RAW OR MARKDOWN, NOT CODE 296 | 297 | 298 | The markdown above renders as follows. You can see the `noweb` one-liner raw cells above and below the code in Jupytext. If they were Markdown cells, they'd be tiny and invisible. That's 100% OK, and may be more to your liking! Try changing the cells from RAW (press "R") to Markdown (press "M") and back, then closing them (Shift-Enter) and opening them (Enter). Don't mark the tag cells CODE (don't press "Y"). Tangledown won't work because Jupytext will surround them with triple-backticks. 299 | 300 | 301 | 302 | 303 | 304 | ```python 305 | class TestSomething (): 306 | def test_something (self): 307 | assert (42 == 6 * 7) 308 | ``` 309 | 310 | 311 | 312 | 313 | 314 | What are the `<noweb>` and `</noweb>` tags? We explain them immediately below. 315 | 316 | 317 | ## THREE TAGS: noweb, block, and tangle 318 | 319 | 320 | ### `noweb` tags 321 | 322 | 323 | Markdown ignores `<noweb>` and `</noweb>` tags, but `tangledown.py` _doesn't_. `tangledown.py` sucks up the ***contents*** of the `noweb` tags and sticks them into a dictionary for later lookup when processing `block` tags. 324 | 325 | 326 | #### CONTENTS OF A TAG 327 | 328 | 329 | The contents of a `noweb` tag are between the opening `<noweb ...>` and closing `</noweb>` fenceposts. Markdown renders code contents with syntax coloring and indentation. That's why we want code cells to be CODE cells and not RAW cells. 330 | 331 | 332 | The term _contents_ is ordinary jargon from HTML, XML, SGML, etc., and applies to any opening `<tag>` and closing `</tag>` pair. 333 | 334 | 335 | #### ATTRIBUTES OF A TAG 336 | 337 | 338 | The Tangledown dictionary key for contents of a `noweb` tag is the string value of the `name` attribute. For example, in `<noweb name="foo">`, `name` is an _attribute_, and its string value is `"foo"`.
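To make the attribute idea concrete, here is a minimal sketch of pulling the `name` attribute out of an opener with a regular expression. The regex below is a hypothetical stand-in for illustration, not necessarily the exact one `tangledown.py` uses:

```python
import re

# Hypothetical stand-in for a noweb-opener regex: the name must start
# with a letter; the trailing .* tolerates attributes beyond `name`.
noweb_start = re.compile(r'^<noweb name="([a-zA-Z][a-zA-Z0-9-_\s.]*)".*>$')

match = noweb_start.match('<noweb name="foo">')
print(match.group(1))  # "foo" becomes the dictionary key for the contents
```

Note that the `^` anchor is what keeps indented tags, like the examples in this document, from being processed.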
339 | 340 | 341 | > Noweb names must be unique in a document. TangleUp ensures this when it writes a new Markdown file from existing source, or you may do it by hand. 342 | 343 | 344 | **NOTE**: the `name` attribute of a `noweb` opener must be on the same line, like this: 345 | 346 | 347 | 348 | 349 | 350 | Ditto for our other attributes, as in the following. [Don't separate attributes with commas!]( https://www.w3schools.com/html/html_attributes.asp) 351 | 352 | 353 | 354 | 355 | 356 | This single-line rule is a limitation of the regular expressions that detect `noweb` tags. Remember, [Tangledown is a toy](#disclaimer), a useful toy, but it's limited. 357 | 358 | 359 | #### FENCEPOST CELLS 360 | 361 | 362 | You can create the fencepost cells, `<noweb ...>` and `</noweb>`, either in the plain-text Markdown file, or you can create them in the synchronized Jupytext notebook. 363 | 364 | 365 | If you create fencepost cells in plain-text Markdown as opposed to the Jupytext notebook, leave a blank line after the opening `<noweb ...>` and a blank line before the closing `</noweb>`. If you don't, the Markdown renderer won't color and indent the contents. Tangledown will still work, but the Markdown renderer will format your code like text without syntax coloring and indentation. 366 | 367 | 368 | If you write fencepost cells in Markdown cells in the notebook or as blank-surrounded tags in the plain-text Markdown, the fenceposts appear as tiny, invisible Markdown cells because the renderer treats them as empty markdown cells. That's the fundamental operating principle of Tangledown: Markdown ignores tags it doesn't recognize! ***DON'T DELETE THE TINY, INVISIBLE CELLS***, but you can open (Enter) and close (Shift-Enter) them. 369 | 370 | 371 | If you create `noweb` and `tangle` tags in the notebook and you want them _visible_, mark them _RAW_ by pressing "R" with the cell highlighted but not editable. Don't mark them CODE (don't press "Y").
Tangledown will break because Jupytext will surround them with triple-backticks. 372 | 373 | 374 | ### `block` tags 375 | 376 | 377 | Later, in the [second pass](#second-pass), Tangledown blows the contents of `noweb` tags back out wherever it sees `block` tags with matching `name` attributes. That's how you can define code anywhere in your document and use it in any other place, later or earlier, more than once if you like. 378 | 379 | 380 | `block` tags can and should appear in the contents of `noweb` tags and in the contents of `tangle` tags, too. That's how you structure your narrative! 381 | 382 | 383 | Tangledown discards the contents of `block` tags. Only the `name` attribute of a `block` tag matters. 384 | 385 | 386 | #### WRITE IN ANY-OLD-ORDER YOU LIKE 387 | 388 | 389 | You don't have to write the noweb _before_ you write a matching `block` tag. You can refer to a `noweb` tag before it exists in time and space, more than once if you like. You can define things and name things and use things in any order that makes your thinking and your story more clear. This is literature, after all. 390 | 391 | 392 | ### `tangle` tags 393 | 394 | 395 | A `tangle` tag sprays its block-expanded contents to a file on disk. What file? The file named in the `file` attribute of the `tangle` tag. ***Expanding*** contents of a `tangle` tag means replacing every `block` tag with the contents of its matching `noweb` tag, recursively, until everything bottoms out in valid Python. 396 | 397 | 398 | The same rules about blank lines hold for `tangle` tags as they do for `noweb` tags: if you want Markdown to render the contents like code, surround the contents with blank lines or mark the tag cells _RAW_.
The following Markdown 399 | 400 | 401 | 402 | 403 | 404 | 405 | if __name__ == '__main__': 406 | TestSomething().test_something() 407 | 408 | 409 | 410 | 411 | renders like this 412 | 413 | 414 | 415 | 416 | ```python 417 | import unittest 418 | 419 | 420 | 421 | if __name__ == '__main__': 422 | TestSomething().test_something() 423 | ``` 424 | 425 | 426 | 427 | 428 | See the tiny, invisible Markdown cells above and below the code? Play around with opening and closing them with Enter and Shift-Enter, respectively, and marking them RAW (Press "R") and Markdown ("M"). Don't mark them CODE ("Y"). 429 | 430 | 431 | You can evaluate the cell with the new, optional [Tangledown Kernel](#section-tangledown-kernel). If you evaluate the code cell in the Python Kernel, you'll get a syntax error because the `block` tag is not valid Python. The syntax error is harmless to Tangledown. 432 | 433 | 434 | This code tangles to the file `/dev/null`. That's a nifty trick for temporary `tangle` blocks. You can talk about them, validate them by executing their cells in the [Tangledown Kernel](#section-tangledown-kernel), and throw them away. 435 | 436 | 437 | [TangleUp](#tangleup) knows where Tangledown puts all the blocks and tangles. That's how, when you change code on disk, TangleUp can put it all back in the single file of Literate Markdown. 438 | 439 | 440 | # HUMAN! READ THE `block` TAGS! 441 | 442 | 443 | Markdown renders `block` tags verbatim inside nowebs or tangles. This is good for humans, who will think 444 | 445 | 446 | > AHA!, this `block` refers to some code in a `noweb` tag somewhere else in this Markdown document. I can read all the details of that code later, when it will make more sense. I can look at the picture on the box before the pieces of the jigsaw puzzle. 447 | 448 | > Thank you, kindly, author! Without you, I'd be awash in details. I'd get tired and cranky before understanding the big picture! 449 | 450 | 451 | See, I'll prove it to you. 
Below is the code for all of `tangledown.py` itself. You can understand this without understanding the _implementations_ of the sub-pieces, just getting a rough idea of _what_ they do from the names of the `block` tags. **READ THE NAMES IN THE BLOCK TAGS**. Later, if you want to, you can read all the details in the `noweb` tags named by the `block` tags. 452 | 453 | 454 | # TANGLE ALL 455 | 456 | 457 | If you're running the new, optional [Tangledown Kernel](#section-tangledown-kernel), you can evaluate this next cell and run Tangledown on Tangledown itself, right here in a Jupyter notebook. ***How Cool is That?*** 458 | 459 | 460 | 461 | 462 | 463 | ```python 464 | 465 | 466 | 467 | 468 | 469 | 470 | 471 | 472 | 473 | 474 | 475 | def tangle_all(tracer: Tracer, nowebs: Nowebs, tangles: Tangles) -> None: 476 | for filename, liness in tangles.items (): 477 | Path(filename).parents[0].mkdir(parents=True, exist_ok=True) 478 | contents: str = expand_tangles(tracer, liness, nowebs) 479 | with open (filename, 'w') as outfile: 480 | print(f"WRITING FILE: {filename}") 481 | outfile.write (contents) 482 | tracer.dump() 483 | 484 | if __name__ == "__main__": 485 | fn, lines = get_lines(get_aFile()) 486 | # test_re_matching(lines) 487 | tracer, nowebs, tangles = accumulate_lines(fn, lines) 488 | tangle_all(tracer, nowebs, tangles) 489 | ``` 490 | 491 | 492 | 493 | 494 | 495 | The whole program is in the function `tangle_all`. We get hybrid vigor by testing `__name__` against `"__main__"`: `tangledown.py` is both a script and module. 496 | 497 | 498 | All we do in `tangle_all` is loop over all the line lists in the tangles (`for filename, liness in tangles.items()`) and [expand them](#expand-tangles) to replace blocks with nowebs. Yes, "liness" has an extra "s". Remember [Smeagol](https://lotr.fandom.com/wiki/Gollum). Pronounce it like "my preciouses, my lineses!" 499 | 500 | 501 | The code will create the subdirectories needed.
For example, if you tangle to file `foo/bar/baz/qux.py`, the code creates the directory chain `./foo/bar/baz/` if it doesn't exist. 502 | 503 | 504 | ## TYPES 505 | 506 | 507 | Let us now explain the implementation. The first block in the tangle above is _types_. What is the noweb of _types_? It's here. 508 | 509 | 510 | A `Line` is a string, Python base type `str`. `Lines` is the type of a list of lines. `Liness` is the type of a list of list of lines, in a pluralizing shorthand borrowed from Haskell practice. Pronounce `liness` the way [Smeagol](https://lotr.fandom.com/wiki/Gollum) would do: "my preciouses, my lineses!" 511 | 512 | 513 | A noweb name is a string, and a tangle file name is a string. A line number is an `int`, a Python base type. 514 | 515 | 516 | Nowebs are dictionaries from noweb names to lines. 517 | 518 | 519 | Tangles are dictionaries from file names to Liness --- lists of lists of lines. Tangledown accumulates output for `tangle` files mentioned more than once. If you tangle to `qux.py` in one place and then also tangle to `qux.py` somewhere else, the second tangle won't overwrite the first, but append to it. That's why tangles are lists of lists of lines, one list of lines for each mentioning of a tangle file. Read more about that in [expand-tangles](#expand-tangles). 520 | 521 | 522 | 523 | 524 | 525 | ```python 526 | from typing import List, Dict, Tuple, Match 527 | 528 | NowebName = str 529 | FileName = str 530 | TangleFileName = FileName 531 | LineNumber = int 532 | Line = str 533 | Lines = List[Line] 534 | Liness = List[Lines] 535 | LinesTuple = Tuple[LineNumber, Lines] 536 | 537 | Nowebs = Dict[NowebName, Lines] 538 | Tangles = Dict[TangleFileName, Liness] 539 | ``` 540 | 541 | 542 | 543 | 544 | 545 | We'll implement all the noweb blocks, like `accumulate_contents` and `eat_block_tag`, later. You can read about them, or not, after you've gotten more of the big picture.
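As a concrete illustration of these types, here is a tiny, hand-built pair of dictionaries, sketched after the `hello_world.md` example in this repository (the literal strings are illustrative, not copied from any real tangle):

```python
from typing import Dict, List

Line = str
Lines = List[Line]
Liness = List[Lines]            # "my preciouses, my lineses!"
Nowebs = Dict[str, Lines]       # noweb name -> its contents
Tangles = Dict[str, Liness]     # tangle file name -> one Lines per mention

nowebs: Nowebs = {
    "hello world def": ['def hello_world():\n',
                        '    print("Hello, world!")\n']}

# The same file tangled in two places: the second mention appends a
# fresh list of lines instead of overwriting the first.
tangles: Tangles = {
    "hello_world.py": [
        ['<block name="hello world def"></block>\n'],
        ['if __name__ == "__main__":\n', '    hello_world()\n']]}

print(len(tangles["hello_world.py"]))  # 2 mentions, 2 lists of lines
```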
546 | 547 | 548 | ## DEBUGGING AND REFACTORING 549 | 550 | 551 | The Tangledown Kernel doesn't support the Jupytext debugger, yet. Sorry about that. Tangle the code out to disk and debug it with pudb or whatever, then tangle it back up into your Literate Markdown file via [TangleUp](#tangleup). 552 | 553 | 554 | [Tangledown is still a toy](#disclaimer). Ditto refactoring. PyCharm is great for that, but you'll have to do it on tangled files and detangle (paste) back into the Markdown. 555 | 556 | 557 | ## EXPAND TANGLES 558 | 559 | 560 | We separated out the inner loop over Liness \[sic\] into another function, `expand_tangles`, so that the [Tangledown Kernel](#section-tangledown-kernel) can import it and apply it to `block` tags. `tangle_all` calls `expand_tangles`; `expand_tangles` calls `expand_blocks`. Read about `expand_blocks` [here](#expand-blocks). 561 | 562 | ```python 563 | from graphviz import Digraph 564 | g = Digraph(graph_attr={'size': '8,5'}, node_attr={'fontname': 'courier'}) 565 | g.attr(rankdir='LR') 566 | g.edge('tangle_all', 'expand_tangles') 567 | g.edge('expand_tangles', 'expand_blocks') 568 | g 569 | ``` 570 | 571 | 572 | 573 | 574 | 575 | ```python 576 | def expand_tangles(tracer: Tracer, liness: Liness, nowebs: Nowebs) -> str: 577 | contents: Lines = [] 578 | for lines in liness: 579 | while there_is_a_block_tag (lines): 580 | lines = expand_blocks (tracer, nowebs, lines) 581 | contents += lines 582 | return ''.join(contents) 583 | ``` 584 | 585 | 586 | 587 | 588 | 589 | ## Tangledown Tangles Itself? 590 | 591 | 592 | Tangledown has two kinds of regular expressions (regexes) for matching tags in a Markdown file: 593 | 594 | - regexes for `noweb` and `tangle` tags that appear on lines by themselves, left-justified 595 | 596 | - regexes that match `<block>` tags that may be indented, and match their closing `</block>` tags, which may appear on the same line as `<block>` or on lines by themselves.
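A quick sketch of the difference between the two kinds, using illustrative stand-ins rather than the real regexes (which live in the `left_justified_regexes` and `anywhere_regexes` nowebs below):

```python
import re

left_justified = re.compile(r'^<tangle file="(.+)">$')  # no leading whitespace allowed
anywhere = re.compile(r'^(\s*)<block name="(.+?)"')     # leading whitespace captured

# Left-justified regexes ignore indented tags, so indented examples
# in prose get talked about, never processed:
print(bool(left_justified.match('<tangle file="foo.py">')))      # True
print(bool(left_justified.match('    <tangle file="foo.py">')))  # False

# Anywhere regexes capture leading whitespace so expansion can
# re-indent the noweb's contents (critical for Python):
m = anywhere.match('        <block name="types">')
print(repr(m.group(1)))  # '        '
```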
597 | 598 | 599 | Both kinds of regex are ___safe___: they do not match themselves. That means it's safe to run 600 | `tangledown.py` on this here `README.md`, which contains tangled source for `tangledown.py`. 601 | 602 | 603 | The regexes in noweb `left_justified_regexes` match `noweb` and `tangle` tags that appear on lines by themselves, left-justified. 604 | 605 | 606 | > They also won't match `noweb` and `tangle` tags that are indented. That lets us _talk about_ `noweb` and `tangle` tags without processing them: just put the examples you're talking about in an _indented_ Markdown code cell instead of in a triple-backticked Markdown code cell. 607 | 608 | 609 | The names in the attributes of `noweb` and `tangle` tags must start with a letter, and they can contain letters, numbers, hyphens, underscores, whitespace, and dots. 610 | 611 | 612 | The names of `noweb` tags must be globally unique within the Markdown file. Multiple `tangle` tags may refer to the same output file, in which case Tangledown appends the contents of the second and subsequent `tangle` tags to a list of lists of lines, a `Liness`. 613 | 614 | 615 | ### LEFT-JUSTIFIED REGEXES 616 | 617 | 618 | There is a `.*` at the end to catch attributes beyond `name`. A bit of future-proofing. 619 | 620 | 621 | 622 | 623 | 624 | ```python 625 | noweb_start_re = re.compile (r'^<noweb name="([a-zA-Z][a-zA-Z0-9 _\-.]*)".*>$') 626 | noweb_end_re = re.compile (r'^</noweb>$') 627 | 628 | tangle_start_re = re.compile (r'^<tangle file="([a-zA-Z0-9 ~_\-./]+)".*>$') 629 | tangle_end_re = re.compile (r'^</tangle>$') 630 | ``` 631 | 632 | 633 | 634 | 635 | 636 | ### ANYWHERE REGEXES 637 | 638 | 639 | The regexes in this noweb, `anywhere_regexes`, match `block` tags that may be indented, preserving indentation. The `block_end_re` regex also preserves indentation. Indentation is critical for Python, Haskell, and other languages. 640 | 641 | 642 | I converted the 'o' in 'block' to a harmless regex group `[o]` so that `block_end_re` won't match itself.
That makes it safe to run this code on this here document itself. 643 | 644 | 645 | 646 | 647 | 648 | ```python 649 | block_start_re = re.compile (r'^(\s*)<bl[o]ck name="([a-zA-Z][a-zA-Z0-9 _\-.]*)".*>') 650 | block_end_re = re.compile (r'^(\s*)</bl[o]ck>') 651 | ``` 652 | 653 | 654 | 655 | 656 | 657 | ## Test the Regular Expressions 658 | 659 | 660 | ### OPENERS 661 | 662 | 663 | The code in noweb `openers` has two `block` tags that refer to the nowebs of the regexes defined above, namely `left_justified_regexes` and `anywhere_regexes`. After Tangledown substitutes the contents of the nowebs for the blocks, the code becomes valid Python and you can call `test_re_matching` in the [Tangledown Kernel](#section-tangledown-kernel) or at the command line. When you call it, it proves that we can recognize all the various kinds of tags. We leave the regexes themselves as global pseudo-constants so that they're both easy to test and to use in the body of the code ([Demeter weeps](https://en.wikipedia.org/wiki/Law_of_Demeter) because of globals). 664 | 665 | 666 | The code in `hello_world.ipynb` (after you have Paired a Notebook with the Markdown File `hello_world.md`) runs this test as its last act to check that `tangledown.py` was correctly tangled from this here `README.md`. That code works in the ordinary Python kernel and in the [Tangledown Kernel](#section-tangledown-kernel). 667 | 668 | 669 | Notice the special treatment for block ends, which are usually on the same lines as their block opener tags, but not necessarily so. That lets you put (useless) contents in `block` tags.
670 | 671 | 672 | 673 | 674 | 675 | ```python 676 | import re 677 | import sys 678 | from pathlib import Path 679 | 680 | 681 | 682 | 683 | 684 | def test_re_matching(fp: Path, lines: Lines) -> None: 685 | for line in lines: 686 | noweb_start_match = noweb_start_re.match (line) 687 | tangle_start_match = tangle_start_re.match (line) 688 | block_start_match = block_start_re.match (line) 689 | 690 | noweb_end_match = noweb_end_re.match (line) 691 | tangle_end_match = tangle_end_re.match (line) 692 | block_end_match = block_end_re.match (line) 693 | 694 | if (noweb_start_match): 695 | print ('NOWEB: ', noweb_start_match.group (0)) 696 | print ('name of the block: ', noweb_start_match.group (1)) 697 | elif (noweb_end_match): 698 | print ('NOWEB END: ', noweb_end_match.group (0)) 699 | elif (tangle_start_match): 700 | print ('TANGLE: ', tangle_start_match.group (0)) 701 | print ('name of the file: ', tangle_start_match.group (1)) 702 | elif (tangle_end_match): 703 | print ('TANGLE END: ', tangle_end_match.group (0)) 704 | elif (block_start_match): 705 | print ('BLOCK: ', block_start_match.group (0)) 706 | print ('name of the block: ', block_start_match.group (2)) 707 | if (block_end_match): 708 | print ('BLOCK END SAME LINE: ', block_end_match.group (0)) 709 | else: 710 | print ('BLOCK NO END') 711 | elif (block_end_match): 712 | print ('BLOCK END ANOTHER LINE: ', block_end_match.group (0)) 713 | else: 714 | pass 715 | ``` 716 | 717 | 718 | 719 | 720 | 721 | # TANGLEDOWN: Two Passes 722 | 723 | 724 | Tangledown passes once over the file to collect contents of `noweb` and `tangle` tags, and again over the `tangle` tags to expand `block` tags. In the second pass, Tangledown substitutes noweb contents for corresponding `block` tags until there are no more `block` tags, creating valid Python.
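The whole two-pass idea can be sketched in miniature (a toy, self-contained substitution loop; the `<block name="...">` tag syntax and noweb names here are assumed for illustration, not Tangledown's real parser):

```python
import re

# A toy miniature of Tangledown's second pass: nowebs as collected by
# pass one, then block references expanded until a fixpoint is reached.
nowebs = {'greet': ["print('hi')\n"],
          'main': ['<block name="greet">\n']}
block_re = re.compile(r'^\s*<bl[o]ck name="([^"]+)">')

def expand_once(lines):
    out = []
    for line in lines:
        m = block_re.match(line)
        # Substitute the named noweb's lines for the block tag.
        out.extend(nowebs[m.group(1)] if m else [line])
    return out

lines = ['<block name="main">\n']
while any(block_re.match(l) for l in lines):  # fixpoint: no tags left
    lines = expand_once(lines)
```

After the loop, `lines` is pure Python: `main` expanded to `greet`, which expanded to the print statement. Nested references are why the loop must run to a fixpoint rather than just once.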
725 | 726 | 727 | ## First Pass: Saving Noweb and Tangle Blocks 728 | 729 | 730 | In the first pass over the file, we'll just save the contents of noweb and tangle into dictionaries, without expanding nested `block` tags. 731 | 732 | 733 | ### GET A FILE NAME 734 | 735 | 736 | `tangledown.py` is both a script and a module. As a script, you run it from the command line, so it gets its input file name from command-line arguments. As a module, called from another Python program, you probably want to give the file as an argument to a function, specifically, to `get_lines`. 737 | 738 | 739 | Let's write two functions, 740 | 741 | - `get_aFile`, which parses command-line arguments and produces a file name; the default file name is `README.md` 742 | 743 | - `get_lines`, which 744 | 745 | - gets lines, without processing `noweb`, `tangle`, or `block` tags, from its argument, `aFilename` 746 | 747 | - replaces `#raw` and `#endraw` fenceposts with blank lines 748 | 749 | - writes out the full file path to a secret place where the [Tangledown Kernel](#section-tangledown-kernel) can pick it up 750 | 751 | 752 | `get_aFile` can parse command-line arguments that come from either `python` on the command line or from a `Jupytext` notebook, which has a few kinds of command-line arguments we must ignore, namely command-line arguments that end in `.py` or in `.json`. 753 | 754 | 755 | ### GET LINES 756 | 757 | 758 | This method for getting a file name from the argument list will eat all options. It works for the Tangledown Kernel and for tangling down from a script or a notebook, but it's not future-proofed. Tangledown is still a toy.
759 | 760 | 761 | 762 | 763 | 764 | ```python 765 | print({'len(sys.argv)': len(sys.argv), 'sys.argv': sys.argv}) 766 | ``` 767 | 768 | 769 | 770 | 771 | 772 | 773 | 774 | 775 | 776 | ```python 777 | def get_aFile() -> str: 778 | """Get a file name from the command-line arguments 779 | or 'README.md' as a default.""" 780 | 781 | aFile = 'README.md' # default 782 | if len(sys.argv) > 1: 783 | file_names = [p for p in sys.argv 784 | if (p[0] != '-') # option 785 | and (p[-3:] != '.py') 786 | and (p[-5:] != '.json')] 787 | if file_names: 788 | aFile = file_names[0] 789 | return aFile 790 | 791 | raw_line_re: re.Pattern = re.compile(r'^<!-- #(end)?raw -->') 792 | def get_lines(fn: FileName) -> Tuple[Path, Lines]: 793 | """Get lines from a file named fn. Replace 794 | 'raw' fenceposts with blank lines. Write full path to 795 | a secret place for the Tangledown kernel to pick it up. 796 | Return tuple of file path (for TangleUp's Tracer) and 797 | lines.""" 798 | 799 | xpath = save_aFile_path_for_kernel(fn) 800 | with open(fn) as f: 801 | in_lines: Lines = f.readlines () 802 | out_lines: Lines = [] 803 | for in_line in in_lines: 804 | out_lines.append( 805 | in_line if not raw_line_re.match(in_line) else "\n") 806 | return xpath, out_lines 807 | ``` 808 | 809 | 810 | 811 | 812 | 813 | ### NORMALIZE FILE PATH 814 | 815 | 816 | We must normalize file names so that, for example, "foo.txt" and "./foo.txt" indicate the same file and so that `~/` denotes the home directory on Mac and Linux. I didn't test this on Windows.
817 | 818 | 819 | 820 | 821 | 822 | ```python 823 | def anchor_is_tilde(path_str: str) -> bool: 824 | result = (path_str[0:2] == "~/") and (Path(path_str).anchor == '') 825 | return result 826 | 827 | def normalize_file_path(tangle_file_attribute: str) -> Path: 828 | result: Path = Path(tangle_file_attribute) 829 | if (anchor_is_tilde(tangle_file_attribute)): 830 | result = (Path.home() / tangle_file_attribute[2:]) 831 | return result.absolute() 832 | ``` 833 | 834 | 835 | 836 | 837 | 838 | ### SAVE A FILE PATH FOR THE KERNEL 839 | 840 | 841 | Returns the full path of its input file after saving that path in a special place where the [Tangledown Kernel](#section-tangledown-kernel) can find it. 842 | 843 | 844 | 845 | 846 | 847 | ```python 848 | def save_aFile_path_for_kernel(fn: FileName) -> Path: 849 | xpath: Path = Path.cwd() / Path(fn).name 850 | victim_file_name = str(xpath.absolute()) 851 | safepath: Path = Path.home() / '.tangledown/current_victim_file.txt' 852 | safepath.parent.mkdir(parents=True, exist_ok=True) 853 | print(f"SAVING {victim_file_name} in secret place {str(safepath)}") 854 | with open(safepath, "w") as t: 855 | t.write(victim_file_name) 856 | return xpath 857 | ``` 858 | 859 | 860 | 861 | 862 | 863 | ### OH NO! THERE ARE TWO WAYS 864 | 865 | 866 | Turns out there are two ways to write code blocks in Markdown: 867 | 868 | 1. indented by four spaces, useful for quoted Markdown and quoted triple-backtick blocks 869 | 870 | 2. surrounded by triple backticks and _not_ indented. 871 | 872 | 873 | Tangledown must handle both ways. 874 | 875 | 876 | We use the trick of a harmless regex group --- regex stuff inside square brackets --- around one of the backticks in the regex that recognizes triple backticks. This regex is safe to run on itself. See `triple_backtick_re` in the code immediately below.
877 | 878 | 879 | The function `first_non_blank_line_is_triple_backtick`, in noweb `oh-no-there-are-two-ways`, recognizes code blocks bracketed by triple backticks. The contents of this `noweb` tag are triple-backticked, themselves. Kind of a funny self-toast joke, no? 880 | 881 | 882 | Remember the _use-mention_ dichotomy from Philosophy class? No problem if you don't. 883 | 884 | 885 | When we're _talking about_ `noweb` and `tangle` tags, but don't want to process them, we indent the tags and the code blocks. Tangledown won't process indented `noweb` and `tangle` tags because the regexes in noweb `left_justified_regexes` won't match them. 886 | 887 | 888 | We can also talk about triple-backticked blocks by indenting them. Tangledown won't mess with indented triple-backticked blocks, because the regex needs them left-justified. Markdown also won't get confused, so we can quote whole Markdown files by indenting them. Yes, your Literate Markdown can _also_, recursively, tangle out more Markdown files. How cool is that? Will the recursive jokes never end? 889 | 890 | 891 | [TangleUp](#tangleup) has a heuristic for placing language and id information on triple-backtick fence openers. Our function will retrieve those if present. 892 | 893 | 894 | We see, below, why the code tracks line numbers. We might do all this in some super-bitchin', sophomoric list comprehension, but this is more obvious-at-a-glance. That's a good thing. 895 | 896 | 897 | ### FIRST NON-BLANK LINE IS TRIPLE BACKTICK 898 | 899 | 900 | Match lines with left-justified triple-backtick. Pass through lines with indented triple-backtick.
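That left-justified requirement is easy to spot-check with a regex of the same shape as the `triple_backtick_re` defined below (a minimal sketch):

```python
import re

# [`] is an ordinary character class matching a backtick, so the
# pattern never matches its own source text when tangled to a file.
tb = re.compile(r'^`[`]`')

left = tb.match('```python\n')          # left-justified fence: matched
indented = tb.match('    ```python\n')  # indented fence: passed through
selfmatch = tb.match("tb = re.compile(r'^`[`]`')")  # its own source: safe
```

Only `left` is a match object; the indented fence and the regex's own source line fall through untouched, which is exactly the use-mention safety the text describes.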
901 | 902 | 903 | We must trace `raw` fenceposts, but not copy them to the output. 904 | 905 | 906 | 907 | 908 | 909 | ```python 910 | triple_backtick_re = re.compile (r'^`[`]`((\w+)?\s*(id=([0-9a-fA-F-]+))?)') 911 | blank_line_re = re.compile (r'^\s*$') 912 | 913 | def first_non_blank_line_is_triple_backtick ( 914 | i: LineNumber, lines: Lines) -> Tuple[LineNumber, Match[Line], str, str]: 915 | while (blank_line_re.match (lines[i])): 916 | i = i + 1 917 | yes = triple_backtick_re.match (lines[i]) 918 | language = "python" # default 919 | id_ = None # default 920 | if yes: 921 | language = yes.groups()[1] or language 922 | id_ = yes.groups()[3] ## can be 'None' 923 | return i, yes, language, id_ 924 | ``` 925 | 926 | 927 | 928 | 929 | 930 | ### ACCUMULATE CONTENTS 931 | 932 | 933 | Tangledown is a funny little compiler. It converts Literate Markdown to Python or other languages (Tangledown supports Clojure and Markdown, too). We could go nuts and write it in highfalutin' style, and then it would be much bigger, more elaborate, and easier to explain to a Haskell programmer. It might also be less of a toy. However, we want this toy Tangledown for now to be: 934 | 935 | - very short 936 | 937 | - independent of rich libraries like Beautiful Soup and parser combinators 938 | 939 | - completely obvious to anyone 940 | 941 | 942 | We'll just use iteration and array indices, but in a tasteful way so our functional friends won't puke. This is Python, after all, not Haskell! We can just _get it done_, with grace, panache, and aplomb. 943 | 944 | 945 | The function `accumulate_contents` accumulates the contents of left-justified `noweb` or `tangle` tags. The function starts at line `i` of the input, then figures out whether a tag's first non-blank line is triple backtick, in which case it _won't_ snip four spaces from the beginning of every line, and finally keeps going until it sees the closing fencepost, `</noweb>` or `</tangle>`.
It returns a tuple of the line index _after_ the closing fencepost, the language, the id, and the contents, possibly de-dented. The function manipulates line numbers to skip over triple backticks. 946 | 947 | 948 | 949 | 950 | 951 | ```python 952 | def accumulate_contents ( 953 | lines: Lines, i: LineNumber, end_re: re.Pattern) -> Tuple[LineNumber, str, str, Lines]: 954 | r"""Harvest contents of a noweb or tangle tag. The start 955 | taglet was consumed by caller. Consume the end taglet.""" 956 | i, yes, language, id_ = first_non_blank_line_is_triple_backtick(i, lines) 957 | snip = 0 if yes else 4 958 | contents_lines: Lines = [] 959 | for j in range (i, len(lines)): 960 | if (end_re.match(lines[j])): 961 | return j + 1, language, id_, contents_lines # the only return 962 | if not triple_backtick_re.match (lines[j]): 963 | contents_lines.append (lines[j][snip:]) 964 | ``` 965 | 966 | 967 | 968 | 969 | 970 | ### NEW ACCUMULATE LINES 971 | 972 | 973 | The old `accumulate_lines` has reached the end of its life. It ignores raw cells, except for some hacks for raw noweb and tangle tags. The `new_accumulate_lines` must parse several kinds of line-sequences explicitly. Let's be careful not to call line-sequences _blocks_ so we don't confuse line-sequences with block tags. 974 | 975 | 1. Regular 976 | 977 | 978 | ### ACCUMULATE LINES 979 | 980 | 981 | The function `accumulate_lines` calls `accumulate_contents` to suck up the contents of all the left-justified `noweb` tags and `tangle` tags out of a file, but doesn't expand any `block` tags that it finds. It just builds up dictionaries, `nowebs` and `tangles`, keyed by `name` or `file` attributes it finds inside `noweb` or `tangle` tags.
982 | 983 | 984 | 985 | 986 | 987 | ```python 988 | 989 | raw_start_re = re.compile("") 990 | raw_end_re = re.compile("") 991 | from pprint import pprint 992 | def accumulate_lines(fp: Path, lines: Lines) -> Tuple[Tracer, Nowebs, Tangles]: 993 | tracer = Tracer() 994 | tracer.fp = fp 995 | nowebs: Nowebs = {} 996 | tangles: Tangles = {} 997 | i = 0 998 | while i < len(lines): 999 | noweb_start_match = noweb_start_re.match (lines[i]) 1000 | tangle_start_match = tangle_start_re.match (lines[i]) 1001 | if noweb_start_match: 1002 | 1003 | elif tangle_start_match: 1004 | 1005 | elif raw_start_re.match (lines[i]): 1006 | 1007 | else: 1008 | 1009 | if in_between: # Close out final markdown. 1010 | tracer._end_betweens(i) 1011 | return tracer, nowebs, tangles 1012 | ``` 1013 | 1014 | 1015 | 1016 | 1017 | 1018 | #### ACCUMULATE LINES: HANDLE RAW 1019 | 1020 | 1021 | 1022 | 1023 | 1024 | ```python 1025 | pass 1026 | ``` 1027 | 1028 | 1029 | 1030 | 1031 | 1032 | #### ACCUMULATE LINES: HANDLE MARKDOWN 1033 | 1034 | 1035 | 1036 | 1037 | 1038 | ```python 1039 | in_between = True 1040 | tracer.add_markdown(i, lines[i]) 1041 | i += 1 1042 | ``` 1043 | 1044 | 1045 | 1046 | 1047 | 1048 | #### ACCUMULATE LINES: HANDLE NOWEB 1049 | 1050 | 1051 | 1052 | 1053 | 1054 | ```python 1055 | in_between = False 1056 | key: NowebName = noweb_start_match.group(1) 1057 | (i, language, id_, nowebs[key]) = \ 1058 | accumulate_contents(lines, i + 1, noweb_end_re) 1059 | tracer.add_noweb(i, language, id_, key, nowebs[key]) 1060 | ``` 1061 | 1062 | 1063 | 1064 | 1065 | 1066 | #### ACCUMULATE LINES: HANDLE TANGLE 1067 | 1068 | 1069 | 1070 | 1071 | 1072 | ```python 1073 | in_between = False 1074 | key: TangleFileName = \ 1075 | str(normalize_file_path(tangle_start_match.group(1))) 1076 | if not (key in tangles): 1077 | tangles[key]: Liness = [] 1078 | (i, language, id_, things) = accumulate_contents(lines, i + 1, tangle_end_re) 1079 | tangles[key] += [things] 1080 | tracer.add_tangle(i, language, id_, 
key, tangles[key]) 1081 | ``` 1082 | 1083 | 1084 | 1085 | 1086 | 1087 | ## DUDE! 1088 | 1089 | 1090 | There is a lot that can go wrong. We can have all kinds of mal-formed contents: 1091 | 1092 | - too many or not enough triple-backtick lines 1093 | - indentation errors 1094 | - broken tags 1095 | - mismatched fenceposts 1096 | - dangling tags 1097 | - misspelled names 1098 | - syntax errors 1099 | - infinite loops (cycles, hangs) 1100 | - much, much more 1101 | 1102 | 1103 | We'll get to error handling someday, maybe. Tangledown is [just a little toy at the moment](#disclaimer), but I thought it interesting to write about. If it's ever distributed to hostile users, then we will handle all the bad cases. But not now. Let's get the happy case right. 1104 | 1105 | 1106 | ## Second Pass: Expanding Blocks 1107 | 1108 | 1109 | Iterate over all the `noweb` or `tangle` tag contents and expand the 1110 | `block` tags we find in there, recursively. That means keep going until there are no more `block` tags, because nowebs are allowed (encouraged!) to refer to other nowebs via `block` tags. If there are cycles, this will hang. 1111 | 1112 | 1113 | ### DUDE! HANG? 1114 | 1115 | 1116 | We're doing the happy cases first, and will get to cycle detection someday, maybe. 1117 | 1118 | 1119 | ### THERE IS A BLOCK TAG 1120 | 1121 | 1122 | First, we need to detect that some list of lines contains a `block` tag, left-justified or not. That means we must keep running the expander on that list.
1123 | 1124 | 1125 | 1126 | 1127 | 1128 | ```python 1129 | def there_is_a_block_tag (lines: Lines) -> bool: 1130 | for line in lines: 1131 | block_start_match = block_start_re.match (line) 1132 | if (block_start_match): 1133 | return True 1134 | return False 1135 | ``` 1136 | 1137 | 1138 | 1139 | 1140 | 1141 | ### EAT A BLOCK TAG 1142 | 1143 | 1144 | If there is a `block` tag, we must eat the tag and its meaningless contents: 1145 | 1146 | 1147 | 1148 | 1149 | 1150 | ```python 1151 | def eat_block_tag (i: LineNumber, lines: Lines) -> LineNumber: 1152 | for j in range (i, len(lines)): 1153 | end_match = block_end_re.match (lines[j]) 1154 | # DUDE! Check leading whitespace against block_start_re 1155 | if (end_match): 1156 | return j + 1 1157 | else: # DUDE! 1158 | pass 1159 | ``` 1160 | 1161 | 1162 | 1163 | 1164 | 1165 | ### EXPAND BLOCKS 1166 | 1167 | 1168 | The following function does one round of block expansion. The caller must test whether any `block` tags remain, and keep running the expander until there are no more `block` tags. Our functional fu grandmaster might be appalled, but sometimes it's just easier to iterate than to recurse. 1169 | 1170 | 1171 | 1172 | 1173 | 1174 | ```python 1175 | def expand_blocks (tracer: Tracer, nowebs: Nowebs, lines: Lines, 1176 | language: str = "python") -> Lines: 1177 | out_lines = [] 1178 | block_key: NowebName = "" 1179 | for i in range (len (lines)): 1180 | block_start_match = block_start_re.match (lines[i]) 1181 | if (block_start_match): 1182 | leading_whitespace: str = block_start_match.group (1) 1183 | block_key: NowebName = block_start_match.group (2) 1184 | block_lines: Lines = nowebs [block_key] # DUDE! 
1185 | i: LineNumber = eat_block_tag (i, lines) 1186 | for block_line in block_lines: 1187 | out_lines.append (leading_whitespace + block_line) 1188 | else: 1189 | out_lines.append (lines[i]) 1190 | return out_lines 1191 | ``` 1192 | 1193 | 1194 | 1195 | 1196 | 1197 | ## TRACER 1198 | 1199 | 1200 | For [TangleUp](#tangleup), we'll need to trace the entire operation of Tangledown, first and second passes. TangleUp reverses Tangledown, so we will want a best-effort reconstruction of the original Markdown file. 1201 | 1202 | 1203 | Our first approach will be a sequential list of dictionaries with all the needed information. 1204 | 1205 | 1206 | 1207 | 1208 | 1209 | ```python 1210 | from dataclasses import dataclass, field 1211 | from typing import Union ## TODO 1212 | @dataclass 1213 | class Tracer: 1214 | trace: List[Dict] = field(default_factory=list) 1215 | line_no = 0 1216 | current_betweens: Lines = field(default_factory=list) 1217 | fp: Path = None 1218 | # First Pass 1219 | 1220 | 1221 | 1222 | 1223 | 1224 | 1225 | # Second Pass 1226 | 1227 | 1228 | ``` 1229 | 1230 | 1231 | 1232 | 1233 | 1234 | ### TRACER.ADD_RAW 1235 | 1236 | 1237 | 1238 | 1239 | 1240 | ```python 1241 | def add_raw(self, i, between: Line): 1242 | self.line_no += 1 1243 | self.current_betweens.append((self.line_no, between)) 1244 | ``` 1245 | 1246 | 1247 | 1248 | 1249 | 1250 | ### TRACER.ADD_MARKDOWN 1251 | 1252 | 1253 | 1254 | 1255 | 1256 | ```python 1257 | def add_markdown(self, i, between: Line): 1258 | self.line_no += 1 1259 | self.current_betweens.append((self.line_no, between)) 1260 | ``` 1261 | 1262 | 1263 | 1264 | 1265 | 1266 | ### TRACER._END_BETWEENS 1267 | 1268 | 1269 | 1270 | 1271 | 1272 | ```python 1273 | def _end_betweens(self, i): 1274 | if self.current_betweens: 1275 | self.trace.append({"ending_line_number": self.line_no, "i": i, 1276 | "language": "markdown", "kind": 'between', 1277 | "text": self.current_betweens}) 1278 | self.current_betweens = [] 1279 | ``` 1280 | 1281 | 
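For orientation, here is the shape of a single 'between' entry as `_end_betweens` appends it to the trace (field names are taken from the code above; the sample markdown text is made up):

```python
# A trace is a list of dicts; 'between' entries carry the markdown
# lines seen between noweb/tangle tags, as (line_number, text) pairs.
entry = {"ending_line_number": 2, "i": 2,
         "language": "markdown", "kind": "between",
         "text": [(1, "# A heading\n"), (2, "Some prose.\n")]}
```

TangleUp can walk a list of such entries in order and reconstitute the markdown, noweb, and tangle pieces of the original file.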
1282 | 1283 | 1284 | 1285 | ### TRACER.ADD_NOWEB 1286 | 1287 | 1288 | 1289 | 1290 | 1291 | ```python 1292 | def add_noweb(self, i, language, id_, key, noweb_lines): 1293 | self._end_betweens(i) 1294 | self.line_no = i 1295 | self.trace.append({"ending_line_number": self.line_no, "i": i, 1296 | "language": language, "id_": id_, 1297 | "kind": 'noweb', key: noweb_lines}) 1298 | ``` 1299 | 1300 | 1301 | 1302 | 1303 | 1304 | ### TRACER.ADD_TANGLE 1305 | 1306 | 1307 | 1308 | 1309 | 1310 | ```python 1311 | def add_tangle(self, i, language, id_, key, tangle_liness): 1312 | self._end_betweens(i) 1313 | self.line_no = i 1314 | self.trace.append({"ending_line_number": self.line_no, "i": i, 1315 | "language": language, "id_": id_, 1316 | "kind": 'tangle', key: tangle_liness}) 1317 | ``` 1318 | 1319 | 1320 | 1321 | 1322 | 1323 | ### TRACER.ADD_EXPANDED_NOWEB 1324 | 1325 | 1326 | 1327 | 1328 | 1329 | ```python 1330 | def add_expanded_noweb(self, i, language, id_, key, noweb_lines): 1331 | self._end_betweens(i) 1332 | self.line_no = i 1333 | self.trace.append({"ending_line_number": self.line_no, "i": i, 1334 | "language": language, "id_": id_, 1335 | "kind": 'expanded_noweb', key: noweb_lines}) 1336 | ``` 1337 | 1338 | 1339 | 1340 | 1341 | 1342 | ### TRACER.ADD_EXPANDED_TANGLE 1343 | 1344 | 1345 | 1346 | 1347 | 1348 | ```python 1349 | def add_expanded_tangle(self, i, language, id_, key, tangle_liness): 1350 | self._end_betweens(i) 1351 | self.line_no = i 1352 | self.trace.append({"ending_line_number": self.line_no, "i": i, 1353 | "language": language, "id_": id_, 1354 | "kind": 'expanded_tangle', key: tangle_liness}) 1355 | ``` 1356 | 1357 | 1358 | 1359 | 1360 | 1361 | ### TRACER.DUMP 1362 | 1363 | 1364 | 1365 | 1366 | 1367 | ```python 1368 | def dump(self): 1369 | pr = self.fp.parent 1370 | fn = self.fp.name 1371 | fn2 = fn.translate(str.maketrans('.', '_')) 1372 | # Store the trace in the dir where the input md file is: 1373 | vr = f'tangledown_trace_{fn2}' 1374 | np = pr /
(vr + ".py") 1375 | with open(np, "w") as fs: 1376 | print(f'sequential_structure = (', file=fs) 1377 | pprint(self.trace, stream=fs) 1378 | print(')', file=fs) 1379 | ``` 1380 | 1381 | 1382 | 1383 | 1384 | 1385 | # TANGLE IT, ALREADY! 1386 | 1387 | 1388 | Ok, you saw [at the top](#how-to-run) that the code in this here Markdown document, README.md, when run as a script, will read in all the lines in ... this here Markdown document, `README.md`. Bootstrapping! 1389 | 1390 | 1391 | But you have to run something first. For that, I tangled the code manually just 1392 | once and provide `tangledown.py` in the repository. The chicken definitely comes 1393 | before the egg. 1394 | 1395 | 1396 | But if you have the chicken (`tangledown.py`), you can import it as a module and execute the following cell, a copy of the one [at the top](#how-to-run). That should overwrite `tangledown.py` with the contents of this notebook or Markdown file. So our little bootstrapping technique will forever update the Tangledown compiler if you change it in this here README.md that you're reading right now! 1397 | 1398 | ```python 1399 | from tangledown import get_lines, accumulate_lines, tangle_all 1400 | tangle_all(*accumulate_lines(*get_lines("README.md"))) 1401 | ``` 1402 | 1403 | # TODO 1404 | 1405 | 1406 | - IN-PROGRESS: more examples, specifically, a test-generator in Clojure in subdirectory `examples/asr`. 1407 | - IN-PROGRESS: TangleUp 1408 | - NOT-STARTED: Have the Tangledown Kernel, when evaluating tangle-able cells, write them out one at a time. Without this feature, the only way to write out files is to tangle the entire notebook. Possibly do these as cell magics. 1409 | - NOT-STARTED: Research cell magics for `noweb` and `tangle` cells. 
1410 | - NOT-STARTED: error handling (big job) 1411 | - NOT-STARTED: type annotations for the kernel 1412 | - DONE: convert relative file paths to absolute 1413 | - DONE: modern Pythonic Type Annotation (PEP 484) 1414 | - DONE: use pathlib to compare tangle file names 1415 | - DONE: somehow get the Tangledown Kernel to tangle everything automatically when it's restarted 1416 | - DONE: Support multiple instances of the Tangledown Kernel. Because it reads files with fixed names in the home directory, it had no way of processing multiple Tangledown notebooks. 1417 | - DONE: investigate [Papermill](https://papermill.readthedocs.io/en/latest/) as a solution 1418 | - DONE: find out whether pickle is a better alternative to json for dumping dictionaries for the kernel 1419 | - DONE: Jupytext kernel for `tangledown` so we can run `noweb` and `block` tags that have `block` tags in them. 1420 | 1421 | 1422 | ## DUDE! 1423 | 1424 | 1425 | Some people write "TODO" in their code so they can find all the spots where they thought they might have trouble but didn't have time to write the error-handling (prophylactic) code at the time. I like to write "DUDE" because it sounds like "TODO" but is more RUDE (which also sounds like "DUDE") and funny. This story is supposed to be amusing. 1426 | 1427 | 1428 | ## KNOWN BUGS 1429 | 1430 | 1431 | I must apologize once again, but this is just a toy at this point! Recall the [DISCLAIMER](#disclaimer). The following are stack-ranked from highest to lowest priority. 1432 | 1433 | 1434 | 1. FIXED: writing to "tangledown.py" and to "./tangledown.py" clobbers the file rather than appending. Use pathlib to compare filenames rather than string comparison. 1435 | 2. FIXED: tangling to files in the home directory via `~` does not work. We know one dirty way to fix it, but proper practice with pathlib is a better answer. 1436 | 1437 | 1438 | # TANGLEUP DESIGN AND IMPLEMENTATION 1439 | 1440 | 1441 | ## TANGLEUP TENETS 1442 | 1443 | 1444 | 1.
Keep source tree and Literate Markdown consistent. 1445 | 1446 | 1447 | ## NON-REAL-TIME 1448 | 1449 | 1450 | We'll start with a non-real-time solution. You'll manually run `tangleup` to put modified source back into the Markdown. Later, we'll do something that can track changes on disk and update the Markdown in real time. 1451 | 1452 | 1453 | When you modify your source tree, `tangleup` puts the modified code back into the Markdown file with reminders to _detangle_ and to _write_. There are two cases: 1454 | 1455 | 1. You modified some source that corresponds to an existing noweb block in the Markdown. 1456 | 1457 | 2. You added some source that doesn't yet correspond to a noweb block in the Markdown. 1458 | 1459 | 1460 | To assist TangleUp, Tangledown records unique names for existing noweb blocks along with the tangled source. Tangledown also records robust locations for existing blocks. _Robust_ means that the boundary locations are flexible: starting and ending line and character positions in a source file are not enough because changing an early one invalidates all later ones. 1461 | 1462 | 1463 | ## NO PRE-EXISTING MARKDOWN 1464 | 1465 | 1466 | We don't need the trace file for this case. 1467 | 1468 | 1469 | Enumerate all the files in a directory tree. Pair each file name with a short, unique name for the nowebs. TODO: ignore files and directories listed in the `.gitignore`. 
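A stdlib-only sketch of that enumeration, with a made-up temporary directory layout (the real `files_list` below also consults `.gitignore`):

```python
import uuid
from pathlib import Path
from tempfile import TemporaryDirectory

def short_nym(p: Path) -> str:
    # stem plus a few hex digits: a stand-in for a short, unique name
    return p.stem + '_' + uuid.uuid4().hex[:6].upper()

with TemporaryDirectory() as d:
    root = Path(d)
    (root / 'a.py').write_text('pass\n')
    (root / 'sub').mkdir()
    (root / 'sub' / 'b.clj').write_text(';; hi\n')
    # Pair every file (not directory) with its generated nym.
    pairs = [(str(p), short_nym(p))
             for p in sorted(root.rglob('*')) if p.is_file()]
```

Each file path ends up paired with a nym like `a_3F2A1B`; `gsnym` below adds the collision check that makes the nyms genuinely unique.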
1470 | 1471 | ```python 1472 | %pip install gitignore-parser 1473 | ``` 1474 | 1475 | ### TANGLEUP FILES LIST 1476 | 1477 | 1478 | 1479 | 1480 | 1481 | ```python 1482 | 1483 | def files_list(dir_name: str) -> List[str]: 1484 | dir_path = Path(dir_name) 1485 | files_result = [] 1486 | nyms_result = [] 1487 | file_count = 0 1488 | 1489 | 1490 | 1491 | find_first_gitignore() 1492 | recurse_a_dir(dir_path) 1493 | assert file_count == len(nyms_collision_check) 1494 | return list(zip(files_result, nyms_result)) 1495 | ``` 1496 | 1497 | 1498 | 1499 | 1500 | 1501 | #### RECURSE A DIR 1502 | 1503 | 1504 | The only complexity, here, is ignoring `.git` and files in `.gitignore` 1505 | 1506 | 1507 | 1508 | 1509 | 1510 | ```python 1511 | def recurse_a_dir(dir_path: Path) -> None: 1512 | for p in dir_path.glob('*'): 1513 | q = p.absolute() 1514 | qs = str(q) 1515 | try: # don't skip files in dirs above .gitignore 1516 | ok = not in_gitignore(qs) 1517 | except ValueError as e: # one absolute and one relative? 1518 | ok = True 1519 | if p.name == '.git': 1520 | ok = False 1521 | if not ok: 1522 | pprint(f'... IGNORING file or dir {p}') 1523 | if ok and q.is_file(): 1524 | nonlocal file_count # Assignment requires 'nonlocal' 1525 | file_count += 1 1526 | nyms_result.append(gsnym(q)) # 'nonlocal' not required 1527 | files_result.append(qs) # because not ass'gt but mutation 1528 | elif ok and p.is_dir: 1529 | recurse_a_dir(p) 1530 | ``` 1531 | 1532 | 1533 | 1534 | 1535 | 1536 | #### UNIQUE NAMES 1537 | 1538 | 1539 | Correct for collisions, which will be really rare, so there is a negligible effect on speed. 
1540 | 1541 | 1542 | 1543 | 1544 | 1545 | ```python 1546 | nyms_collision_check = set() 1547 | 1548 | def gsnym(p: Path) -> str: 1549 | """Generate a short, unique name for a path.""" 1550 | nym = gsnym_candidate(p) 1551 | while nym in nyms_collision_check: 1552 | nym = gsnym_candidate(p) 1553 | nyms_collision_check.add(nym) 1554 | return nym 1555 | 1556 | 1557 | def gsnym_candidate(p: Path) -> str: 1558 | """Generate a candidate short, unique name for a path.""" 1559 | return p.stem + '_' + uuid.uuid4().hex[:6].upper() 1560 | ``` 1561 | 1562 | 1563 | 1564 | 1565 | 1566 | #### IGNORE FILES IN GITIGNORE 1567 | 1568 | 1569 | Find the first `.gitignore` in a directory tree. Parse it to produce a function that tests whether a file must be ignored by TangleUp. 1570 | 1571 | 1572 | 1573 | 1574 | 1575 | ```python 1576 | in_gitignore = lambda _: False 1577 | 1578 | def find_first_gitignore() -> Path: 1579 | global in_gitignore # rebind the module-level matcher 1580 | p = dir_path 1581 | for p in dir_path.rglob('*'): 1582 | if p.name == '.gitignore': 1583 | in_gitignore = parse_gitignore(str(p.absolute())) 1584 | break 1585 | return p 1586 | ``` 1587 | 1588 | 1589 | 1590 | 1591 | ### TANGLEUP IMPORTS 1592 | 1593 | 1594 | 1595 | 1596 | 1597 | ```python 1598 | from pathlib import Path 1599 | from typing import List 1600 | import uuid 1601 | from gitignore_parser import parse_gitignore 1602 | from pprint import pprint 1603 | ``` 1604 | 1605 | 1606 | 1607 | 1608 | 1609 | ### WRITE NOWEB TO LINES 1610 | 1611 | 1612 | Now write the contents of each Python or Clojure file to a noweb block with its ginned-up name and a corresponding tangle block. Parenthetically, this just _screams_ for the Writer monad, but we'll just do it by hand in an obvious, kindergarten way. 1613 | 1614 | 1615 | **WARNING**: The explicit '\n' newlines probably won't work on Windows.
```python
from typing import Tuple
from pprint import pprint


def write_noweb_to_lines(lines: List[str],
                         file_gsnym_pair: Tuple[str, str],
                         language: str) -> None:
    path = Path(file_gsnym_pair[0])
    wrap_n_blank(lines, [f'## {path.name}\n'])
    wrap_1_raw(lines, f'\n')
    with open(file_gsnym_pair[0]) as f:
        try:
            inlines = f.readlines()
        except UnicodeDecodeError:
            pprint(f'... SKIPPING UNDECODABLE FILE {path}')
            return
    pprint(f'DETANGLING file {path}')
    bound = []  # Really want the monadic bind, here.
    if language == "markdown":
        indent_4(bound, inlines)
    else:
        wrap_triple_backtick(bound, inlines, language)
    wrap_n_blank(lines, bound)
    wrap_1_raw(lines, '\n')
    lines.append(BLANK_LINE)
```

#### WRAP ONE LINE AS RAW

```python
BEGIN_RAW = '\n'
END_RAW = '\n'

def wrap_1_raw(lines: List[str], s: str) -> None:
    lines.append(BEGIN_RAW)
    lines.append(s)
    lines.append(END_RAW)
```

#### WRAP SEVERAL LINES IN BLANK LINES

```python
BLANK_LINE = '\n'

def wrap_n_blank(lines: List[str], ss: List[str]) -> None:
    lines.append(BLANK_LINE)
    for s in ss:
        lines.append(s)
    lines.append(BLANK_LINE)
```

#### WRAP LINES IN TRIPLE BACKTICKS

```python
def wrap_triple_backtick(lines: List[str],
                         ss: List[str],
                         language: str) -> None:
    lines.append(f'```{language}\n')
    for s in ss:
        lines.append(s)
    lines.append('```\n')
```

#### INDENT ALL LINES FOUR SPACES

```python
def indent_4(lines: List[str], ss: List[str]) -> None:
    for s in ss:
        lines.append('    ' + s)
```

### WRITE TANGLE TO LINES

```python
def write_tangle_to_lines(lines: List[str],
                          file_gsnym_pair: Tuple[str, str],
                          language: str) -> None:
    wrap_1_raw(lines, f'\n')
    bound = []
    wrap_triple_backtick(bound,
                         [f'\n'],
                         language)
    wrap_n_blank(lines, bound)
    wrap_1_raw(lines, f'\n')
```

### TANGLEUP OVERWRITE MARKDOWN

Test the whole megillah, in the up direction. You may have to backpatch some 'language' names when you open the markdown, but 'language' only affects syntax coloring.

```python
def tangleup_overwrite_markdown(
        output_markdown_filename: str,
        input_directory: str,
        title: str = "Untitled") -> None:
    pprint(f'WRITING LITERATE MARKDOWN to file {output_markdown_filename}')
    file_gsnym_pairs = files_list(input_directory)
    lines: List[str] = [f'# {title}\n\n']
    for pair in file_gsnym_pairs:
        p = Path(pair[0])
        if p.suffix == '.clj':
            language = f'clojure id={uuid.uuid4()}'
        elif p.suffix == '.py':
            language = f'python id={uuid.uuid4()}'
        elif p.suffix == '.md':
            language = 'markdown'
        else:
            language = ''
        write_noweb_to_lines(lines, pair, language)
        write_tangle_to_lines(lines, pair, language)

    with open(output_markdown_filename, "w") as f:
        for line in lines:
            f.write(line)
```

## YES PRE-EXISTING MARKDOWN

### NO CHANGES ON DISK

If there are no changes to the tangled files on disk, then we must merely reassemble the nowebs, tangles, and block tags from the files on disk. On its first pass, Tangledown recorded the structure of nowebs and tangles and of the Markdown that surrounds them. When detangling a file:

1. look for every tangle that mentions that file

### YES CHANGES ON DISK

#### CHANGES TO EXISTING CONTENTS

#### NEW CONTENTS

#### DELETED CONTENTS

### FIRST SHOT

**PRO TIP**: For the Tangledown Kernel, if your little scripts contain noweb tags, surround them with a tangle to `/dev/null`, reload the kernel spec, and restart the kernel; then you can run them in the notebook.

```python
from pprint import pprint
from tangledown_trace_foobar_md import sequential_structure as cells
pprint(cells)
fn = "tanglup_foobar.md"
line_no = 0
for cell in cells:
    if cell["kind"] == "between":
        pass  # TODO: copy the surrounding Markdown through
    elif cell["kind"] == "noweb":
        pass  # TODO: reassemble the noweb block
    elif cell["kind"] == "tangle":
        pass  # TODO: reassemble the tangle block
    else:
        assert False, f"unknown kind: {cell['kind']}"
```

```python
pass
```

```python
pass
```

```python
pass
```

## UNIT TESTS

### NO PRE-EXISTING MARKDOWN FILE

Run these at the console for now.
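Meanwhile, the cell-kind dispatch from FIRST SHOT above can be exercised on a toy structure. Everything here is made up for illustration: the `demo_cells` value and the exact tag spellings are assumptions, and the real structure comes from `tangledown_trace_foobar_md.sequential_structure`.

```python
# A toy, self-contained version of the FIRST SHOT dispatch: walk a recorded
# sequential structure and reassemble Markdown-ish lines from it.
def reassemble(cells):
    out = []
    for cell in cells:
        if cell["kind"] == "between":
            out.extend(cell["lines"])  # prose passes through unchanged
        elif cell["kind"] == "noweb":
            out.append(f'<noweb name="{cell["name"]}">\n')  # hypothetical tag shape
            out.extend(cell["lines"])
            out.append('</noweb>\n')
        elif cell["kind"] == "tangle":
            out.append(f'<tangle file="{cell["file"]}">\n')  # hypothetical tag shape
            out.extend(cell["lines"])
            out.append('</tangle>\n')
        else:
            assert False, f"unknown kind: {cell['kind']}"
    return out

demo_cells = [
    {"kind": "between", "lines": ["Some prose.\n"]},
    {"kind": "noweb", "name": "hello world def", "lines": ["def f(): pass\n"]},
    {"kind": "tangle", "file": "hello.py", "lines": ["f()\n"]},
]
print(len(reassemble(demo_cells)))  # 7
```
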
```python
if __name__ == "__main__":
    tangleup_overwrite_markdown(
        "asr_tangleup_test.md",
        "./examples",
        title="This is a First Test of the Emergency Tangleup System")
```

```python
if __name__ == "__main__":
    tangleup_overwrite_markdown(
        "tangleup-test.md",
        ".",
        title="This is a Second Test of the Emergency Tangleup System")
```

# APPENDIX: Developer Notes

If you change the code in this README.md and you want to test it by running the cell in Section [Tangle It, Already!](#tangle-already), you usually must restart whatever Jupyter kernel you're running, because Jupytext caches code. If things still don't make sense, try restarting the notebook server. It rarely, but occasionally, produces incorrect answers for more obscure reasons.

# APPENDIX: Tangledown Kernel

The Tangledown Kernel is ***OPTIONAL***, but nice. Everything I talked about so far works fine without it, but the Tangledown Kernel lets you evaluate Jupytext notebook cells that have `block` tags in them. For example, you can run Tangledown on Tangledown itself in this notebook just by evaluating the cell that contains all of Tangledown, including the source for the kernel, [here](#tangle-listing-tangle-all).

The Tangledown Compiler writes the full path of the current Markdown file corresponding to the current notebook to a fixed place in the home directory, and the Tangledown Kernel reads that path and gets all the nowebs from there.

> If you run more than one instance of the Tangledown Kernel at one time on your machine, you must ***RETANGLE THE FILE AND RESTART THE TANGLEDOWN KERNEL WHEN YOU SWITCH NOTEBOOKS*** because the name of the current file is a fixed singleton.
> The Tangledown Kernel has no way to dynamically know what file you're working with. Sorry about that!

## Installing the Tangledown Kernel

After you tangle the code out of this here README.md at least once, you will have two new files:

- `./tangledown_kernel/tangledown_kernel.py`
- `./tangledown_kernel/kernel.json`

You must inform Jupyter about your new kernel. The following works for me on the Mac. It might be different on your machine:

```bash
jupyter kernelspec install --user tangledown_kernel
```

## Running the Tangledown Kernel

You must put the source for the Tangledown Kernel somewhere Python can find it before you start Jupyter Lab. One way is to modify the `PYTHONPATH` environment variable. The following works for me on the Mac:

```
PYTHONPATH=".:/Users/brian/Library/Jupyter/kernels/tangledown_kernel" jupyter lab
```

Once the kernel is installed, there are multiple ways to run it in Jupyter Lab. When you first open a notebook, you get a menu. The default is the regular Python 3 kernel, and it works fine, but you won't be able to run cells that have `block` tags in them. If you choose the Tangledown Kernel, you can run such cells.

If you modify the kernel:

1. re-tangle the kernel source, say by running the cell in [this section](#how-to-run)
2. re-install the kernel by running the little bash script above
3. restart the kernel inside the notebook

Most of the time, you don't have to restart Jupyter Lab itself, but sometimes, after a really bad bug, you might have to.

## Source for the Tangledown Kernel

Adapted from [these official docs](https://jupyter-client.readthedocs.io/en/latest/wrapperkernels.html).
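Before the listing, a tiny stand-alone warm-up: the only non-boilerplate preprocessing the kernel does is split a cell's code string into newline-terminated lines, the same shape `readlines` would produce, before expanding tangles.

```python
# Reproduce the kernel's line cleanup: split a cell's code string on
# newlines and re-attach '\n' to every line.
code = "def hello():\n    return 42"
cleaned_lines = [line + '\n' for line in code.split('\n')]
print(cleaned_lines)  # ['def hello():\n', '    return 42\n']
```
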
The kernel calls [`expand_tangles`](#expand-tangles) after reformatting the lines a little. We learned about the reformatting by experiment. We explain `expand_tangles` [here](#expand-tangles), in the [section about Tangledown itself](#tangle-listing-tangle-all). The rest of this is boilerplate from the [official kernel documentation](https://jupyter-client.readthedocs.io/en/stable/wrapperkernels.html). There is no point, by the way, in running the cell below in any kernel. It's meant for the JupyterLab startup engine only. You just need to tangle it out and install it, as above.

> **NOTE**: You will get errors if you run this cell in the notebook.

TODO: plumb a Tracer through here?

```python
class TangledownKernel(IPythonKernel):

    async def do_execute(self, code, silent, store_history=True,
                         user_expressions=None, allow_stdin=False):
        if not silent:
            cleaned_lines = [line + '\n' for line in code.split('\n')]
            # HERE'S THE BEEF!
            expanded_code = expand_tangles(None, [cleaned_lines], self.nowebs)
            reply_content = await super().do_execute(
                expanded_code, silent, store_history, user_expressions)
            stream_content = {
                'name': 'stdout',
                'text': reply_content,
            }
            self.send_response(self.iopub_socket, 'stream', stream_content)
        return {'status': 'ok',
                # The base class increments the execution count
                'execution_count': self.execution_count,
                'payload': [],
                'user_expressions': {},
                }


if __name__ == '__main__':
    from ipykernel.kernelapp import IPKernelApp
    IPKernelApp.launch_instance(kernel_class=TangledownKernel)
```

```python
from ipykernel.ipkernel import IPythonKernel
from pprint import pprint
import sys  # for version_info
from pathlib import Path
from tangledown import \
    accumulate_lines, \
    get_lines, \
    expand_tangles
```

## KERNEL INSTANCE VARIABLES

These get indented on expansion because the `block` tag is indented. You could do it the other way: indent the code here and DON'T indent the block tag, but that would be ugly, wouldn't it?

Notice this kernel runs Tangledown on the full file path that's stored in `current_victim_file.txt`. That file path got [written to that special place](#save-afile-path-for-kernel) when you tangled the file the first time. This may explain why you must tangle the file once and then restart the kernel whenever you switch notebooks that are running the Tangledown Kernel.
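The indentation remark above can be illustrated with a stand-alone sketch. This is not Tangledown's actual implementation; it just shows the effect of prefixing an indented `block` tag's indentation onto every expanded line, which `textwrap.indent` happens to do.

```python
# Illustrative only: expanding a block under a tag indented by four spaces
# prefixes each expanded line with those four spaces.
import textwrap

block = "implementation = 'Tangledown'\nbanner = 'hello'\n"
expanded = textwrap.indent(block, '    ')
print(expanded)
```
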
```python
current_victim_filepath = ""
with open(Path.home() / '.tangledown/current_victim_file.txt') as v:
    fp = v.read()
tracer_, nowebs, tangles_ = accumulate_lines(*get_lines(fp))
implementation = 'Tangledown'
implementation_version = '1.0'
language = 'no-op'
language_version = '0.1'
language_info = {  # for syntax coloring
    "name": "python",
    "version": sys.version.split()[0],
    "mimetype": "text/x-python",
    "codemirror_mode": {"name": "ipython", "version": sys.version_info[0]},
    "pygments_lexer": "ipython%d" % 3,
    "nbconvert_exporter": "python",
    "file_extension": ".py",
}
banner = "Tangledown kernel - expanding 'block' tags"
```

## Kernel JSON Installation Helper

```json
{"argv": ["python", "-m", "tangledown_kernel", "-f", "{connection_file}"],
 "display_name": "Tangledown"
}
```

# APPENDIX: Experimental Playground

```python

```
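To close, here is a stdlib-only sketch of the "fixed singleton" handshake described in the kernel appendix: the compiler writes the current Markdown path to a known file under the home directory, and the kernel reads it back. The helper names are made up; Tangledown's real location is `~/.tangledown/current_victim_file.txt`.

```python
# Sketch the write-then-read handshake in a throwaway fake home directory,
# so running it never touches the real ~/.tangledown.
from pathlib import Path
import tempfile

def save_victim_path(home: Path, md_path: str) -> None:
    d = home / '.tangledown'
    d.mkdir(parents=True, exist_ok=True)  # idempotent
    (d / 'current_victim_file.txt').write_text(md_path)

def load_victim_path(home: Path) -> str:
    return (home / '.tangledown' / 'current_victim_file.txt').read_text()

with tempfile.TemporaryDirectory() as fake_home:
    home = Path(fake_home)
    save_victim_path(home, '/work/README.md')
    print(load_victim_path(home))  # /work/README.md
```

Because the location is a singleton, a second notebook that saves its own path silently overwrites the first, which is exactly why the appendix tells you to retangle and restart when switching notebooks.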