├── .gitignore ├── IMPLEMENTATION.md ├── LICENSE.md ├── Makefile ├── README.md ├── dune ├── dune-project ├── examples ├── Makefile ├── echo.lp ├── sortlines.html ├── sortlines.lp └── sortlines.md ├── lipsum.1 ├── lipsum.opam ├── lipsum.pod ├── src ├── dune ├── escape.mll ├── litprog.ml ├── litprog.mli ├── main.ml ├── parser.mly ├── scanner.mll ├── tangle.ml ├── tangle.mli ├── weave.ml └── weave.mli └── test ├── a.lp ├── b.lp ├── function.sh └── test.lp /.gitignore: -------------------------------------------------------------------------------- 1 | _build 2 | .merlin 3 | *.install 4 | *.url 5 | 6 | -------------------------------------------------------------------------------- /IMPLEMENTATION.md: -------------------------------------------------------------------------------- 1 | 2 | # The Lipsum Implementation 3 | 4 | This document provides an overview for the implementation of Lipsum. The 5 | heart of the implementation is module `Litprog` that captures in type `t` a 6 | literate program as it is read from a file and from which chunks are 7 | expanded. 8 | 9 | The program is driven by the main module `Lipsum` which evaluates the 10 | command line, handles files, and error messages. 11 | 12 | ## Parser and Abstract Datatype `Lipsum.t` 13 | 14 | Lipsum employs a scanner and parser to create an abstract syntax for an 15 | input file which then is turned by `Lipsum.make` into a `Litprog.t` value. 16 | The parser captures the high-level structure of an input file: a sequence 17 | of code and documentation chunks. 18 | 19 | 20 | litprog : chunks EOF 21 | | STR chunks EOF 22 | 23 | chunks : chunks chunk 24 | | 25 | 26 | chunk : code 27 | | doc 28 | 29 | doc : AT STR 30 | 31 | code : DEF body 32 | 33 | body : body STR 34 | | body REF 35 | | 36 | 37 | Scanning chops the input file into a sequence of tokens like `DEF`, `STR` 38 | or `AT` which are consumed by the parser. These tokens are defined in the 39 | parser but created by the scanner. 40 | 41 | A document is represented both as a sequence of chunks and additionally as 42 | a hash table that maps names (of chunks) to code and references to other 43 | chunks. While such a document is built by `Litprog.make` it is not checked 44 | that all referenced chunks indeed exist and neither is the absence of 45 | cycles checked. 46 | 47 | ## Scanning 48 | 49 | Creating a sequence of tokens in the scanner is the trickiest part of the 50 | implementation. Typically each token is defined by a regular expression 51 | that captures its content. The is easy for tokens that represent a `<` in angle brackets. It is, however, difficult to capture the content 53 | that makes up the chunks because it is delimited by syntax like `<` or `@` but there is no pattern to capture the content itself. The 55 | scanner therefore reads an arbitrary string and the following delimiter but 56 | must return these as two tokens. It does this using the `return` function. 57 | The `token` function in the scanner stores one value and returns it the 58 | next time it is called from the parser. 59 | 60 | ## Tangling 61 | 62 | Module `Tangle` implements emitting a named chunk by recursively expanding 63 | all chunks that are referenced within it. It must detect cycles to avoid an 64 | infinite loop during chunk expansion. 65 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | https://github.com/lindig/lipsum.git 2 | Copyright (c) 2012, 2013, 2014, 2015 3 | Christian Lindig 4 | All rights reserved. 5 | 6 | Redistribution and use in source and binary forms, with or 7 | without modification, are permitted provided that the following 8 | conditions are met: 9 | 10 | (1) Redistributions of source code must retain the above copyright 11 | notice, this list of conditions and the following disclaimer. 12 | (2) Redistributions in binary form must reproduce the above copyright 13 | notice, this list of conditions and the following disclaimer in 14 | the documentation and/or other materials provided with the 15 | distribution. 16 | 17 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND 18 | CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, 19 | INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 20 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR 22 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 23 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 24 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF 25 | USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED 26 | AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 27 | LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 28 | ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 29 | POSSIBILITY OF SUCH DAMAGE. 30 | 31 | This program includes a library for regular expressions that 32 | is available from https://github.com/ocaml/ocaml-re.git. 33 | It was released under the GNU LESSER GENERAL PUBLIC LICENSE 34 | and was written by Jerome Vouillon 35 | 36 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | # vim: set noet ts=8: 2 | # 3 | # This Makefile is not called from Opam but only used for 4 | # convenience during development 5 | # 6 | 7 | DUNE = dune 8 | POD2MAN = pod2man $(PODOPTS) 9 | PODOPTS = --center="Christian Lindig" --name="lipsum" --release="2017" 10 | 11 | all: lipsum.1 12 | $(DUNE) build -p lipsum 13 | 14 | install: 15 | $(DUNE) install 16 | 17 | profile: 18 | $(DUNE) build --profile=profile 19 | 20 | clean: 21 | $(DUNE) clean 22 | 23 | man: lipsum.1 24 | 25 | %: 26 | $(DUNE) build $@ 27 | 28 | %.1: %.pod 29 | $(POD2MAN) $< > $@ 30 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | 2 | ### Lipsum - Literate Programming Simplified 3 | 4 | Lipsum is a command-line utility for literate programming. It stands in the 5 | tradition of [Noweb](http://www.cs.tufts.edu/~nr/noweb/), a popular and 6 | flexible literate programming system by Norman Ramsey. The idea of literate 7 | programming is to keep documentation for programmers and program code in 8 | one file and to arrange it in a way that helps understanding it best. To 9 | actually compile or run the code it needs to be extracted from the literate 10 | program and Lipsum is a tool to do this. 11 | 12 | Like Noweb, Lipsum employs a minimal markup to arrange documentation and 13 | code in a file. Also like Noweb, Lipsum is language agnostic and can be 14 | used for almost any programming language and documentation. 15 | 16 | @ Echo prints each command line argument on a line by itself. This 17 | documentation chunk starts with @ and extends until the beginning 18 | of the named code chunk below. A lipsum file (`*.lp` by convention) 19 | is a sequence of code and documentation chunks. Each chunk extends 20 | until the beginning of the next one (or the end of file.) 21 | 22 | <>= 23 | /* <> */ 24 | #include 25 | 26 | int main(int argc, char** argv) 27 | { 28 | int i; 29 | for (i=0; i>= 42 | This code is in the public domain. 43 | 44 | @ Below we are extending the code chunk above. 45 | 46 | <>= 47 | This code is part of the documentation for Lipsum. 48 | 49 | 50 | To extract the code for `echo.c` for compilation from the file `echo.lp` 51 | using Lipsum, one would run Lipsum like this: 52 | 53 | $ lipsum expand echo.c echo.lp 54 | $ cc -o echo echo.c 55 | 56 | ## Important Commands 57 | 58 | * `lipsum tangle`: extract source code to stdout 59 | * `lipsum expand`: extract source code to file 60 | * `lipsum weave`: format input 61 | * `lipsum roots`: emit names of root chunks to stdout 62 | 63 | For more information, see the manual page lipsum(1) and invoke `lipsum 64 | --help`. 65 | 66 | ## Installation from Opam 67 | 68 | Lipsum is available via the OCaml package manager Opam: 69 | 70 | $ opam install lipsum 71 | 72 | This will install a binary and the manual page. See also below for how to 73 | obtain the source code from GitHub. 74 | 75 | It can be also compiled from sources. Take a look a `lipsum.opsm` for 76 | dependencies: 77 | 78 | $ make 79 | 80 | ## Resources for Literate Programming 81 | 82 | While literate programming isn't a mass phenomenon among programmers it has 83 | a dedicated following. Here are some resources to learn about its concepts, 84 | strengths, and weaknesses. 85 | 86 | * [Noweb Homepage](http://www.cs.tufts.edu/~nr/noweb/) 87 | * [Noweb on Wikipedia](http://en.wikipedia.org/wiki/Noweb) 88 | * [Literate Programming on 89 | Wikipedia](http://en.wikipedia.org/wiki/Literate_programming) 90 | 91 | Literate programming enjoys popularity in the [R](www.r-project.org/) 92 | community which uses a literate programming system called Sweave which is 93 | also in the tradition of Noweb. R is a system for statistical analysis and 94 | Sweave is mainly used to include statistical analysis into scientific 95 | papers that are typeset with LaTeX. 96 | 97 | ## Why not using Noweb? 98 | 99 | Noweb is a great tool with a flexible architecture that permits a user to 100 | plug in filters to extend it. This makes its installation depend on various 101 | filters that are part of its distribution and that are written in various 102 | languages. While this is usually not a problem if you develop code mostly 103 | for yourself, it adds one more dependency if you want to release code as 104 | open source. 105 | 106 | Lipsum is less ambitious: it is just one binary and almost all it does is 107 | extracting code from a literate program. I am planning to use it in 108 | combination with Markdown as a syntax for documentation and to include it 109 | with literate programs that I release as open source. 110 | 111 | ## Documentation 112 | 113 | Lipsum comes with a Unix manual page `lipsum.1` that is generated from 114 | [`lipsum.pod`](lipsum/blob/master/lipsum.pod). POD is a simple markup 115 | language, much like Markdown, that is used by the Perl community. To view 116 | the manual page prior to installation use `nroff`: 117 | 118 | $ nroff -man lipsum.1 | less 119 | 120 | After installation it is available using `man lipsum` as usual. 121 | 122 | Lipsum provides minimal online help via `--help` options for all its sub 123 | commands. 124 | 125 | ## Source Code 126 | 127 | https://github.com/lindig/lipsum.git 128 | 129 | ## License 130 | 131 | Lipsum is distributed under the BSD-2 license. The license can be also 132 | displayed by the program: 133 | 134 | $ lipsum copyright 135 | https://github.com/lindig/lipsum.git 136 | Copyright (c) 2012, 2013, 2014, 2015 137 | Christian Lindig 138 | All rights reserved. 139 | 140 | Redistribution and use in source and binary forms, with or 141 | without modification, are permitted provided that the following 142 | conditions are met: 143 | 144 | (1) Redistributions of source code must retain the above copyright 145 | notice, this list of conditions and the following disclaimer. 146 | (2) Redistributions in binary form must reproduce the above copyright 147 | notice, this list of conditions and the following disclaimer in 148 | the documentation and/or other materials provided with the 149 | distribution. 150 | 151 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND 152 | CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, 153 | INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF 154 | MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 155 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR 156 | CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, 157 | SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT 158 | LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF 159 | USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED 160 | AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 161 | LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN 162 | ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE 163 | POSSIBILITY OF SUCH DAMAGE. 164 | 165 | ## Author 166 | 167 | Christian Lindig 168 | 169 | 170 | -------------------------------------------------------------------------------- /dune: -------------------------------------------------------------------------------- 1 | (env 2 | (profile 3 | (ocamlopt_flags (:standard -p)) 4 | (flags (:standard))) 5 | (dev 6 | (flags (:standard))) 7 | (release 8 | (flags (:standard)))) 9 | -------------------------------------------------------------------------------- /dune-project: -------------------------------------------------------------------------------- 1 | (lang dune 1.4) 2 | -------------------------------------------------------------------------------- /examples/Makefile: -------------------------------------------------------------------------------- 1 | # 2 | # 3 | 4 | LIPSUM = ../_build/default/src/main.exe 5 | CC = gcc 6 | 7 | all: echo sortlines 8 | 9 | clean: 10 | rm -f echo.c echo 11 | rm -f sortlines.c sortlines 12 | 13 | %.c: %.lp 14 | $(LIPSUM) tangle -f cpp $@ $< > $@ 15 | 16 | %: %.c 17 | $(CC) -o $@ $< 18 | 19 | -------------------------------------------------------------------------------- /examples/echo.lp: -------------------------------------------------------------------------------- 1 | 2 | Echo prints each command line argument on a line by itself. This 3 | documentation chunk extends until the beginning of the named code 4 | chunk below. 5 | 6 | <>= 7 | /* <> */ 8 | #include 9 | 10 | int main(int argc, char** argv) 11 | { 12 | int i; 13 | for (i=0; i>= 24 | This code is in the public domain. 25 | 26 | <>= 27 | This code is part of the documentation for Lipsum. -------------------------------------------------------------------------------- /examples/sortlines.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 622 | 623 | 624 | 625 |

Sortfile

626 | 627 |

This code is for the following task: sort a textfile according to the last 628 | number in each line. This challenge was given student applicants by friends 629 | and I took it to develop a program in C with some attention to performance.

630 | 631 | 632 |

Compiling

633 | 634 |

To compile this code, use a Makefile that can be extracted from this file 635 | using

636 | 637 |
    lipsum tangle Makefile sortlines.lp > Makefile
638 |     make
639 | 
640 | <<Makefile>>=
641 | all: sortlines
642 | 
643 | sortlines.c: sortlines.lp
644 |     lipsum tangle -f cpp $@ $< > $@
645 | 
646 | sortlines: sortlines.c
647 |     gcc -O -o $@ $<
648 | 
649 | 650 | 651 |

Types and Main Function

652 | 653 |

The idea is to read the file into memory, to build up an array of pointers 654 | to each line in the file, to sort the array, and finally to emit the lines 655 | in sorting order using the sorted array.

656 | 657 |
<<sortlines.c>>=
658 | #include <stdio.h>
659 | #include <stdlib.h>
660 | #include <ctype.h>
661 | 
662 | <<type definitions>>
663 | <<read file into memory>>
664 | <<scan lines>>
665 | <<sort predicate cmp>>
666 | <<main function>>
667 | 
668 | 669 |

Possible challenges include multiple numbers in a line and trying to avoid 670 | to scan a line multiple times for the detection of the last number in it.

671 | 672 |

The key data structure is LINE. It contains a pointer into the file, the 673 | length of the line and an int n for the value of the last number in that 674 | line.

675 | 676 |
<<type definitions>>=
677 | typedef struct line {
678 |     char *line; /* byte sequence */
679 |     int length;
680 |     int  n; /* sort key */
681 | } LINE;      
682 | 
683 | 684 |

The sort predicate on lines is defined by cmp. It derives an integer by 685 | subtracting the numbers belonging to the two lines being compared. The sign 686 | of the integer signals the order to the sort function.

687 | 688 |
<<sort predicate cmp>>=
689 | static int cmp(const LINE *x, const LINE *y)
690 | {
691 |     return x->n - y->n;
692 | }
693 | 
694 | 695 |

The main function reads the file into memory, scans it into lines, sorts 696 | the lines, and emits them.

697 | 698 |
<<main function>>=
699 | int main (int argc, const char * argv[]) 
700 | {
701 |     int n,l;
702 |     LINE *lines;
703 |     char *buffer;
704 |         long bufsize;
705 | 
706 |     if (argc != 2) {
707 |         fprintf(stderr,"usage: %s file\n",argv[0]);
708 |         return 1;
709 |     }
710 | 
711 |     buffer = readfile(argv[1], &bufsize);
712 |     if (!buffer)
713 |         return 1;
714 |     if (!bufsize)
715 |         return 0; /* file empty */
716 | 
717 |     lines = scan_buffer(&n, buffer, bufsize);
718 |     if (!lines) {
719 |         perror("can't allocate");
720 |         return 1;
721 |     }
722 | 
723 |     /* sort line array and emit each line to stdout */
724 |     mergesort(lines, n, sizeof(LINE), 
725 |         (int (*)(const void *, const void *))cmp);
726 |     for (l=0; l<n; l++)
727 |         fwrite(lines[l].line, lines[l].length, 1, stdout);
728 | 
729 |     free(lines);
730 |     free(buffer);
731 |     return 0;
732 | }
733 | 
734 | 735 | 736 |

Reading a file into memory

737 | 738 |

Reading a file into memory requires to allocate the right amount of memory. 739 | We actually allocate one more byte and add a newline if the last line does not 740 | end with a newline already. This makes line scanning more regular. Otherwise 741 | we have to pay attention to errors as not being able to read the file or 742 | running out of memory.

743 | 744 |

Function readfile returns a buffer allocated for the file and writes the 745 | size of the buffer to bufsize.

746 | 747 |
<<read file into memory>>=
748 | static char *readfile(const char *name, long *bufsize)
749 | {
750 |     long size, read;
751 |     char *buffer;
752 |     FILE *file;
753 | 
754 |     file = fopen(name, "rb");
755 |     if (!file) {
756 |         perror("can't open file");
757 |         return NULL;
758 |     }
759 | 
760 |     fseek(file, 0, SEEK_END);
761 |     size = ftell(file);
762 |     fseek(file, 0, SEEK_SET);
763 | 
764 |     /* we allocate one more byte that we might fill with \n */
765 |     buffer = malloc(size+1);
766 |     if (!buffer) {
767 |         perror("can't allocate memory");
768 |         return NULL;
769 |     } 
770 |     read = fread(buffer, 1, size, file);
771 |     if (read != size) {
772 |         perror("reading from file failed");
773 |         return NULL;
774 |     }
775 |     fclose(file);
776 | 
777 |     /* if the last character in the file is not a \n we add it */
778 |     if (*(buffer+size-1) != '\n') {
779 |         *(buffer+size) = '\n';
780 |         size++;
781 |     }
782 | 
783 |     *bufsize = size;
784 |     return buffer;
785 | }
786 | 
787 | 788 | 789 |

Parsing a buffer into lines

790 | 791 |

The scan_buffer routine scans the buffer byte by byte and each line it finds 792 | it adds to a LINE struct. Since we can’t know the number of lines, an 793 | initial number m is guessed. When it is exceeded, 2*m are 794 | allocated (and again doubling if necessary).

795 | 796 |

The tricky bit is to recognize the last number (and sort key) in a line and 797 | entering it into the LINE struct. Rather than relying to atoi(3) I’m using 798 | a small hack C2I that computes the value of a single digit and from there 799 | computes the number when we find more digits.

800 | 801 |

When we find a new number we never know wether it is the last number in a 802 | line. Hence we read it but overwrite any previous result. The outside flag 803 | is true, if we are outside a sequence of digits and if so we can go a little 804 | faster.

805 | 806 |

We don’t recognize negative numbers.

807 | 808 |
<<scan lines>>=
809 | #define C2I(c) ((c - '0'))        
810 | 
811 | LINE *scan_buffer(int *n, char *buffer, long bufsize) 
812 | 
813 | {
814 |     char *c = buffer;
815 |     char *line = buffer;
816 |     int outside = 1; /* true iff outside of digits sequence */
817 |     int number = -1; /* last number we read */
818 | 
819 |     int m = 100 + bufsize/40; /* max number of lines we can store */
820 |     LINE *lines = malloc(m*sizeof(LINE));
821 |     if (!lines) {
822 |         return NULL;
823 |     }
824 |     *n = 0; /* number of lines read */
825 |     LINE *l = lines; /* current line */
826 | 
827 | 
828 |     while(c < buffer+bufsize) {
829 |         switch (*c) {
830 |             case '\n':
831 |                 /* store line */
832 |                 l->line   = line;
833 |                 l->n      = number;
834 |                 l->length = c-line+1;
835 |                 (*n)++;
836 | 
837 |                 /* prepare for next line */
838 |                 number = -1;
839 |                 outside = 1;
840 |                 l++;
841 |                 c++;
842 |                 line = c;
843 | 
844 |                 /* make room for more lines to store */
845 |                 if (*n == m) {
846 |                     lines = realloc(lines, 2*m*sizeof(LINE));
847 |                     if (!lines) {
848 |                         return NULL;
849 |                     }
850 |                     m *= 2;
851 |                     l = &lines[*n];
852 |                 }
853 | 
854 |                 break;
855 | 
856 |             case '0': /* scan a number */
857 |             case '1':
858 |             case '2':
859 |             case '3':
860 |             case '4':
861 |             case '5':
862 |             case '6':
863 |             case '7':
864 |             case '8':
865 |             case '9':
866 |                 if (outside) {
867 |                     number = C2I(*c);
868 |                     /* we are inside a digit sequence now */
869 |                     outside = 0; 
870 |                 } else {
871 |                     number = number * 10 + C2I(*c);
872 |                 }
873 |                 c++;
874 |                 break;
875 |             default:
876 |                 outside = 1; /* outside a digit sequence */
877 |                 c++;
878 |                 break;
879 |         }
880 |     }
881 |     return lines;
882 | }
883 | 
884 | 885 | 886 | 887 | -------------------------------------------------------------------------------- /examples/sortlines.lp: -------------------------------------------------------------------------------- 1 | # Sortfile 2 | 3 | This code is for the following task: sort a textfile according to the last 4 | number in each line. This challenge was given student applicants by friends 5 | and I took it to develop a program in C with some attention to performance. 6 | 7 | ## Compiling 8 | 9 | To compile this code, use a Makefile that can be extracted from this file 10 | using 11 | 12 | lipsum tangle Makefile sortlines.lp > Makefile 13 | make 14 | 15 | <>= 16 | all: sortlines 17 | 18 | sortlines.c: sortlines.lp 19 | lipsum tangle -f cpp $@ $< > $@ 20 | 21 | sortlines: sortlines.c 22 | gcc -O -o $@ $< 23 | @ 24 | 25 | ## Types and Main Function 26 | 27 | The idea is to read the file into memory, to build up an array of pointers 28 | to each line in the file, to sort the array, and finally to emit the lines 29 | in sorting order using the sorted array. 30 | 31 | <>= 32 | #include 33 | #include 34 | #include 35 | 36 | <> 37 | <> 38 | <> 39 | <> 40 | <
> 41 | @ 42 | 43 | Possible challenges include multiple numbers in a line and trying to avoid 44 | to scan a line multiple times for the detection of the last number in it. 45 | 46 | The key data structure is `LINE`. It contains a pointer into the file, the 47 | length of the line and an `int n` for the value of the last number in that 48 | line. 49 | 50 | <>= 51 | typedef struct line { 52 | char *line; /* byte sequence */ 53 | int length; 54 | int n; /* sort key */ 55 | } LINE; 56 | 57 | @ The sort predicate on lines is defined by `cmp`. It derives an integer by 58 | subtracting the numbers belonging to the two lines being compared. The sign 59 | of the integer signals the order to the sort function. 60 | 61 | <>= 62 | static int cmp(const LINE *x, const LINE *y) 63 | { 64 | return x->n - y->n; 65 | } 66 | 67 | @ The main function reads the file into memory, scans it into lines, sorts 68 | the lines, and emits them. 69 | 70 | <
>= 71 | int main (int argc, const char * argv[]) 72 | { 73 | int n,l; 74 | LINE *lines; 75 | char *buffer; 76 | long bufsize; 77 | 78 | if (argc != 2) { 79 | fprintf(stderr,"usage: %s file\n",argv[0]); 80 | return 1; 81 | } 82 | 83 | buffer = readfile(argv[1], &bufsize); 84 | if (!buffer) 85 | return 1; 86 | if (!bufsize) 87 | return 0; /* file empty */ 88 | 89 | lines = scan_buffer(&n, buffer, bufsize); 90 | if (!lines) { 91 | perror("can't allocate"); 92 | return 1; 93 | } 94 | 95 | /* sort line array and emit each line to stdout */ 96 | mergesort(lines, n, sizeof(LINE), 97 | (int (*)(const void *, const void *))cmp); 98 | for (l=0; l>= 121 | static char *readfile(const char *name, long *bufsize) 122 | { 123 | long size, read; 124 | char *buffer; 125 | FILE *file; 126 | 127 | file = fopen(name, "rb"); 128 | if (!file) { 129 | perror("can't open file"); 130 | return NULL; 131 | } 132 | 133 | fseek(file, 0, SEEK_END); 134 | size = ftell(file); 135 | fseek(file, 0, SEEK_SET); 136 | 137 | /* we allocate one more byte that we might fill with \n */ 138 | buffer = malloc(size+1); 139 | if (!buffer) { 140 | perror("can't allocate memory"); 141 | return NULL; 142 | } 143 | read = fread(buffer, 1, size, file); 144 | if (read != size) { 145 | perror("reading from file failed"); 146 | return NULL; 147 | } 148 | fclose(file); 149 | 150 | /* if the last character in the file is not a \n we add it */ 151 | if (*(buffer+size-1) != '\n') { 152 | *(buffer+size) = '\n'; 153 | size++; 154 | } 155 | 156 | *bufsize = size; 157 | return buffer; 158 | } 159 | @ 160 | 161 | ## Parsing a buffer into lines 162 | 163 | The `scan_buffer` routine scans the buffer byte by byte and each line it finds 164 | it adds to a `LINE` struct. Since we can't know the number of lines, an 165 | initial number `m` is guessed. When it is exceeded, ``2*m`` are 166 | allocated (and again doubling if necessary). 167 | 168 | The tricky bit is to recognize the last number (and sort key) in a line and 169 | entering it into the `LINE` struct. Rather than relying to `atoi(3)` I'm using 170 | a small hack `C2I` that computes the value of a single digit and from there 171 | computes the number when we find more digits. 172 | 173 | When we find a new number we never know wether it is the last number in a 174 | line. Hence we read it but overwrite any previous result. The `outside` flag 175 | is true, if we are outside a sequence of digits and if so we can go a little 176 | faster. 177 | 178 | We don't recognize negative numbers. 179 | 180 | <>= 181 | #define C2I(c) ((c - '0')) 182 | 183 | LINE *scan_buffer(int *n, char *buffer, long bufsize) 184 | 185 | { 186 | char *c = buffer; 187 | char *line = buffer; 188 | int outside = 1; /* true iff outside of digits sequence */ 189 | int number = -1; /* last number we read */ 190 | 191 | int m = 100 + bufsize/40; /* max number of lines we can store */ 192 | LINE *lines = malloc(m*sizeof(LINE)); 193 | if (!lines) { 194 | return NULL; 195 | } 196 | *n = 0; /* number of lines read */ 197 | LINE *l = lines; /* current line */ 198 | 199 | 200 | while(c < buffer+bufsize) { 201 | switch (*c) { 202 | case '\n': 203 | /* store line */ 204 | l->line = line; 205 | l->n = number; 206 | l->length = c-line+1; 207 | (*n)++; 208 | 209 | /* prepare for next line */ 210 | number = -1; 211 | outside = 1; 212 | l++; 213 | c++; 214 | line = c; 215 | 216 | /* make room for more lines to store */ 217 | if (*n == m) { 218 | lines = realloc(lines, 2*m*sizeof(LINE)); 219 | if (!lines) { 220 | return NULL; 221 | } 222 | m *= 2; 223 | l = &lines[*n]; 224 | } 225 | 226 | break; 227 | 228 | case '0': /* scan a number */ 229 | case '1': 230 | case '2': 231 | case '3': 232 | case '4': 233 | case '5': 234 | case '6': 235 | case '7': 236 | case '8': 237 | case '9': 238 | if (outside) { 239 | number = C2I(*c); 240 | /* we are inside a digit sequence now */ 241 | outside = 0; 242 | } else { 243 | number = number * 10 + C2I(*c); 244 | } 245 | c++; 246 | break; 247 | default: 248 | outside = 1; /* outside a digit sequence */ 249 | c++; 250 | break; 251 | } 252 | } 253 | return lines; 254 | } 255 | 256 | -------------------------------------------------------------------------------- /examples/sortlines.md: -------------------------------------------------------------------------------- 1 | # Sortfile 2 | 3 | This code is for the following task: sort a textfile according to the last 4 | number in each line. This challenge was given student applicants by friends 5 | and I took it to develop a program in C with some attention to performance. 6 | 7 | ## Compiling 8 | 9 | To compile this code, use a Makefile that can be extracted from this file 10 | using 11 | 12 | lipsum tangle Makefile sortlines.lp > Makefile 13 | make 14 | 15 | <>= 16 | all: sortlines 17 | 18 | sortlines.c: sortlines.lp 19 | lipsum tangle -f cpp $@ $< > $@ 20 | 21 | sortlines: sortlines.c 22 | gcc -O -o $@ $< 23 | 24 | 25 | ## Types and Main Function 26 | 27 | The idea is to read the file into memory, to build up an array of pointers 28 | to each line in the file, to sort the array, and finally to emit the lines 29 | in sorting order using the sorted array. 30 | 31 | <>= 32 | #include 33 | #include 34 | #include 35 | 36 | <> 37 | <> 38 | <> 39 | <> 40 | <
> 41 | 42 | 43 | 44 | Possible challenges include multiple numbers in a line and trying to avoid 45 | to scan a line multiple times for the detection of the last number in it. 46 | 47 | The key data structure is `LINE`. It contains a pointer into the file, the 48 | length of the line and an `int n` for the value of the last number in that 49 | line. 50 | 51 | <>= 52 | typedef struct line { 53 | char *line; /* byte sequence */ 54 | int length; 55 | int n; /* sort key */ 56 | } LINE; 57 | 58 | 59 | The sort predicate on lines is defined by `cmp`. It derives an integer by 60 | subtracting the numbers belonging to the two lines being compared. The sign 61 | of the integer signals the order to the sort function. 62 | 63 | <>= 64 | static int cmp(const LINE *x, const LINE *y) 65 | { 66 | return x->n - y->n; 67 | } 68 | 69 | 70 | The main function reads the file into memory, scans it into lines, sorts 71 | the lines, and emits them. 72 | 73 | <
>= 74 | int main (int argc, const char * argv[]) 75 | { 76 | int n,l; 77 | LINE *lines; 78 | char *buffer; 79 | long bufsize; 80 | 81 | if (argc != 2) { 82 | fprintf(stderr,"usage: %s file\n",argv[0]); 83 | return 1; 84 | } 85 | 86 | buffer = readfile(argv[1], &bufsize); 87 | if (!buffer) 88 | return 1; 89 | if (!bufsize) 90 | return 0; /* file empty */ 91 | 92 | lines = scan_buffer(&n, buffer, bufsize); 93 | if (!lines) { 94 | perror("can't allocate"); 95 | return 1; 96 | } 97 | 98 | /* sort line array and emit each line to stdout */ 99 | mergesort(lines, n, sizeof(LINE), 100 | (int (*)(const void *, const void *))cmp); 101 | for (l=0; l>= 124 | static char *readfile(const char *name, long *bufsize) 125 | { 126 | long size, read; 127 | char *buffer; 128 | FILE *file; 129 | 130 | file = fopen(name, "rb"); 131 | if (!file) { 132 | perror("can't open file"); 133 | return NULL; 134 | } 135 | 136 | fseek(file, 0, SEEK_END); 137 | size = ftell(file); 138 | fseek(file, 0, SEEK_SET); 139 | 140 | /* we allocate one more byte that we might fill with \n */ 141 | buffer = malloc(size+1); 142 | if (!buffer) { 143 | perror("can't allocate memory"); 144 | return NULL; 145 | } 146 | read = fread(buffer, 1, size, file); 147 | if (read != size) { 148 | perror("reading from file failed"); 149 | return NULL; 150 | } 151 | fclose(file); 152 | 153 | /* if the last character in the file is not a \n we add it */ 154 | if (*(buffer+size-1) != '\n') { 155 | *(buffer+size) = '\n'; 156 | size++; 157 | } 158 | 159 | *bufsize = size; 160 | return buffer; 161 | } 162 | 163 | 164 | ## Parsing a buffer into lines 165 | 166 | The `scan_buffer` routine scans the buffer byte by byte and each line it finds 167 | it adds to a `LINE` struct. Since we can't know the number of lines, an 168 | initial number `m` is guessed. When it is exceeded, ``2*m`` are 169 | allocated (and again doubling if necessary). 170 | 171 | The tricky bit is to recognize the last number (and sort key) in a line and 172 | entering it into the `LINE` struct. Rather than relying to `atoi(3)` I'm using 173 | a small hack `C2I` that computes the value of a single digit and from there 174 | computes the number when we find more digits. 175 | 176 | When we find a new number we never know wether it is the last number in a 177 | line. Hence we read it but overwrite any previous result. The `outside` flag 178 | is true, if we are outside a sequence of digits and if so we can go a little 179 | faster. 180 | 181 | We don't recognize negative numbers. 182 | 183 | <>= 184 | #define C2I(c) ((c - '0')) 185 | 186 | LINE *scan_buffer(int *n, char *buffer, long bufsize) 187 | 188 | { 189 | char *c = buffer; 190 | char *line = buffer; 191 | int outside = 1; /* true iff outside of digits sequence */ 192 | int number = -1; /* last number we read */ 193 | 194 | int m = 100 + bufsize/40; /* max number of lines we can store */ 195 | LINE *lines = malloc(m*sizeof(LINE)); 196 | if (!lines) { 197 | return NULL; 198 | } 199 | *n = 0; /* number of lines read */ 200 | LINE *l = lines; /* current line */ 201 | 202 | 203 | while(c < buffer+bufsize) { 204 | switch (*c) { 205 | case '\n': 206 | /* store line */ 207 | l->line = line; 208 | l->n = number; 209 | l->length = c-line+1; 210 | (*n)++; 211 | 212 | /* prepare for next line */ 213 | number = -1; 214 | outside = 1; 215 | l++; 216 | c++; 217 | line = c; 218 | 219 | /* make room for more lines to store */ 220 | if (*n == m) { 221 | lines = realloc(lines, 2*m*sizeof(LINE)); 222 | if (!lines) { 223 | return NULL; 224 | } 225 | m *= 2; 226 | l = &lines[*n]; 227 | } 228 | 229 | break; 230 | 231 | case '0': /* scan a number */ 232 | case '1': 233 | case '2': 234 | case '3': 235 | case '4': 236 | case '5': 237 | case '6': 238 | case '7': 239 | case '8': 240 | case '9': 241 | if (outside) { 242 | number = C2I(*c); 243 | /* we are inside a digit sequence now */ 244 | outside = 0; 245 | } else { 246 | number = number * 10 + C2I(*c); 247 | } 248 | c++; 249 | break; 250 | default: 251 | outside = 1; /* outside a digit sequence */ 252 | c++; 253 | break; 254 | } 255 | } 256 | return lines; 257 | } 258 | 259 | 260 | -------------------------------------------------------------------------------- /lipsum.1: -------------------------------------------------------------------------------- 1 | .\" Automatically generated by Pod::Man 2.27 (Pod::Simple 3.28) 2 | .\" 3 | .\" Standard preamble: 4 | .\" ======================================================================== 5 | .de Sp \" Vertical space (when we can't use .PP) 6 | .if t .sp .5v 7 | .if n .sp 8 | .. 9 | .de Vb \" Begin verbatim text 10 | .ft CW 11 | .nf 12 | .ne \\$1 13 | .. 14 | .de Ve \" End verbatim text 15 | .ft R 16 | .fi 17 | .. 18 | .\" Set up some character translations and predefined strings. \*(-- will 19 | .\" give an unbreakable dash, \*(PI will give pi, \*(L" will give a left 20 | .\" double quote, and \*(R" will give a right double quote. \*(C+ will 21 | .\" give a nicer C++. Capital omega is used to do unbreakable dashes and 22 | .\" therefore won't be available. \*(C` and \*(C' expand to `' in nroff, 23 | .\" nothing in troff, for use with C<>. 24 | .tr \(*W- 25 | .ds C+ C\v'-.1v'\h'-1p'\s-2+\h'-1p'+\s0\v'.1v'\h'-1p' 26 | .ie n \{\ 27 | . ds -- \(*W- 28 | . ds PI pi 29 | . if (\n(.H=4u)&(1m=24u) .ds -- \(*W\h'-12u'\(*W\h'-12u'-\" diablo 10 pitch 30 | . if (\n(.H=4u)&(1m=20u) .ds -- \(*W\h'-12u'\(*W\h'-8u'-\" diablo 12 pitch 31 | . ds L" "" 32 | . ds R" "" 33 | . ds C` "" 34 | . ds C' "" 35 | 'br\} 36 | .el\{\ 37 | . ds -- \|\(em\| 38 | . ds PI \(*p 39 | . ds L" `` 40 | . ds R" '' 41 | . ds C` 42 | . ds C' 43 | 'br\} 44 | .\" 45 | .\" Escape single quotes in literal strings from groff's Unicode transform. 46 | .ie \n(.g .ds Aq \(aq 47 | .el .ds Aq ' 48 | .\" 49 | .\" If the F register is turned on, we'll generate index entries on stderr for 50 | .\" titles (.TH), headers (.SH), subsections (.SS), items (.Ip), and index 51 | .\" entries marked with X<> in POD. Of course, you'll have to process the 52 | .\" output yourself in some meaningful fashion. 53 | .\" 54 | .\" Avoid warning from groff about undefined register 'F'. 55 | .de IX 56 | .. 57 | .nr rF 0 58 | .if \n(.g .if rF .nr rF 1 59 | .if (\n(rF:(\n(.g==0)) \{ 60 | . if \nF \{ 61 | . de IX 62 | . tm Index:\\$1\t\\n%\t"\\$2" 63 | .. 64 | . if !\nF==2 \{ 65 | . nr % 0 66 | . nr F 2 67 | . \} 68 | . \} 69 | .\} 70 | .rr rF 71 | .\" 72 | .\" Accent mark definitions (@(#)ms.acc 1.5 88/02/08 SMI; from UCB 4.2). 73 | .\" Fear. Run. Save yourself. No user-serviceable parts. 74 | . \" fudge factors for nroff and troff 75 | .if n \{\ 76 | . ds #H 0 77 | . ds #V .8m 78 | . ds #F .3m 79 | . ds #[ \f1 80 | . ds #] \fP 81 | .\} 82 | .if t \{\ 83 | . ds #H ((1u-(\\\\n(.fu%2u))*.13m) 84 | . ds #V .6m 85 | . ds #F 0 86 | . ds #[ \& 87 | . ds #] \& 88 | .\} 89 | . \" simple accents for nroff and troff 90 | .if n \{\ 91 | . ds ' \& 92 | . ds ` \& 93 | . ds ^ \& 94 | . ds , \& 95 | . ds ~ ~ 96 | . ds / 97 | .\} 98 | .if t \{\ 99 | . ds ' \\k:\h'-(\\n(.wu*8/10-\*(#H)'\'\h"|\\n:u" 100 | . ds ` \\k:\h'-(\\n(.wu*8/10-\*(#H)'\`\h'|\\n:u' 101 | . ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'^\h'|\\n:u' 102 | . ds , \\k:\h'-(\\n(.wu*8/10)',\h'|\\n:u' 103 | . ds ~ \\k:\h'-(\\n(.wu-\*(#H-.1m)'~\h'|\\n:u' 104 | . ds / \\k:\h'-(\\n(.wu*8/10-\*(#H)'\z\(sl\h'|\\n:u' 105 | .\} 106 | . \" troff and (daisy-wheel) nroff accents 107 | .ds : \\k:\h'-(\\n(.wu*8/10-\*(#H+.1m+\*(#F)'\v'-\*(#V'\z.\h'.2m+\*(#F'.\h'|\\n:u'\v'\*(#V' 108 | .ds 8 \h'\*(#H'\(*b\h'-\*(#H' 109 | .ds o \\k:\h'-(\\n(.wu+\w'\(de'u-\*(#H)/2u'\v'-.3n'\*(#[\z\(de\v'.3n'\h'|\\n:u'\*(#] 110 | .ds d- \h'\*(#H'\(pd\h'-\w'~'u'\v'-.25m'\f2\(hy\fP\v'.25m'\h'-\*(#H' 111 | .ds D- D\\k:\h'-\w'D'u'\v'-.11m'\z\(hy\v'.11m'\h'|\\n:u' 112 | .ds th \*(#[\v'.3m'\s+1I\s-1\v'-.3m'\h'-(\w'I'u*2/3)'\s-1o\s+1\*(#] 113 | .ds Th \*(#[\s+2I\s-2\h'-\w'I'u*3/5'\v'-.3m'o\v'.3m'\*(#] 114 | .ds ae a\h'-(\w'a'u*4/10)'e 115 | .ds Ae A\h'-(\w'A'u*4/10)'E 116 | . \" corrections for vroff 117 | .if v .ds ~ \\k:\h'-(\\n(.wu*9/10-\*(#H)'\s-2\u~\d\s+2\h'|\\n:u' 118 | .if v .ds ^ \\k:\h'-(\\n(.wu*10/11-\*(#H)'\v'-.4m'^\v'.4m'\h'|\\n:u' 119 | . \" for low resolution devices (crt and lpr) 120 | .if \n(.H>23 .if \n(.V>19 \ 121 | \{\ 122 | . ds : e 123 | . ds 8 ss 124 | . ds o a 125 | . ds d- d\h'-1'\(ga 126 | . ds D- D\h'-1'\(hy 127 | . ds th \o'bp' 128 | . ds Th \o'LP' 129 | . ds ae ae 130 | . ds Ae AE 131 | .\} 132 | .rm #[ #] #H #V #F C 133 | .\" ======================================================================== 134 | .\" 135 | .IX Title "lipsum 1" 136 | .TH lipsum 1 "2017-09-17" "2015" "Christian Lindig" 137 | .\" For nroff, turn off justification. Always turn off hyphenation; it makes 138 | .\" way too many mistakes in technical documents. 139 | .if n .ad l 140 | .nh 141 | .SH "NAME" 142 | lipsum \- a simple literate programming utility 143 | .SH "SYNOPSIS" 144 | .IX Header "SYNOPSIS" 145 | \&\fBlipsum\fR \fBtangle\fR [\fB\-f\fR \fIfmt\fR] \fIchunk\fR [\fIfile.lp\fR] 146 | .PP 147 | \&\fBlipsum\fR \fBexpand\fR [\fB\-f\fR \fIfmt\fR] \fIglob\fR [\fIfile.lp\fR] 148 | .PP 149 | \&\fBlipsum\fR \fBformats\fR 150 | .PP 151 | \&\fBlipsum\fR \fBroots\fR [\fIfile.lp\fR] 152 | .PP 153 | \&\fBlipsum\fR \fBcheck\fR [\fIfile.lp\fR] 154 | .PP 155 | \&\fBlipsum\fR \fBchunks\fR [\fIfile.lp\fR] 156 | .PP 157 | \&\fBlipsum\fR \fBundefined\fR [\fIfile.lp\fR] 158 | .PP 159 | \&\fBlipsum\fR \fBweave\fR [\fIfile.lp\fR] 160 | .PP 161 | \&\fBlipsum\fR \fBcopyright\fR 162 | .SH "DESCRIPTION" 163 | .IX Header "DESCRIPTION" 164 | \&\fBLipsum\fR is a tool for literate programming in the tradition of NoWeb 165 | and Sweave. A literate program is a file that contains both 166 | documentation and source code which are separated by a lightweight 167 | markup. \fBlipsum\fR's primary task is to extract the source from a 168 | literate program to make it available for compilation and execution. 169 | This is traditionally called \fItangling\fR. \fBLipsum\fR is language agnostic 170 | and can be used with almost any programming language and documentation 171 | syntax. 172 | .SH "COMMANDS" 173 | .IX Header "COMMANDS" 174 | The first command line parameter identifies a command (like \fBtangle\fR or 175 | \&\fBroots\fR), which is followed by options, parameters, and file names. 176 | A command that expects a named file for input will typically read input 177 | from \fIstdin\fR when no file name is given. Output of a command generally goes 178 | to \fIstdout\fR. All commands accept an option \fB\-\-help\fR to show online 179 | help. 180 | .IP "\fBtangle\fR [\fB\-f\fR \fIfmt\fR] \fIchunk\fR [\fIfile.lp\fR]" 4 181 | .IX Item "tangle [-f fmt] chunk [file.lp]" 182 | Extract \fIchunk\fR from \fIfile.lp\fR and emit it to \fIstdout\fR. Commands 183 | \&\fBchunks\fR and \fBroots\fR list the available chunks. The extracted code 184 | optionally includes references to original source code locations in 185 | \&\fIfile.lp\fR. For a list of available formats, use \fBformats\fR. 186 | .IP "\fBformats\fR" 4 187 | .IX Item "formats" 188 | Emit the list of available source code position formats. The best known 189 | format is \f(CW\*(C`cpp\*(C'\fR as it is used by the C preprocessor \fIcpp\fR\|(1) and works for 190 | many languages that use \f(CW\*(C`#\*(C'\fR to start a comment. 191 | .IP "\fBexpand\fR [\fB\-f\fR \fIfmt\fR] \fIglob\fR [\fIfile.lp\fR]" 4 192 | .IX Item "expand [-f fmt] glob [file.lp]" 193 | Extract each root code chunk matching shell pattern \fIglob\fR to a file of 194 | the same name using format \fIfmt\fR. This is a quick way extract all code 195 | chunks in one go into individual files. On the shell command line, you have 196 | to quote the \fIglob\fR pattern to avoid the shell expanding it: 197 | .Sp 198 | .Vb 1 199 | \& $ lipsum \-f cpp \*(Aq*.[ch]\*(Aq file.lp 200 | .Ve 201 | .Sp 202 | This will expand all chunks matching `*.c` or `*.h` using the \fIcpp\fR\|(1) 203 | format for back references into \fIfile.lp\fR. 204 | .Sp 205 | Note that pattern \f(CW\*(C`*\*(C'\fR does not match a \f(CW\*(C`/\*(C'\fR and hence \f(CW\*(C`*.c\*(C'\fR does not 206 | match \f(CW\*(C`a/b.c\*(C'\fR. 207 | .IP "\fBroots\fR [\fIfile.lp\fR]" 4 208 | .IX Item "roots [file.lp]" 209 | List the name of all root code chunks, one per line. A root chunk is 210 | a chunk that is not part of another code chunk. 211 | .IP "\fBundefined\fR [\fIfile.lp\fR]" 4 212 | .IX Item "undefined [file.lp]" 213 | List code chunks that are referenced but not defined, one per line. 214 | .IP "\fBchunks\fR [\fIfile.lp\fR]" 4 215 | .IX Item "chunks [file.lp]" 216 | List the name of all code chunks, one per line. 217 | .IP "\fBweave\fR [\fIfile.lp\fR]" 4 218 | .IX Item "weave [file.lp]" 219 | Emit \fIfile.lp\fR with code chunks indented by 4 spaces. This makes the 220 | output suitable for Markdown provided the documentation is already in 221 | Markdown syntax. 222 | .IP "\fBcopyright\fR" 4 223 | .IX Item "copyright" 224 | Emit the copyright notice and license to \fIstdout\fR. 225 | .SH "FILE FORMAT" 226 | .IX Header "FILE FORMAT" 227 | A literate program as it is understood by \fBlipsum\fR is a sequence of 228 | documentation and code chunks. Code chunks are named and may include 229 | references to other code chunks. Command \fBchunks\fR lists all available 230 | chunks in a file. Below is a simple example: 231 | .PP 232 | .Vb 5 233 | \& Echo prints each command line argument on a line by itself. This 234 | \& documentation chunk extends until the beginning of the named code 235 | \& chunk @<> below. Here I had to use @ to escape the meaning 236 | \& of @<< because otherwise it would have denoted a reference to a named 237 | \& code chunk. 238 | \& 239 | \& <>= 240 | \& /* <> */ 241 | \& #include 242 | \& 243 | \& int main(int argc, char** argv) 244 | \& { 245 | \& int i; 246 | \& for (i=0; i>= 257 | \& This code is in the public domain. 258 | .Ve 259 | .PP 260 | Documentation is introduced by an \f(CW\*(C`@\*(C'\fR at the beginning of a line, followed 261 | by a space or a newline. Code chunks are named and introduced by \f(CW\*(C`<<\*(C'\fR and 262 | >>=. A name can span several words but must not include a newline. A 263 | reference to a chunk is enclosed by \f(CW\*(C`<<\*(C'\fR and \f(CW\*(C`\*(C'\fR>>. A chunk extends until 264 | the beginning of the next chunk or the end of input. The first chunk in 265 | a file is by default a documentation chunk. 266 | .PP 267 | A code chunk can be extended: 268 | .PP 269 | .Vb 2 270 | \& <>= 271 | \& This code is part of the documentation for Lipsum 272 | .Ve 273 | .PP 274 | When the \f(CW\*(C`copyright\*(C'\fR chunk is expanded, the two chunks are concatenated 275 | and hence the copyright chunk expands to: 276 | .PP 277 | .Vb 2 278 | \& /* This code is in the public domain. 279 | \& This code is part of the documentation for Lipsum. */ 280 | .Ve 281 | .PP 282 | When a chunk is extracted with command \fBexpand\fR (traditionally called 283 | tangling), the code is emitted to \fIstdout\fR and all references are resolved 284 | by emitting the referenced chunks during the process. A code chunk must not 285 | include references to itself or any of the chunks where it is referenced. 286 | .SH "QUOTING" 287 | .IX Header "QUOTING" 288 | Since a lipsum file uses @, <<, >>, and >>= for markup, a mechanism is 289 | needed to include these in documentation and code chunks, as well as chunk 290 | names. The general mechanism is to prefix strings with \f(CW\*(C`@\*(C'\fR to escape their 291 | meaning as markup. 292 | .PP 293 | Below are situations where escaping is important. 294 | .ie n .IP """@"" in code" 4 295 | .el .IP "\f(CW@\fR in code" 4 296 | .IX Item "@ in code" 297 | The \f(CW\*(C`@\*(C'\fR character only needs to be escaped when it is the first character 298 | in a line. Escape it as \f(CW\*(C`@@\*(C'\fR. 299 | .ie n .IP """<<"" in code or documentation" 4 300 | .el .IP "\f(CW<<\fR in code or documentation" 4 301 | .IX Item "<< in code or documentation" 302 | Any occurrence of \f(CW\*(C`<<\*(C'\fR in code or documentation that does not indicate a 303 | named chunk needs to be escaped as \f(CW\*(C`@<<\*(C'\fR. 304 | .ie n .IP """@<<"" in code" 4 305 | .el .IP "\f(CW@<<\fR in code" 4 306 | .IX Item "@<< in code" 307 | Escape \f(CW\*(C`@<<\*(C'\fR as \f(CW\*(C`@@<<\*(C'\fR. 308 | .ie n .IP """@"" in chunk names" 4 309 | .el .IP "\f(CW@\fR in chunk names" 4 310 | .IX Item "@ in chunk names" 311 | Escape \f(CW\*(C`@\*(C'\fR as \f(CW\*(C`@@\*(C'\fR. 312 | .ie n .IP """<<"", and "">>="" in chunk names" 4 313 | .el .IP "\f(CW<<\fR, and \f(CW>>=\fR in chunk names" 4 314 | .IX Item "<<, and >>= in chunk names" 315 | Escape any of the above strings by prefixing them with \f(CW\*(C`@\*(C'\fR. 316 | .ie n .IP """@<<"", ""@>>"", ""@>>="" in chunk names" 4 317 | .el .IP "\f(CW@<<\fR, \f(CW@>>\fR, \f(CW@>>=\fR in chunk names" 4 318 | .IX Item "@<<, @>>, @>>= in chunk names" 319 | Escape any of them by prefixing them with another \f(CW\*(C`@\*(C'\fR. 320 | .SH "DIAGNOSTICS" 321 | .IX Header "DIAGNOSTICS" 322 | .ie n .IP """no such chunk""" 4 323 | .el .IP "\f(CWno such chunk\fR" 4 324 | .IX Item "no such chunk" 325 | The named chunk does not exist. Use commands \fBroots\fR or \fBchunks\fR to list 326 | existing chunks. 327 | .ie n .IP """chunk is part of a cylcle""" 4 328 | .el .IP "\f(CWchunk is part of a cylcle\fR" 4 329 | .IX Item "chunk is part of a cylcle" 330 | A chunk must not include itself directly or indirectly as it would expand 331 | to an infinite document. A chunk was found to violate this. 332 | .ie n .IP """unexpeced newline in chunk name""" 4 333 | .el .IP "\f(CWunexpeced newline in chunk name\fR" 4 334 | .IX Item "unexpeced newline in chunk name" 335 | A chunk name must not contain a newline character. The error is most likely 336 | caused by << inside code that looks to the scanner like the beginning of 337 | a chunk name. Prefix it with @ like in @<< to escape it. 338 | .ie n .IP """unexpeced end of file in chunk name""" 4 339 | .el .IP "\f(CWunexpeced end of file in chunk name\fR" 4 340 | .IX Item "unexpeced end of file in chunk name" 341 | The scanner encountered the end of input after reading << and assuming 342 | that this marks the beginning of a chunk name. Prefix it with @ 343 | like in @<< to signal that it is not the beginning of a chunk name or 344 | close the chunk name properly with >> or >>=. 345 | .SH "UNICODE" 346 | .IX Header "UNICODE" 347 | Lipsum is not unicode aware but should work with \s-1UTF8\s0 files regardless. If 348 | you have suggestions how to make Lipsum unicode aware the author would be 349 | interested to hear about them. 350 | .SH "RETURN VALUES" 351 | .IX Header "RETURN VALUES" 352 | The \fBlipsum\fR utility returns 0 on success and a positive number if an 353 | error occurs. 354 | .SH "EXIT CODE" 355 | .IX Header "EXIT CODE" 356 | \&\fBLipusm\fR exits with 0 on success and a positive number otherwise. 357 | .SH "SEE ALSO" 358 | .IX Header "SEE ALSO" 359 | \&\fInotangle\fR\|(1), \fIcpp\fR\|(1), http://daringfireball.net/projects/markdown/ 360 | .SH "AUTHOR" 361 | .IX Header "AUTHOR" 362 | Written by Christian Lindig . The source code is 363 | available from https://github.com/lindig/lipsum.git 364 | .SH "LICENSE" 365 | .IX Header "LICENSE" 366 | See command \fBcopyright\fR for how to display the copyright notice and 367 | license. 368 | -------------------------------------------------------------------------------- /lipsum.opam: -------------------------------------------------------------------------------- 1 | opam-version: "2.0" 2 | name: "lipsum" 3 | synopsis: "Self-contained tool for literate programming in tradition of NoWeb" 4 | maintainer: "Christian Lindig " 5 | authors: "Christian Lindig " 6 | license: "BSD" 7 | homepage: "https://github.com/lindig/lipsum" 8 | bug-reports: "https://github.com/lindig/lipsum/issues" 9 | depends: [ 10 | "ocaml" 11 | "dune" {build} 12 | "re" 13 | ] 14 | doc: "https://lindig.github.io/lipsum/" 15 | build: ["dune" "build" "-p" "-j" jobs] 16 | dev-repo: "git+https://github.com/lindig/lipsum.git" 17 | description: """ 18 | Lipsum is a command-line utility for literate programming. It stands in 19 | the tradition of [Noweb](http://www.cs.tufts.edu/~nr/noweb/), a popular 20 | and flexible literate programming system by Norman Ramsey. The idea of 21 | literate programming is to keep documentation for programmers and 22 | program code in one file and to arrange it in a way that helps 23 | understanding it best. To actually compile or run the code it needs to 24 | be extracted from the literate program and Lipsum is a tool to do this. 25 | 26 | Like Noweb, Lipsum employs a minimal markup to arrange documentation and 27 | code in a file. Also like Noweb, Lipsum is language agnostic and can be 28 | used for almost any programming language and documentation.""" 29 | -------------------------------------------------------------------------------- /lipsum.pod: -------------------------------------------------------------------------------- 1 | 2 | # This is a Unix manual page for Lipsum in Perl POD format. It is 3 | # translated into manual format using pod2perl which comes with every Perl 4 | # installation. The POD format has some limitations but it is still one of 5 | # the easier ways to write a manual page. I take the limitations (no 6 | # tables, problems with escapes) in exchange for an easy way to update the 7 | # manual page. 8 | # 9 | 10 | __END__ 11 | 12 | =head1 NAME 13 | 14 | lipsum - a simple literate programming utility 15 | 16 | =head1 SYNOPSIS 17 | 18 | B B [B<-f> I] I [I] 19 | 20 | B B [B<-f> I] I [I] 21 | 22 | B B 23 | 24 | B B [I] 25 | 26 | B B [I] 27 | 28 | B B [I] 29 | 30 | B B [I] 31 | 32 | B B [I] 33 | 34 | B B 35 | 36 | =head1 DESCRIPTION 37 | 38 | B is a tool for literate programming in the tradition of NoWeb 39 | and Sweave. A literate program is a file that contains both 40 | documentation and source code which are separated by a lightweight 41 | markup. B's primary task is to extract the source from a 42 | literate program to make it available for compilation and execution. 43 | This is traditionally called I. B is language agnostic 44 | and can be used with almost any programming language and documentation 45 | syntax. 46 | 47 | =head1 COMMANDS 48 | 49 | The first command line parameter identifies a command (like B or 50 | B), which is followed by options, parameters, and file names. 51 | A command that expects a named file for input will typically read input 52 | from I when no file name is given. Output of a command generally goes 53 | to I. All commands accept an option B<--help> to show online 54 | help. 55 | 56 | =over 4 57 | 58 | =item B [B<-f> I] I [I] 59 | 60 | Extract I from I and emit it to I. Commands 61 | B and B list the available chunks. The extracted code 62 | optionally includes references to original source code locations in 63 | I. For a list of available formats, use B. 64 | 65 | =item B 66 | 67 | Emit the list of available source code position formats. The best known 68 | format is C as it is used by the C preprocessor cpp(1) and works for 69 | many languages that use C<#> to start a comment. 70 | 71 | =item B [B<-f> I] I [I] 72 | 73 | Extract each root code chunk matching shell pattern I to a file of 74 | the same name using format I. This is a quick way extract all code 75 | chunks in one go into individual files. On the shell command line, you have 76 | to quote the I pattern to avoid the shell expanding it: 77 | 78 | $ lipsum -f cpp '*.[ch]' file.lp 79 | 80 | This will expand all chunks matching `*.c` or `*.h` using the cpp(1) 81 | format for back references into F. 82 | 83 | Note that pattern C<*> does not match a C and hence C<*.c> does not 84 | match C. 85 | 86 | =item B [I] 87 | 88 | List the name of all root code chunks, one per line. A root chunk is 89 | a chunk that is not part of another code chunk. 90 | 91 | =item B [I] 92 | 93 | List code chunks that are referenced but not defined, one per line. 94 | 95 | =item B [I] 96 | 97 | List the name of all code chunks, one per line. 98 | 99 | =item B [I] 100 | 101 | Emit I with code chunks indented by 4 spaces. This makes the 102 | output suitable for Markdown provided the documentation is already in 103 | Markdown syntax. 104 | 105 | =item B 106 | 107 | Emit the copyright notice and license to I. 108 | 109 | =back 110 | 111 | =head1 FILE FORMAT 112 | 113 | A literate program as it is understood by B is a sequence of 114 | documentation and code chunks. Code chunks are named and may include 115 | references to other code chunks. Command B lists all available 116 | chunks in a file. Below is a simple example: 117 | 118 | Echo prints each command line argument on a line by itself. This 119 | documentation chunk extends until the beginning of the named code 120 | chunk @<> below. Here I had to use @ to escape the meaning 121 | of @<< because otherwise it would have denoted a reference to a named 122 | code chunk. 123 | 124 | <>= 125 | /* <> */ 126 | #include 127 | 128 | int main(int argc, char** argv) 129 | { 130 | int i; 131 | for (i=0; i>= 142 | This code is in the public domain. 143 | 144 | Documentation is introduced by an C<@> at the beginning of a line, followed 145 | by a space or a newline. Code chunks are named and introduced by C<<<> and 146 | >>=. A name can span several words but must not include a newline. A 147 | reference to a chunk is enclosed by C<<<> and C<>>>. A chunk extends until 148 | the beginning of the next chunk or the end of input. The first chunk in 149 | a file is by default a documentation chunk. 150 | 151 | A code chunk can be extended: 152 | 153 | <>= 154 | This code is part of the documentation for Lipsum 155 | 156 | When the C chunk is expanded, the two chunks are concatenated 157 | and hence the copyright chunk expands to: 158 | 159 | /* This code is in the public domain. 160 | This code is part of the documentation for Lipsum. */ 161 | 162 | When a chunk is extracted with command B (traditionally called 163 | tangling), the code is emitted to I and all references are resolved 164 | by emitting the referenced chunks during the process. A code chunk must not 165 | include references to itself or any of the chunks where it is referenced. 166 | 167 | =head1 QUOTING 168 | 169 | Since a lipsum file uses @, <<, >>, and >>= for markup, a mechanism is 170 | needed to include these in documentation and code chunks, as well as chunk 171 | names. The general mechanism is to prefix strings with C<@> to escape their 172 | meaning as markup. 173 | 174 | Below are situations where escaping is important. 175 | 176 | =over 4 177 | 178 | =item C<@> in code 179 | 180 | The C<@> character only needs to be escaped when it is the first character 181 | in a line. Escape it as C<@@>. 182 | 183 | =item C<<< << >>> in code or documentation 184 | 185 | Any occurrence of C<<<> in code or documentation that does not indicate a 186 | named chunk needs to be escaped as C<@<<>. 187 | 188 | =item C<@<<> in code 189 | 190 | Escape C<@<<> as C<@@<<>. 191 | 192 | =item C<@> in chunk names 193 | 194 | Escape C<@> as C<@@>. 195 | 196 | =item C<<< << >>>, and C<<< >>= >>> in chunk names 197 | 198 | Escape any of the above strings by prefixing them with C<@>. 199 | 200 | =item C<<< @<< >>>, C<<< @>> >>>, C<<< @>>= >>> in chunk names 201 | 202 | Escape any of them by prefixing them with another C<@>. 203 | 204 | =back 205 | 206 | =head1 DIAGNOSTICS 207 | 208 | =over 4 209 | 210 | =item C 211 | 212 | The named chunk does not exist. Use commands B or B to list 213 | existing chunks. 214 | 215 | =item C 216 | 217 | A chunk must not include itself directly or indirectly as it would expand 218 | to an infinite document. A chunk was found to violate this. 219 | 220 | =item C 221 | 222 | A chunk name must not contain a newline character. The error is most likely 223 | caused by << inside code that looks to the scanner like the beginning of 224 | a chunk name. Prefix it with @ like in @<< to escape it. 225 | 226 | =item C 227 | 228 | The scanner encountered the end of input after reading << and assuming 229 | that this marks the beginning of a chunk name. Prefix it with @ 230 | like in @<< to signal that it is not the beginning of a chunk name or 231 | close the chunk name properly with >> or >>=. 232 | 233 | =back 234 | 235 | =head1 UNICODE 236 | 237 | Lipsum is not unicode aware but should work with UTF8 files regardless. If 238 | you have suggestions how to make Lipsum unicode aware the author would be 239 | interested to hear about them. 240 | 241 | =head1 RETURN VALUES 242 | 243 | The B utility returns 0 on success and a positive number if an 244 | error occurs. 245 | 246 | =head1 EXIT CODE 247 | 248 | B exits with 0 on success and a positive number otherwise. 249 | 250 | =head1 SEE ALSO 251 | 252 | notangle(1), cpp(1), http://daringfireball.net/projects/markdown/ 253 | 254 | =head1 AUTHOR 255 | 256 | Written by Christian Lindig . The source code is 257 | available from https://github.com/lindig/lipsum.git 258 | 259 | =head1 LICENSE 260 | 261 | See command B for how to display the copyright notice and 262 | license. 263 | -------------------------------------------------------------------------------- /src/dune: -------------------------------------------------------------------------------- 1 | 2 | (executable 3 | (name main) 4 | (public_name lipsum) 5 | (libraries 6 | cmdliner re.glob)) 7 | 8 | (ocamlyacc parser) 9 | 10 | (ocamllex scanner escape) 11 | -------------------------------------------------------------------------------- /src/escape.mll: -------------------------------------------------------------------------------- 1 | { 2 | module L = Lexing 3 | 4 | (* col0 is true, iff a match starts at the beginning of a line *) 5 | let col0 lexbuf = 6 | let p = lexbuf.L.lex_start_p 7 | in 8 | p.L.pos_cnum = p.L.pos_bol 9 | } 10 | 11 | rule escape io = parse 12 | eof { () } 13 | | "@<<" { output_string io "@@<<" ; escape io lexbuf } 14 | | "<<" { output_string io "@<<" ; escape io lexbuf } 15 | | "@" { if col0 lexbuf 16 | then ( output_string io "@@" 17 | ; escape io lexbuf 18 | ) 19 | else ( output_char io '@' 20 | ; escape io lexbuf 21 | ) 22 | } 23 | | '\n' { Lexing.new_line lexbuf 24 | ; output_char io '\n' 25 | ; escape io lexbuf 26 | } 27 | | _ { output_char io (L.lexeme_char lexbuf 0) 28 | ; escape io lexbuf 29 | } 30 | { 31 | 32 | } 33 | 34 | 35 | -------------------------------------------------------------------------------- /src/litprog.ml: -------------------------------------------------------------------------------- 1 | 2 | 3 | module SM = Map.Make(String) 4 | module SS = Set.Make(String) 5 | module T = Tangle 6 | 7 | exception NoSuchChunk of string 8 | exception Cycle of string 9 | 10 | 11 | 12 | type code = 13 | | Str of T.position * string 14 | | Ref of string 15 | 16 | type chunk = 17 | | Doc of string 18 | | Code of string * code list 19 | 20 | type doc = chunk list 21 | 22 | type t = 23 | { code: (code list) SM.t 24 | ; chunks: chunk list 25 | } 26 | 27 | let printf = Printf.printf 28 | let (@@) f x = f x 29 | 30 | let empty = 31 | { code = SM.empty 32 | ; chunks = [] 33 | } 34 | 35 | let append key v map = 36 | if SM.mem key map 37 | then SM.add key ((SM.find key map)@v) map 38 | else SM.add key v map 39 | 40 | let make chunks = 41 | let add t = function 42 | | Doc(_) as d -> { t with chunks = d :: t.chunks } 43 | | Code(n,cs) as c -> { code = append n cs t.code 44 | ; chunks = c::t.chunks 45 | } 46 | in 47 | let t = List.fold_left add empty chunks in 48 | { t with chunks = List.rev t.chunks} 49 | 50 | let doc t = t.chunks 51 | 52 | let code_chunks t = 53 | SM.fold (fun name _ names -> name::names) t.code [] 54 | 55 | let code_roots t = 56 | let add name _ names = SS.add name names in 57 | let roots = SM.fold add t.code SS.empty in 58 | let rec traverse_chunk roots = function 59 | | Doc(_) -> roots 60 | | Code(_,code) -> List.fold_left traverse_code roots code 61 | and traverse_code roots = function 62 | | Str(_,_) -> roots 63 | | Ref(n) -> SS.remove n roots 64 | in 65 | SS.elements @@ List.fold_left traverse_chunk roots t.chunks 66 | 67 | let references t = 68 | let code refs = function 69 | | Str(_) -> refs 70 | | Ref(r) -> SS.add r refs in 71 | let chunk refs = function 72 | | Doc(_) -> refs 73 | | Code(_,cs) -> List.fold_left code refs cs 74 | in 75 | List.fold_left chunk SS.empty t.chunks 76 | 77 | let unknown_references t = 78 | let (++) = SS.add in 79 | let (--) = SS.diff in 80 | let refs = references t in 81 | let defs = SM.fold 82 | (fun name _ names -> name ++ names) t.code SS.empty in 83 | SS.elements @@ refs -- defs 84 | 85 | let lookup name map = 86 | try 87 | SM.find name map 88 | with 89 | Not_found -> raise (NoSuchChunk name) 90 | 91 | 92 | let tangle t emit io chunk = 93 | let rec loop pred = function 94 | | [] -> () 95 | | Str(pos,s)::todo -> emit io pos s; loop pred todo 96 | | Ref(s)::todo -> 97 | if SS.mem s pred then 98 | raise (Cycle s) 99 | else 100 | ( loop (SS.add s pred) (lookup s t.code) 101 | ; loop pred todo 102 | ) 103 | in 104 | loop SS.empty (lookup chunk t.code) 105 | 106 | (* Just for debugging during development 107 | *) 108 | 109 | let excerpt s = 110 | let str = String.escaped s in 111 | let len = String.length str in 112 | if len < 40 then str 113 | else String.sub str 0 10 ^ "..." ^ String.sub str (len - 10) 10 114 | 115 | let code = function 116 | | Str(p,str) -> printf "%3d (%4d): %s\n" 117 | p.T.line p.T.offset (excerpt str) 118 | | Ref(str) -> printf "<<%s>>\n" str 119 | 120 | let chunk _map = function 121 | | Doc(str) -> Printf.printf "@ %s\n" (excerpt str) 122 | | Code(name,cs) -> 123 | ( Printf.printf "<<%s>>=\n" name 124 | ; List.iter code cs 125 | ) 126 | 127 | let print litprog = List.iter (chunk litprog.code) litprog.chunks 128 | -------------------------------------------------------------------------------- /src/litprog.mli: -------------------------------------------------------------------------------- 1 | (** This module implements a representation for a literate program which 2 | is a sequence of documentation and named code chunks. *) 3 | 4 | (** a requested code chunk doesn't exist *) 5 | exception NoSuchChunk of string 6 | 7 | (** code chunks refers to itself *) 8 | exception Cycle of string 9 | 10 | type code = 11 | | Str of Tangle.position * string 12 | | Ref of string 13 | (** Code is a string or a reference to another chunk *) 14 | 15 | type chunk = 16 | | Doc of string 17 | | Code of string * code list 18 | (** A chunk is a document chunk or a named code chunk. The code 19 | is modeled as a list of code and references *) 20 | 21 | 22 | type doc = chunk list 23 | type t 24 | (** A document is a list of chunks. Such a document can be turned into an 25 | abstract representation of type t *) 26 | 27 | val make : chunk list -> t 28 | val doc : t -> chunk list 29 | (** make and doc convert between the two representations but only the 30 | abstract representation can be used for querying it *) 31 | 32 | val code_chunks : t -> string list 33 | (** Return the names of all code chunks. *) 34 | 35 | val code_roots : t -> string list 36 | (** Code chunks to referred to by others are roots. Code_roots 37 | returns all root code chunks *) 38 | 39 | val tangle : t -> Tangle.t -> out_channel -> string -> unit 40 | (** Extract a named code chunk from a literate program by recursively 41 | resolving all references and writing the result to an output 42 | channel. The formatting is controlled by the Tangle.t value. The 43 | function may raise exceptions in case of undefined references or cyclic 44 | references *) 45 | 46 | val unknown_references : t -> string list 47 | (** Scan t and find all undefined references to code chunks *) 48 | 49 | (**/**) 50 | val print : t -> unit (** For debugging: emit t to stdout *) 51 | -------------------------------------------------------------------------------- /src/main.ml: -------------------------------------------------------------------------------- 1 | (** Command line evaluaton *) 2 | 3 | (** This is the main module that evaluates the command line and drives 4 | the program. *) 5 | 6 | module C = Cmdliner 7 | module S = Scanner 8 | module P = Parser 9 | module LP = Litprog 10 | module T = Tangle 11 | module RX = Re (* regular expression *) 12 | module G = Re_glob (* shell-style globbing *) 13 | 14 | exception Error of string 15 | let error fmt = Printf.kprintf (fun msg -> raise (Error msg)) fmt 16 | let eprintf = Printf.eprintf 17 | let printf = Printf.printf 18 | let giturl = "https://github.com/lindig/lipsum.git" 19 | 20 | 21 | let copyright () = 22 | List.iter print_endline 23 | [ giturl 24 | ; "Copyright (c) 2012, 2013, 2014, 2015" 25 | ; "Christian Lindig " 26 | ; "All rights reserved." 27 | ; "" 28 | ; "Redistribution and use in source and binary forms, with or" 29 | ; "without modification, are permitted provided that the following" 30 | ; "conditions are met:" 31 | ; "" 32 | ; "(1) Redistributions of source code must retain the above copyright" 33 | ; " notice, this list of conditions and the following disclaimer." 34 | ; "(2) Redistributions in binary form must reproduce the above copyright" 35 | ; " notice, this list of conditions and the following disclaimer in" 36 | ; " the documentation and/or other materials provided with the" 37 | ; " distribution." 38 | ; "" 39 | ; "THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND" 40 | ; "CONTRIBUTORS \"AS IS\" AND ANY EXPRESS OR IMPLIED WARRANTIES," 41 | ; "INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF" 42 | ; "MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE" 43 | ; "DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR" 44 | ; "CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL," 45 | ; "SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT" 46 | ; "LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF" 47 | ; "USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED" 48 | ; "AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT" 49 | ; "LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN" 50 | ; "ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE" 51 | ; "POSSIBILITY OF SUCH DAMAGE." 52 | ; "" 53 | ] 54 | 55 | let finally opn cls = 56 | let res = try opn () with exn -> cls (); raise exn in 57 | cls (); 58 | res 59 | 60 | (** Attach a file name to the input source that we are reading. This is 61 | most useful when we are reading from stdin and no file name 62 | was attached *) 63 | let set_filename (fname:string) (lexbuf:Lexing.lexbuf) = 64 | ( lexbuf.Lexing.lex_curr_p <- 65 | { lexbuf.Lexing.lex_curr_p with Lexing.pos_fname = fname } 66 | ; lexbuf 67 | ) 68 | 69 | (** Open a named file (or stdin), setup a lexer and call f on the lexer for 70 | the result. If a file was opened, it is closed before the result 71 | is returned *) 72 | let with_lexbuf path f = match path with 73 | | None -> 74 | Lexing.from_channel stdin |> set_filename "stdin" |> f 75 | | Some path -> 76 | let io = open_in path in 77 | finally 78 | (fun () -> Lexing.from_channel io |> set_filename path |> f) 79 | (fun () -> close_in io) 80 | 81 | let scan path = 82 | let rec loop lexbuf = 83 | match S.token' lexbuf with 84 | | P.EOF -> print_endline @@ S.to_string P.EOF 85 | | tok -> ( print_endline @@ S.to_string tok 86 | ; loop lexbuf 87 | ) 88 | in 89 | with_lexbuf path loop 90 | 91 | 92 | let escape path = with_lexbuf path (Escape.escape stdout) 93 | (** read input and emit it again, escaping characters where needed 94 | to turn this into a code chunk in a literate program *) 95 | 96 | let litprog path = with_lexbuf path (P.litprog S.token') |> LP.make 97 | (** create a literate program by parsing the input *) 98 | 99 | let parse path = litprog path |> LP.print 100 | (** emit a literate program for debugging *) 101 | 102 | (** expand chunk from litprog into a file named like chunk, using format *) 103 | let tangle_to_file (litprog:LP.t) (format:Tangle.t) (chunk:string) = 104 | let io = open_out chunk in 105 | finally 106 | (fun () -> LP.tangle litprog format io chunk) 107 | (fun () -> close_out io) 108 | 109 | let tangle_roots fmt path = 110 | let fmt = T.lookup fmt in 111 | let lp = litprog path in 112 | lp 113 | |> LP.code_roots 114 | |> List.iter (tangle_to_file lp fmt) 115 | 116 | let compile glob = 117 | try 118 | RX.compile @@ RX.whole_string @@ G.globx glob 119 | with 120 | G.Parse_error -> error "syntax error in pattern '%s'" glob 121 | 122 | let expand fmt glob path = 123 | (* only expand roots matching glob *) 124 | let rx = compile glob in (* rx can be used for matching a string *) 125 | let fmt = T.lookup fmt in 126 | let lp = litprog path in 127 | lp 128 | |> LP.code_roots 129 | |> List.filter (RX.execp rx) 130 | |> List.iter (tangle_to_file lp fmt) 131 | 132 | let tangle fmt chunk path = 133 | LP.tangle (litprog path) (T.lookup fmt) stdout chunk 134 | 135 | let weave path = 136 | litprog path 137 | |> LP.doc 138 | |> (Weave.lookup "plain") stdout 139 | 140 | let chunks path = 141 | litprog path 142 | |> LP.code_chunks 143 | |> List.iter print_endline 144 | 145 | let roots path = 146 | litprog path 147 | |> LP.code_roots 148 | |> List.iter print_endline 149 | 150 | let undefined path = 151 | litprog path 152 | |> LP.unknown_references 153 | |> List.iter print_endline 154 | 155 | let formats () = 156 | T.formats 157 | |> List.iter print_endline 158 | 159 | 160 | module Command = struct 161 | let filename = 162 | C.Arg.(value 163 | & pos 0 (some file) None ~rev:true 164 | & info [] 165 | ~docv:"file.lp" 166 | ~doc:"Literate program file. Defaults to stdin." 167 | ) 168 | 169 | let chunk = 170 | C.Arg.(required 171 | & pos 0 (some string) None 172 | & info [] 173 | ~docv:"chunk" 174 | ~doc:"Code chunk defined in literate program." 175 | ) 176 | 177 | let format = 178 | C.Arg.(value 179 | & opt string "plain" 180 | & info ["format"; "f"] 181 | ~docv:"format" 182 | ~doc:"Format of line number directives in output." 183 | ) 184 | 185 | let glob = 186 | C.Arg.(required 187 | & pos 0 (some string) None 188 | & info [] 189 | ~docv:"glob" 190 | ~doc:"Glob pattern for code chunks in literate program." 191 | ) 192 | 193 | let more_help = 194 | [ `S "MORE HELP" 195 | ; `P "Use `$(mname) $(i,COMMAND) --help' for help on a single command." 196 | ; `S "BUGS" 197 | ; `P "Check bug reports at https://github.com/lindig/lipsum/issues" 198 | ] 199 | 200 | let roots = 201 | let doc = "list all root code chunks defined in literate program." in 202 | let man = 203 | [ `S "DESCRIPTION" 204 | ; `P doc 205 | ; `Blocks more_help 206 | ] in 207 | C.Term. 208 | ( const roots $ filename 209 | , info "roots" ~doc ~man 210 | ) 211 | 212 | let chunks = 213 | let doc = "List all code chunks defined in literate program." in 214 | C.Term. 215 | ( const chunks $ filename 216 | , info "chunks" ~doc 217 | ) 218 | 219 | let undefined = 220 | let doc = "list all used but undefined code chunks." in 221 | C.Term. 222 | ( const undefined $ filename 223 | , info "undefined" ~doc 224 | ) 225 | 226 | let copyright = 227 | let doc = "Emit copyright notice on standard output." in 228 | C.Term. 229 | ( const copyright $ const () 230 | , info "copyright" ~doc 231 | ) 232 | 233 | let tangle = 234 | let doc = "Extract code chunk from literate program." in 235 | C.Term. 236 | ( const tangle $ format $ chunk $ filename 237 | , info "tangle" ~doc 238 | ) 239 | 240 | let expand = 241 | let doc = "Extract code chunks matching glob from literate program." in 242 | C.Term. 243 | ( const expand $ format $ glob $ filename 244 | , info "expand" ~doc 245 | ) 246 | 247 | let weave = 248 | let doc = "Emit literate program." in 249 | C.Term. 250 | ( const weave $ filename 251 | , info "weave" ~doc 252 | ) 253 | 254 | let scan = 255 | let doc = "For debugging, emit literate program as token sequence" in 256 | C.Term. 257 | ( const scan $ filename 258 | , info "scan" ~doc 259 | ) 260 | 261 | let parse = 262 | let doc = "For debugging, emit literate program" in 263 | C.Term. 264 | ( const parse $ filename 265 | , info "parse" ~doc 266 | ) 267 | 268 | 269 | let formats = 270 | let doc = "List formats available for option $(format)." in 271 | C.Term. 272 | ( const formats $ const () 273 | , info "formats" ~doc 274 | ) 275 | 276 | let lipsum = 277 | let doc = "literate programming tool" in 278 | let man = 279 | [ `S "DESCRIPTION" 280 | ; `P doc 281 | ; `Blocks more_help 282 | ] in 283 | C.Term. 284 | ( ret (const (fun _ -> `Help(`Pager, None)) $ const ()) 285 | , info "lipsum" ~doc ~man 286 | ) 287 | 288 | let all = 289 | [ copyright 290 | ; roots 291 | ; chunks 292 | ; undefined 293 | ; tangle 294 | ; expand 295 | ; weave 296 | ; scan 297 | ; formats 298 | ; parse 299 | ] 300 | end 301 | 302 | let main () = 303 | try match C.Term.eval_choice Command.lipsum Command.all ~catch:false with 304 | | `Error _ -> exit 1 305 | | _ -> exit 0 306 | with 307 | | Error(msg) -> eprintf "error: %s\n" msg; exit 1 308 | | Failure(msg) -> eprintf "error: %s\n" msg; exit 1 309 | | Scanner.Error(msg) -> eprintf "error: %s\n" msg; exit 1 310 | | Sys_error(msg) -> eprintf "error: %s\n" msg; exit 1 311 | | T.NoSuchFormat(s) -> eprintf "unknown tangle format %s\n" s; exit 1 312 | | LP.NoSuchChunk(msg)-> eprintf "no such chunk: %s\n" msg; exit 1 313 | | LP.Cycle(s) -> eprintf "chunk <<%s>> is part of a cycle\n" s; 314 | exit 1 315 | | exn -> Printf.eprintf "error: %s\n" 316 | (Printexc.to_string exn); 317 | exit 1 318 | 319 | let () = if !Sys.interactive then () else main () 320 | -------------------------------------------------------------------------------- /src/parser.mly: -------------------------------------------------------------------------------- 1 | %{ 2 | module LP = Litprog 3 | module P = Parsing 4 | module L = Lexing 5 | module T = Tangle 6 | 7 | let position p = 8 | { T.file = p.L.pos_fname 9 | ; T.line = p.L.pos_lnum 10 | ; T.column = p.L.pos_cnum - p.L.pos_bol 11 | ; T.offset = p.L.pos_cnum 12 | } 13 | 14 | %} 15 | 16 | %start litprog 17 | %type litprog 18 | 19 | %token EOF AT 20 | %token REF 21 | %token DEF 22 | %token STR 23 | 24 | %% /* rules below */ 25 | 26 | litprog : /**/ chunks EOF {List.rev $1} 27 | | STR chunks EOF {LP.Doc(snd $1) :: List.rev $2} 28 | ; 29 | 30 | chunks : chunks chunk {$2::$1} 31 | | /**/ {[]} 32 | ; 33 | 34 | chunk : code {$1} 35 | | doc {$1} 36 | ; 37 | 38 | doc : AT STR ; {LP.Doc(snd $2)} 39 | 40 | code : DEF body ; {LP.Code($1, List.rev $2)} 41 | 42 | body : body STR {LP.Str(position (fst $2),snd $2)::$1} 43 | | body REF {LP.Ref($2)::$1} 44 | | /**/ {[]} 45 | ; 46 | %% 47 | -------------------------------------------------------------------------------- /src/scanner.mll: -------------------------------------------------------------------------------- 1 | { 2 | module L = Lexing 3 | module B = Buffer 4 | module P = Parser (* defines tokens *) 5 | 6 | let get = Lexing.lexeme 7 | let getchar = Lexing.lexeme_char 8 | let new_line = Lexing.new_line 9 | 10 | exception Error of string 11 | 12 | let position lexbuf = 13 | let p = lexbuf.L.lex_curr_p in 14 | Printf.sprintf "%s:%d:%d" 15 | p.L.pos_fname p.L.pos_lnum (p.L.pos_cnum - p.L.pos_bol) 16 | 17 | let error lexbuf fmt = 18 | let p = position lexbuf in 19 | Printf.kprintf (fun msg -> raise (Error (p^" "^msg))) fmt 20 | 21 | 22 | let warning lexbuf fmt = 23 | let p = position lexbuf in 24 | let s = Printf.sprintf fmt in 25 | prerr_endline ("warning "^p^" "^s) 26 | 27 | let return tok pos str = (tok,pos,B.contents str) 28 | let (@@) f x = f x 29 | 30 | (* col 0 is true, iff a match starts at the beginning of a line *) 31 | let col n lexbuf = 32 | let p = lexbuf.L.lex_start_p 33 | in 34 | p.L.pos_cnum + n = p.L.pos_bol 35 | 36 | } 37 | 38 | rule token pos str = parse 39 | eof { return P.EOF pos str } 40 | | "@<<" { B.add_string str "<<" 41 | ; token pos str lexbuf 42 | } 43 | | "<<" { let x = name (Buffer.create 40) lexbuf in 44 | return x pos str 45 | } 46 | | "@ " { if col 0 lexbuf 47 | then return P.AT pos str 48 | else ( B.add_string str (get lexbuf) 49 | ; token pos str lexbuf 50 | ) 51 | } 52 | | "@\n" { new_line lexbuf; 53 | if col 0 lexbuf 54 | then return P.AT pos str 55 | else ( B.add_string str (get lexbuf) 56 | ; token pos str lexbuf 57 | ) 58 | } 59 | | "@>>" { warning lexbuf "spurious @>>?" 60 | ; B.add_string str (get lexbuf) 61 | ; token pos str lexbuf 62 | } 63 | 64 | (* 65 | | "\n@ " { new_line lexbuf; return P.AT pos str } 66 | *) 67 | 68 | | "@@" { (if col 0 lexbuf 69 | then B.add_char str '@' 70 | else B.add_string str "@@"); 71 | token pos str lexbuf 72 | } 73 | | "@@<<" { B.add_string str "@<<" 74 | ; token pos str lexbuf } 75 | | '\n' { new_line lexbuf 76 | ; B.add_char str '\n' 77 | ; token pos str lexbuf 78 | } 79 | | _ { B.add_char str (getchar lexbuf 0) 80 | ; token pos str lexbuf 81 | } 82 | and name str = parse 83 | eof { error lexbuf "unexpected end of file in <<..>>" } 84 | | '\n' { error lexbuf "unexpected newline in <<..>>"} 85 | | "@<<" { B.add_string str "<<" ; name str lexbuf } 86 | | "@>>" { B.add_string str ">>" ; name str lexbuf } 87 | | "@>>=" { B.add_string str ">>="; name str lexbuf } 88 | | "@@" { B.add_char str '@' ; name str lexbuf } 89 | | "@@>>" { B.add_string str "@>>" ; name str lexbuf } 90 | | "@@<<" { B.add_string str "@<<" ; name str lexbuf } 91 | | "@@>>=" { B.add_string str "@>>="; name str lexbuf } 92 | | ">>" { P.REF (B.contents str) } 93 | | ">>=" { P.DEF (B.contents str) } 94 | (* special case - eat up newline. Is this a good idea? *) 95 | | ">>=\n" { new_line lexbuf; P.DEF (B.contents str)} 96 | | _ { B.add_char str (getchar lexbuf 0) 97 | ; name str lexbuf 98 | } 99 | 100 | 101 | { 102 | 103 | let excerpt s = 104 | let str = String.escaped s in 105 | let len = String.length str in 106 | if len < 40 then str 107 | else 108 | String.sub str 0 20 ^ " ... " ^ String.sub str (len - 20) 20 109 | 110 | let to_string = function 111 | | P.EOF -> "EOF" 112 | | P.DEF(s) -> Printf.sprintf "<<%s>>=" s 113 | | P.REF(s) -> Printf.sprintf "<<%s>>" s 114 | | P.AT -> "@" 115 | | P.STR(_,s) -> excerpt s 116 | 117 | 118 | (** See the file IMPLEMENTATION.md for an explanation. Function token' 119 | either returns the token stored in next, or calls the scanner which 120 | returns two tokens. One is stored and the other one returned *) 121 | 122 | let next = ref None 123 | let token' lexbuf = 124 | let pos = lexbuf.L.lex_curr_p in 125 | match !next with 126 | | None -> let t,p,s = token pos (Buffer.create 256) lexbuf in 127 | ( next := Some t 128 | ; P.STR(p,s) 129 | ) 130 | | Some t -> ( next := None 131 | ; t 132 | ) 133 | 134 | 135 | } 136 | -------------------------------------------------------------------------------- /src/tangle.ml: -------------------------------------------------------------------------------- 1 | 2 | module SM = Map.Make(String) 3 | 4 | exception NoSuchFormat of string 5 | 6 | type position = 7 | { file: string 8 | ; line: int 9 | ; column: int 10 | ; offset: int 11 | } 12 | 13 | type t = out_channel -> position -> string -> unit 14 | 15 | 16 | 17 | let fprintf = Printf.fprintf 18 | let escaped = String.escaped 19 | let (@@) f x = f x 20 | 21 | let plain io _pos str = 22 | output_string io str 23 | 24 | let cpp io pos str = 25 | ( fprintf io "\n# %d \"%s\"\n" pos.line (escaped pos.file) 26 | ; output_string io str 27 | ) 28 | 29 | let comment cstr io pos str = 30 | ( fprintf io "%s %s %d\n" cstr (escaped pos.file) pos.line 31 | ; output_string io str 32 | ) 33 | 34 | let formats = 35 | let add map (keys,value) = 36 | List.fold_left (fun m k -> SM.add k value m) map keys 37 | in 38 | List.fold_left add SM.empty 39 | [ ["plain"], plain 40 | ; ["cpp";"c";"cxx";"h"], cpp 41 | ; ["postscript";"ps";"eps";"tex";"latex"], comment "%" 42 | ; ["ruby";"rb";"shell";"sh"], comment "#" 43 | ] 44 | 45 | let lookup fmt = 46 | try 47 | SM.find fmt formats 48 | with 49 | Not_found -> raise (NoSuchFormat fmt) 50 | 51 | let formats = List.map fst @@ SM.bindings formats 52 | -------------------------------------------------------------------------------- /src/tangle.mli: -------------------------------------------------------------------------------- 1 | (** This module provides formats for emitting code from a literate program. 2 | This process is called traditionally called tangling. Typically a 3 | format will want to emit file positions that refer back to the literate 4 | program such that compiler error messages can be traced back to the 5 | original source, i.e., the literate program. 6 | 7 | The module maintains a directory of named formatters which can be 8 | looked up. Currently there is no way to add new formatters 9 | programatically. *) 10 | 11 | exception NoSuchFormat of string (** raised if lookup fails *) 12 | 13 | type position = 14 | { file : string (** original source file *) 15 | ; line : int (** original line in source file *) 16 | ; column : int (** original column in source file *) 17 | ; offset: int (** byte offset from beginning of source file *) 18 | } 19 | 20 | type t = out_channel -> position -> string -> unit 21 | (** emits code originating from position to output channel *) 22 | 23 | (** available formats *) 24 | val lookup : string -> t (* NoSuchFormat *) 25 | val formats : string list 26 | 27 | val plain : t 28 | (** a very basic formatter *) 29 | -------------------------------------------------------------------------------- /src/weave.ml: -------------------------------------------------------------------------------- 1 | 2 | module SM = Map.Make(String) 3 | module LP = Litprog 4 | 5 | exception NoSuchFormat of string 6 | 7 | type t = out_channel -> LP.doc -> unit 8 | 9 | let (@@) f x = f x 10 | let fprintf = Printf.fprintf 11 | 12 | module Markdown = struct 13 | 14 | let output_code io = 15 | String.iter (function 16 | | '\n' -> output_string io "\n " 17 | | c -> output_char io c 18 | ) 19 | 20 | let code io = function 21 | | LP.Str(_,str) -> output_code io str 22 | | LP.Ref(str) -> fprintf io "<<%s>>" str 23 | 24 | let chunk io = function 25 | | LP.Doc(str) -> output_string io str 26 | | LP.Code(name, src) -> 27 | ( output_code io @@ Printf.sprintf " <<%s>>=\n" name 28 | ; List.iter (code io) src 29 | ; output_char io '\n' 30 | ) 31 | 32 | let weave io chunks = List.iter (chunk io) chunks 33 | end 34 | 35 | let formats = 36 | let add map (keys,value) = 37 | List.fold_left (fun m k -> SM.add k value m) map keys 38 | in 39 | List.fold_left add SM.empty 40 | [ ["plain";"markdown"], Markdown.weave 41 | ] 42 | 43 | let lookup fmt = 44 | try 45 | SM.find fmt formats 46 | with 47 | Not_found -> raise (NoSuchFormat fmt) 48 | 49 | (* this shadows the table above *) 50 | let formats = List.map fst @@ SM.bindings formats 51 | 52 | 53 | -------------------------------------------------------------------------------- /src/weave.mli: -------------------------------------------------------------------------------- 1 | (** Weaving is called the process of formatting a literate program 2 | as documentation (in contrast to formatting its embedded code 3 | for compilation). This module provides formatters for weaving 4 | a literate program. *) 5 | 6 | exception NoSuchFormat of string (** requested format doesn't exist *) 7 | 8 | type t = out_channel -> Litprog.doc -> unit 9 | (* emit doc to channel *) 10 | 11 | val lookup : string -> t 12 | (** lookup named format *) 13 | 14 | val formats : string list 15 | (** names of available formats *) 16 | 17 | -------------------------------------------------------------------------------- /test/a.lp: -------------------------------------------------------------------------------- 1 | <>= 2 | 1 3 | 2 4 | 3 5 | 4 6 | 5 7 | first <> 8 | second <> 9 | <>= 10 | 10 11 | 11 12 | 12 13 | 13 14 | 14 15 | 15 16 | 16 17 | 17 18 | 18 19 | 19 20 | 20 21 | 21 22 | 22 23 | 23 24 | 24 25 | 25 26 | <>= 27 | 6 28 | 7 29 | 8 30 | 9 -------------------------------------------------------------------------------- /test/b.lp: -------------------------------------------------------------------------------- 1 | <>= 2 | 1 3 | 2 4 | 3 5 | 4 6 | 5 7 | first <> 8 | second <> 9 | <>= 10 | 10 11 | 11 12 | 12 13 | 13 14 | 14 15 | 15 16 | 16 17 | 17 18 | 18 19 | 19 20 | 20 21 | 21 22 | 22 23 | 23 24 | 24 25 | 25 26 | <>= 27 | 6 28 | 7 29 | 8 30 | 9 31 | <> -------------------------------------------------------------------------------- /test/function.sh: -------------------------------------------------------------------------------- 1 | 2 | try() { 3 | out=`echo -n "$1" | tr ^ '\012' | $2 | tr '\012' ^` 4 | if [ "$3" = "$out" ]; then 5 | echo "ok" 6 | else 7 | echo "$out != $3" 8 | fi 9 | } -------------------------------------------------------------------------------- /test/test.lp: -------------------------------------------------------------------------------- 1 | <>= 2 | 1 3 | 2 4 | 3 5 | <> 6 | 10 7 | 11 8 | 12 9 | @ Some documentation 10 | 11 | <>= 12 | 4 13 | 5 14 | 6 15 | <>= 16 | 7 17 | 8 18 | 9 19 | <>= 20 | 13 21 | 14 22 | 15 23 | @ 24 | 25 | 26 | --------------------------------------------------------------------------------