├── .gitignore
├── Makefile
├── README.md
├── src
    ├── lex.c
    ├── lex.h
    ├── main.c
    ├── parse.c
    ├── parse.h
    ├── run.c
    └── run.h
└── tests
    ├── 4096.txt
    ├── arraysum.txt
    ├── deep.txt
    └── fizzbuzz.txt


/.gitignore:
--------------------------------------------------------------------------------
obj/
interp

--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
CFLAGS = -std=gnu11 -Wall -Werror
NAME = interp

SRCDIR := ./src
OBJDIR := ./obj
SRCS := $(addprefix $(SRCDIR)/, lex.c parse.c run.c main.c)
OBJS := $(addprefix $(OBJDIR)/, $(notdir $(SRCS:.c=.o)))

all: $(NAME)

$(OBJS): $(OBJDIR)/%.o : $(SRCDIR)/%.c
	@mkdir -p $(OBJDIR)
	$(CC) $(CFLAGS) -c $< -o $@

$(NAME): $(OBJS)
	$(CC) -o $(NAME) $^

.PHONY: clean

clean:
	rm -rf $(OBJDIR)
	rm -f $(NAME)

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
## What is this?

This is a hackable and extensible lexer, parser and interpreter for a minimalistic, imperative, C-like language. It can also be used as an educational tool for understanding lexing and parsing.

## How does it work?

The lexer produces a list of tokens from the input (a `PROT_READ` memory-mapped file). Each individual token is produced by consuming a character from the input and then asking each of the token functions whether they accept it. Each such token function has an internal state and can return any of `STS_ACCEPT`, `STS_REJECT` or `STS_HUNGRY` on a character consumed.
The lexer reads in characters until all of the functions return `STS_REJECT` (at which point they reset their internal state); the accepted token is then determined by looking back for an `STS_ACCEPT` from the previous iteration.

Essentially, this is a "maximal munch" algorithm.

The parser takes the list of tokens and produces a tree. It does that by continuously shifting tokens off the input to the parse stack and then reducing any matching suffix of the stack to a non-terminal, according to the rules of the grammar. The grammar is defined as a static array of structs, where each struct is a rule. When a rule matches a suffix of the stack, a reduction is made. The reduction creates a single level of child nodes (the symbols that matched the rule) and parents them under a new non-terminal symbol on the stack (the left-hand side of the matching rule).

In effect, this is a shift-reduce, bottom-up parser. Because the parser has no states or decision tables, a few additional hacks are implemented to support operator precedence and if-elif-else chains.

The interpreter is straightforward. It starts from the top of the parse tree and walks down through the child nodes, executing the statements and evaluating the expressions. Any warnings during the execution of the program are written to standard error with a `warn:` prefix.
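
As a rough illustration of the maximal-munch loop described above, here is a minimal sketch; it is not the code in `src/lex.c`. The `munch` function, the `match_*` matchers and the `K_*`/`ST_*` names are hypothetical and cover only three token kinds. Each matcher reports accept, reject or hungry for every character, and the accept recorded just before every matcher rejects decides the token.

```c
#include <stddef.h>

/* Status a matcher reports after consuming one character. */
enum status { ST_REJECT, ST_HUNGRY, ST_ACCEPT };

/* Hypothetical token kinds, used by this sketch only. */
enum kind { K_NUMBER, K_ASSIGN, K_EQUAL, K_COUNT, K_NONE = K_COUNT };

/* Each matcher keeps its automaton state in *s (0 is the start state). */
static enum status match_number(char c, int *s)
{
    (void) s;
    return (c >= '0' && c <= '9') ? ST_ACCEPT : ST_REJECT;
}

static enum status match_assign(char c, int *s)
{
    if (*s == 0 && c == '=') { *s = 1; return ST_ACCEPT; }
    return ST_REJECT;
}

static enum status match_equal(char c, int *s)
{
    if (*s == 0 && c == '=') { *s = 1; return ST_HUNGRY; }
    if (*s == 1 && c == '=') { *s = 2; return ST_ACCEPT; }
    return ST_REJECT;
}

static enum status (*const matchers[K_COUNT])(char, int *) = {
    match_number, match_assign, match_equal,
};

/*
 * Finds the longest token at the start of input[0..size). Returns its kind
 * and stores its length in *len, or returns K_NONE if nothing matches.
 */
enum kind munch(const char *input, size_t size, size_t *len)
{
    int states[K_COUNT] = {0};
    enum status prev[K_COUNT], curr[K_COUNT];
    size_t pos = 0;

    for (int k = 0; k < K_COUNT; ++k) {
        prev[k] = ST_HUNGRY;   /* every matcher starts out alive */
        curr[k] = ST_REJECT;
    }

    while (pos < size) {
        int alive = 0;

        for (int k = 0; k < K_COUNT; ++k) {
            /* a matcher that has already rejected stays out of the race */
            curr[k] = prev[k] != ST_REJECT
                ? matchers[k](input[pos], &states[k]) : ST_REJECT;
            alive |= curr[k] != ST_REJECT;
        }

        if (!alive) {
            break;             /* everyone rejected: look back at prev */
        }

        for (int k = 0; k < K_COUNT; ++k) {
            prev[k] = curr[k];
        }
        ++pos;
    }

    *len = pos;

    for (int k = 0; k < K_COUNT; ++k) {
        if (prev[k] == ST_ACCEPT) {
            return (enum kind) k;
        }
    }

    return K_NONE;
}
```

Under these assumptions, `munch("==1", 3, &len)` yields `K_EQUAL` with `len == 2` rather than a `K_ASSIGN` of length 1, because the decision is deferred until no matcher can continue.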

## The Language

* Control-flow statements (the curly braces are mandatory):
  * `if (Expr) { N✕Stmt } elif (Expr) { N✕Stmt } else { N✕Stmt }`
  * `while (Expr) { N✕Stmt }`
  * `do { N✕Stmt } while (Expr);`

* Variable and array assignment (integers only, `Name` is equivalent to `Name[0]`):
  * `Name = Expr;`
  * `Name[Expr] = Expr;`

* Printing to standard output (integers only):
  * `print "Placeholder: " Expr;`
  * `print Expr;`

* Parenthesised expressions (integers only):
  * `(Expr)`

* Binary expressions (between two integers):
  * `Expr OP Expr`, where `OP` is `+`, `-`, `*`, `/`, `%`, `==`, `!=`, `<`, `>`, `<=`, `>=`, `&&` or `||`

* Unary expressions (integers only):
  * `OP Expr`, where `OP` is `-`, `+` or `!`

* A ternary expression (integers only):
  * `Expr ? Expr : Expr`

* Line and block comments:
  * `// line comment`
  * `/* block comment */`

## Sample Output
You start the interpreter by specifying the file containing the code.

Once the file is opened and mapped into memory, the lexer starts. The tokens will be written to standard output as they appear in the file, in alternating colours (green and yellow), so that you can clearly see where each token starts and ends.

If the lexing was successful (all the tokens were recognised), the parser starts. On each shift or reduce operation, it outputs a single line with the current contents of the parse stack. Non-terminals are in yellow, terminals are in green. Finally, if the parsing was successful, the parse stack should contain a single non-terminal called "Unit".

The interpreter then starts from the root of the tree (which is always "Unit"), and executes the tree produced by the parser.
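
To get a feel for the one-line-per-operation trace described above, the following is a toy sketch over a hypothetical three-rule expression grammar (`E -> d | E+E | (E)`, where `d` is any digit); it is not the project's grammar. `parse_trace`, `try_reduce` and `term_matches` are names made up for this example. It has none of the precedence or elif/else lookahead found in `src/parse.c` and simply reduces whenever a rule matches the stack suffix.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy grammar: E -> d | E+E | (E), where d is any digit. */
static const char *const rules[] = { "d", "E+E", "(E)" };
enum { NRULES = sizeof(rules) / sizeof(*rules), STACK_MAX = 64 };

/* Returns 1 if the stack symbol sym matches the grammar term. */
static int term_matches(char term, char sym)
{
    return term == 'd' ? (sym >= '0' && sym <= '9') : term == sym;
}

/* Reduces the first rule whose right-hand side is a suffix of the stack. */
static int try_reduce(char *stack, size_t *size)
{
    for (int r = 0; r < NRULES; ++r) {
        const size_t n = strlen(rules[r]);
        size_t i = 0;

        if (n > *size) {
            continue;
        }

        while (i < n && term_matches(rules[r][i], stack[*size - n + i])) {
            ++i;
        }

        if (i == n) {
            *size -= n;
            stack[(*size)++] = 'E';   /* replace the suffix with the LHS */
            stack[*size] = '\0';
            printf("Red%02d: %s\n", r + 1, stack);
            return 1;
        }
    }

    return 0;
}

/*
 * Parses input, printing the stack after every shift and reduce.
 * The input is assumed to contain only digits, '+', '(' and ')'.
 * Returns 1 if the stack ends up holding a single E, 0 otherwise.
 */
int parse_trace(const char *input)
{
    char stack[STACK_MAX + 1] = "";
    size_t size = 0;

    for (; *input; ++input) {
        if (size == STACK_MAX) {
            return 0;
        }

        stack[size++] = *input;
        stack[size] = '\0';
        printf("Shift: %s\n", stack);

        while (try_reduce(stack, &size)) {
            /* keep reducing until no rule matches the stack suffix */
        }
    }

    return size == 1 && stack[0] == 'E';
}
```

Calling `parse_trace("1+(2+3)")` prints one `Shift:` or `RedNN:` line per step and returns 1, since the stack ends up holding a single `E`. The interpreter's real trace follows.
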
```
$ ./interp tests/fizzbuzz.txt
*** Lexing ***
number = 1;

do {
    if (number % 3 == 0 && number % 5 == 0) {
        print "FizzBuzz " number;
    } elif (number % 5 == 0) {
        print "Fizz " number;
    } elif (number % 3 == 0) {
        print "Buzz " number;
    } else {
        print number;
    }

    number = number + 1;
} while (number <= 100);

*** Parsing ***
Shift: ^
Shift: ^ number
Shift: ^ number =
Shift: ^ number = 1
Red19: ^ number = Atom
Red20: ^ number = Expr
Shift: ^ number = Expr ;
Red05: ^ Assn
Red02: ^ Stmt
Shift: ^ Stmt do
Shift: ^ Stmt do {
Shift: ^ Stmt do { if
Shift: ^ Stmt do { if (
Shift: ^ Stmt do { if ( number
Red18: ^ Stmt do { if ( Atom
Red20: ^ Stmt do { if ( Expr
Shift: ^ Stmt do { if ( Expr %
Shift: ^ Stmt do { if ( Expr % 3
Red19: ^ Stmt do { if ( Expr % Atom
Red20: ^ Stmt do { if ( Expr % Expr
Red39: ^ Stmt do { if ( Bexp
Red22: ^ Stmt do { if ( Expr
Shift: ^ Stmt do { if ( Expr ==
...
Red21: ^ Stmt do { Stmt Stmt } while Expr
Shift: ^ Stmt do { Stmt Stmt } while Expr ;
Red16: ^ Stmt Dowh
Red11: ^ Stmt Ctrl
Red04: ^ Stmt Stmt
Shift: ^ Stmt Stmt $
Red01: Unit
ACCEPT Unit

*** Running ***
1
2
Buzz 3
4
Fizz 5
Buzz 6
7
8
Buzz 9
Fizz 10
11
Buzz 12
13
14
FizzBuzz 15
...
94
Fizz 95
Buzz 96
97
98
Buzz 99
Fizz 100
```

--------------------------------------------------------------------------------
/src/lex.c:
--------------------------------------------------------------------------------
#include "lex.h"

#include
#include

enum {
    STS_ACCEPT,
    STS_REJECT,
    STS_HUNGRY,
};

typedef uint8_t sts_t;

#define TR(st, tr) (*s = (st), (STS_##tr))
#define REJECT TR(0, REJECT)

#define IS_ALPHA(c) (((c) >= 'a' && (c) <= 'z') || ((c) >= 'A' && (c) <= 'Z'))
#define IS_DIGIT(c) ((c) >= '0' && (c) <= '9')
#define IS_ALNUM(c) (IS_ALPHA(c) || IS_DIGIT(c))
#define IS_WSPACE(c) ((c) == ' ' || (c) == '\t' || (c) == '\r' || (c) == '\n')

#define TOKEN_DEFINE_1(token, str) \
static sts_t token(const uint8_t c, uint8_t *const s) \
{ \
    switch (*s) { \
    case 0: return c == (str)[0] ? TR(1, ACCEPT) : REJECT; \
    case 1: return REJECT; \
    default: abort(); \
    } \
}

#define TOKEN_DEFINE_2(token, str) \
static sts_t token(const uint8_t c, uint8_t *const s) \
{ \
    switch (*s) { \
    case 0: return c == (str)[0] ? TR(1, HUNGRY) : REJECT; \
    case 1: return c == (str)[1] ? TR(2, ACCEPT) : REJECT; \
    case 2: return REJECT; \
    default: abort(); \
    } \
}

#define TOKEN_DEFINE_3(token, str) \
static sts_t token(const uint8_t c, uint8_t *const s) \
{ \
    switch (*s) { \
    case 0: return c == (str)[0] ? TR(1, HUNGRY) : REJECT; \
    case 1: return c == (str)[1] ? TR(2, HUNGRY) : REJECT; \
    case 2: return c == (str)[2] ? TR(3, ACCEPT) : REJECT; \
    case 3: return REJECT; \
    default: abort(); \
    } \
}

#define TOKEN_DEFINE_4(token, str) \
static sts_t token(const uint8_t c, uint8_t *const s) \
{ \
    switch (*s) { \
    case 0: return c == (str)[0] ? TR(1, HUNGRY) : REJECT; \
    case 1: return c == (str)[1] ? TR(2, HUNGRY) : REJECT; \
    case 2: return c == (str)[2] ? TR(3, HUNGRY) : REJECT; \
    case 3: return c == (str)[3] ? TR(4, ACCEPT) : REJECT; \
    case 4: return REJECT; \
    default: abort(); \
    } \
}

#define TOKEN_DEFINE_5(token, str) \
static sts_t token(const uint8_t c, uint8_t *const s) \
{ \
    switch (*s) { \
    case 0: return c == (str)[0] ? TR(1, HUNGRY) : REJECT; \
    case 1: return c == (str)[1] ? TR(2, HUNGRY) : REJECT; \
    case 2: return c == (str)[2] ? TR(3, HUNGRY) : REJECT; \
    case 3: return c == (str)[3] ? TR(4, HUNGRY) : REJECT; \
    case 4: return c == (str)[4] ? TR(5, ACCEPT) : REJECT; \
    case 5: return REJECT; \
    default: abort(); \
    } \
}

static sts_t tk_name(const uint8_t c, uint8_t *const s)
{
    enum {
        tk_name_begin,
        tk_name_accum,
    };

    switch (*s) {
    case tk_name_begin:
        return IS_ALPHA(c) || (c == '_') ? TR(tk_name_accum, ACCEPT) : REJECT;

    case tk_name_accum:
        return IS_ALNUM(c) || (c == '_') ? STS_ACCEPT : REJECT;
    }

    abort();
}

static sts_t tk_nmbr(const uint8_t c, uint8_t *const s)
{
    (void) s;
    return IS_DIGIT(c) ? STS_ACCEPT : STS_REJECT;
}

static sts_t tk_strl(const uint8_t c, uint8_t *const s)
{
    enum {
        tk_strl_begin,
        tk_strl_accum,
        tk_strl_end,
    };

    switch (*s) {
    case tk_strl_begin:
        return c == '"' ? TR(tk_strl_accum, HUNGRY) : REJECT;

    case tk_strl_accum:
        return c != '"' ?
            STS_HUNGRY : TR(tk_strl_end, ACCEPT);

    case tk_strl_end:
        return REJECT;
    }

    abort();
}

static sts_t tk_wspc(const uint8_t c, uint8_t *const s)
{
    enum {
        tk_wspc_begin,
        tk_wspc_accum,
    };

    switch (*s) {
    case tk_wspc_begin:
        return IS_WSPACE(c) ? TR(tk_wspc_accum, ACCEPT) : REJECT;

    case tk_wspc_accum:
        return IS_WSPACE(c) ? STS_ACCEPT : REJECT;
    }

    abort();
}

static sts_t tk_lcom(const uint8_t c, uint8_t *const s)
{
    enum {
        tk_lcom_begin,
        tk_lcom_first_slash,
        tk_lcom_accum,
        tk_lcom_end
    };

    switch (*s) {
    case tk_lcom_begin:
        return c == '/' ? TR(tk_lcom_first_slash, HUNGRY) : REJECT;

    case tk_lcom_first_slash:
        return c == '/' ? TR(tk_lcom_accum, HUNGRY) : REJECT;

    case tk_lcom_accum:
        return c == '\n' || c == '\r' ? TR(tk_lcom_end, ACCEPT) : STS_HUNGRY;

    case tk_lcom_end:
        return REJECT;
    }

    abort();
}

static sts_t tk_bcom(const uint8_t c, uint8_t *const s)
{
    enum {
        tk_bcom_begin,
        tk_bcom_open_slash,
        tk_bcom_accum,
        tk_bcom_close_star,
        tk_bcom_end
    };

    switch (*s) {
    case tk_bcom_begin:
        return c == '/' ? TR(tk_bcom_open_slash, HUNGRY) : REJECT;

    case tk_bcom_open_slash:
        return c == '*' ? TR(tk_bcom_accum, HUNGRY) : REJECT;

    case tk_bcom_accum:
        return c != '*' ? STS_HUNGRY : TR(tk_bcom_close_star, HUNGRY);

    case tk_bcom_close_star:
        return c == '/' ?
            TR(tk_bcom_end, ACCEPT) : TR(tk_bcom_accum, HUNGRY);

    case tk_bcom_end:
        return REJECT;
    }

    abort();
}

TOKEN_DEFINE_1(tk_lpar, "(")
TOKEN_DEFINE_1(tk_rpar, ")")
TOKEN_DEFINE_1(tk_lbra, "[")
TOKEN_DEFINE_1(tk_rbra, "]")
TOKEN_DEFINE_1(tk_lbrc, "{")
TOKEN_DEFINE_1(tk_rbrc, "}")
TOKEN_DEFINE_2(tk_cond, "if")
TOKEN_DEFINE_4(tk_elif, "elif")
TOKEN_DEFINE_4(tk_else, "else")
TOKEN_DEFINE_2(tk_dowh, "do")
TOKEN_DEFINE_5(tk_whil, "while")
TOKEN_DEFINE_1(tk_assn, "=")
TOKEN_DEFINE_2(tk_equl, "==")
TOKEN_DEFINE_2(tk_neql, "!=")
TOKEN_DEFINE_1(tk_lthn, "<")
TOKEN_DEFINE_1(tk_gthn, ">")
TOKEN_DEFINE_2(tk_lteq, "<=")
TOKEN_DEFINE_2(tk_gteq, ">=")
TOKEN_DEFINE_2(tk_conj, "&&")
TOKEN_DEFINE_2(tk_disj, "||")
TOKEN_DEFINE_1(tk_plus, "+")
TOKEN_DEFINE_1(tk_mins, "-")
TOKEN_DEFINE_1(tk_mult, "*")
TOKEN_DEFINE_1(tk_divi, "/")
TOKEN_DEFINE_1(tk_modu, "%")
TOKEN_DEFINE_1(tk_nega, "!")
TOKEN_DEFINE_5(tk_prnt, "print")
TOKEN_DEFINE_1(tk_scol, ";")
TOKEN_DEFINE_1(tk_ques, "?")
TOKEN_DEFINE_1(tk_coln, ":")

static sts_t (*const token_funcs[TK_COUNT])(const uint8_t, uint8_t *const) = {
    tk_name,
    tk_nmbr,
    tk_strl,
    tk_wspc,
    tk_lcom,
    tk_bcom,
    tk_lpar,
    tk_rpar,
    tk_lbra,
    tk_rbra,
    tk_lbrc,
    tk_rbrc,
    tk_cond,
    tk_elif,
    tk_else,
    tk_dowh,
    tk_whil,
    tk_assn,
    tk_equl,
    tk_neql,
    tk_lthn,
    tk_gthn,
    tk_lteq,
    tk_gteq,
    tk_conj,
    tk_disj,
    tk_plus,
    tk_mins,
    tk_mult,
    tk_divi,
    tk_modu,
    tk_nega,
    tk_prnt,
    tk_scol,
    tk_ques,
    tk_coln,
};

static inline int push_token(struct token **const tokens,
    size_t *const ntokens, size_t *const allocated, const tk_t token,
    const uint8_t *const beg, const uint8_t *const end)
{
    if (*ntokens >= *allocated) {
        *allocated = (*allocated ?: 1) * 8;

        struct token *const tmp =
            realloc(*tokens, *allocated * sizeof(struct token));

        if (!tmp) {
            return free(*tokens), *tokens = NULL, LEX_NOMEM;
        }

        *tokens = tmp;
    }

    (*tokens)[(*ntokens)++] = (struct token) {
        .beg = beg,
        .end = end,
        .tk = token
    };

    return LEX_OK;
}

int lex(const uint8_t *const input, const size_t size,
    struct token **const tokens, size_t *const ntokens)
{
    static struct {
        sts_t prev, curr;
    } statuses[TK_COUNT] = {
        [0 ... TK_COUNT - 1] = { STS_HUNGRY, STS_REJECT }
    };

    uint8_t states[TK_COUNT] = {0};

    const uint8_t *prefix_beg = input, *prefix_end = input;
    tk_t accepted_token;
    size_t allocated = 0;
    *tokens = NULL, *ntokens = 0;

    #define PUSH_OR_NOMEM(tk, beg, end) \
        if (push_token(tokens, ntokens, &allocated, (tk), (beg), (end))) { \
            return LEX_NOMEM; \
        }

    #define foreach_tk \
        for (tk_t tk = 0; tk < TK_COUNT; ++tk)

    PUSH_OR_NOMEM(TK_FBEG, NULL, NULL);

    while (prefix_end < input + size) {
        int did_accept = 0;

        foreach_tk {
            if (statuses[tk].prev != STS_REJECT) {
                statuses[tk].curr = token_funcs[tk](*prefix_end, &states[tk]);
            }

            if (statuses[tk].curr != STS_REJECT) {
                did_accept = 1;
            }
        }

        if (did_accept) {
            prefix_end++;

            foreach_tk {
                statuses[tk].prev = statuses[tk].curr;
            }
        } else {
            accepted_token = TK_COUNT;

            foreach_tk {
                if (statuses[tk].prev == STS_ACCEPT) {
                    accepted_token = tk;
                }

                statuses[tk].prev = STS_HUNGRY;
                statuses[tk].curr = STS_REJECT;
            }

            PUSH_OR_NOMEM(accepted_token, prefix_beg, prefix_end);

            if (accepted_token == TK_COUNT) {
                (*tokens)[*ntokens - 1].end++;
                return LEX_UNKNOWN_TOKEN;
            }

            prefix_beg = prefix_end;
        }
    }

    accepted_token = TK_COUNT;

    foreach_tk {
        if (statuses[tk].curr == STS_ACCEPT) {
            accepted_token = tk;
        }

        statuses[tk].prev = STS_HUNGRY;
        statuses[tk].curr = STS_REJECT;
    }

    PUSH_OR_NOMEM(accepted_token, prefix_beg, prefix_end);

    if (accepted_token == TK_COUNT) {
        return LEX_UNKNOWN_TOKEN;
    }

    PUSH_OR_NOMEM(TK_FEND, NULL, NULL);
    return LEX_OK;

    #undef PUSH_OR_NOMEM
    #undef foreach_tk
}

--------------------------------------------------------------------------------
/src/lex.h:
--------------------------------------------------------------------------------
#pragma once

#include
#include

#define COLOURED(s, b, c) "\033[" #b ";" #c "m" s "\033[0m"
#define GRAY(s) COLOURED(s, 0, 37)
#define RED(s) COLOURED(s, 1, 31)
#define GREEN(s) COLOURED(s, 1, 32)
#define YELLOW(s) COLOURED(s, 1, 33)
#define ORANGE(s) COLOURED(s, 1, 34)
#define CYAN(s) COLOURED(s, 1, 36)
#define WHITE(s) COLOURED(s, 1, 37)

enum {
    TK_NAME,
    TK_NMBR,
    TK_STRL,
    TK_WSPC,
    TK_LCOM,
    TK_BCOM,
    TK_LPAR,
    TK_RPAR,
    TK_LBRA,
    TK_RBRA,
    TK_LBRC,
    TK_RBRC,
    TK_COND,
    TK_ELIF,
    TK_ELSE,
    TK_DOWH,
    TK_WHIL,
    TK_ASSN,
    TK_EQUL,
    TK_NEQL,
    TK_LTHN,
    TK_GTHN,
    TK_LTEQ,
    TK_GTEQ,
    TK_CONJ,
    TK_DISJ,
    TK_PLUS,
    TK_MINS,
    TK_MULT,
    TK_DIVI,
    TK_MODU,
    TK_NEGA,
    TK_PRNT,
    TK_SCOL,
    TK_QUES,
    TK_COLN,
    TK_COUNT,
    TK_FBEG,
    TK_FEND,
};

typedef uint8_t tk_t;

struct token {
    const uint8_t *beg, *end;
    tk_t tk;
};

int lex(const uint8_t *, size_t, struct token **, size_t *);

enum {
    LEX_OK,
    LEX_NOMEM,
    LEX_UNKNOWN_TOKEN,
};

--------------------------------------------------------------------------------
/src/main.c:
--------------------------------------------------------------------------------
#include "lex.h"
#include "parse.h"
#include "run.h"

#include
#include
#include
#include
#include
#include
#include
#include

static void print_tokens(const struct token *const tokens,
    const size_t ntokens, const int error)
{
    for (size_t i = 0, alternate = 0; i < ntokens; ++i) {
        const struct token token = tokens[i];

        if (token.tk == TK_FBEG || token.tk == TK_FEND) {
            continue;
        }

        if (token.tk != TK_WSPC && token.tk != TK_LCOM && token.tk != TK_BCOM) {
            alternate++;
        }

        const int len = token.end - token.beg;

        if (i == ntokens - 1 && error == LEX_UNKNOWN_TOKEN) {
            printf(RED("%.*s") CYAN("< Unknown token\n"), len ?: 1, token.beg);
        } else if (token.tk == TK_LCOM || token.tk == TK_BCOM) {
            printf(GRAY("%.*s"), len, token.beg);
        } else if (alternate % 2) {
            printf(GREEN("%.*s"), len, token.beg);
        } else {
            printf(YELLOW("%.*s"), len, token.beg);
        }
    }
}

int main(int argc, char **argv)
{
    int fd;
    size_t size;
    struct stat statbuf;
    int exit_status = EXIT_FAILURE;

    if (argc != 2) {
        return fprintf(stderr, "Usage: %s \n", argv[0]), exit_status;
    }

    if ((fd = open(argv[1], O_RDONLY)) < 0) {
        return perror("open"), exit_status;
    }

    if (fstat(fd, &statbuf) < 0) {
        return perror("fstat"), close(fd), exit_status;
    }

    if ((size = statbuf.st_size) == 0) {
        fprintf(stderr, "‘%s‘: file is empty\n", argv[1]);
        return close(fd), exit_status;
    }

    const uint8_t *const
        mapped = mmap(0, size, PROT_READ, MAP_PRIVATE, fd, 0);

    if (mapped == MAP_FAILED) {
        return perror("mmap"), close(fd), exit_status;
    }

    puts(WHITE("*** Lexing ***"));
    struct token *tokens;
    size_t ntokens;
    const int lex_error = lex(mapped, size, &tokens, &ntokens);

    if (!lex_error || lex_error == LEX_UNKNOWN_TOKEN) {
        print_tokens(tokens, ntokens, lex_error);
    } else if (lex_error == LEX_NOMEM) {
        puts(RED("The lexer could not allocate memory."));
    }

    if (!lex_error) {
        puts(WHITE("\n*** Parsing ***"));
        const struct node root = parse(tokens, ntokens);

        if (!parse_error(root)) {
            puts(WHITE("\n*** Running ***"));
            run(&root);
            destroy_tree(root);
            exit_status = EXIT_SUCCESS;
        }
    }

    free(tokens);
    munmap((uint8_t *const) mapped, size);
    close(fd);
    return exit_status;
}

--------------------------------------------------------------------------------
/src/parse.c:
--------------------------------------------------------------------------------
#include "parse.h"
#include "lex.h"

#include
#include
#include

#define RULE_RHS_LAST 7
#define GRAMMAR_SIZE (sizeof(grammar) / sizeof(*grammar))
#define SKIP_TOKEN(t) ((t) == TK_WSPC || (t) == TK_LCOM || (t) == TK_BCOM)

#define n(_nt) { .nt = NT_##_nt, .is_tk = 0, .is_mt = 0 }
#define m(_nt) { .nt = NT_##_nt, .is_tk = 0, .is_mt = 1 }
#define t(_tm) { .tk = TK_##_tm, .is_tk = 1, .is_mt = 0 }
#define no { .tk = TK_COUNT, .is_tk = 1, .is_mt = 0 }

#define r1(_lhs, t1) \
    { .lhs = NT_##_lhs, .rhs = { no, no, no, no, no, no, no, t1, } },
#define r2(_lhs, t1, t2) \
    { .lhs = NT_##_lhs, .rhs = { no, no, no, no, no, no, t1, t2, } },
#define r3(_lhs, t1, t2, t3) \
    { .lhs = NT_##_lhs, .rhs = { no, no, no, no, no, t1, t2, t3, } },
#define r4(_lhs, t1, t2, t3, t4) \
    { .lhs = NT_##_lhs, .rhs = { no, no, no, no, t1, t2, t3, t4, } },
#define r5(_lhs, t1, t2, t3, t4, t5) \
    { .lhs = NT_##_lhs, .rhs = { no, no, no, t1, t2, t3, t4, t5, } },
#define r6(_lhs, t1, t2, t3, t4, t5, t6) \
    { .lhs = NT_##_lhs, .rhs = { no, no, t1, t2, t3, t4, t5, t6, } },
#define r7(_lhs, t1, t2, t3, t4, t5, t6, t7) \
    { .lhs = NT_##_lhs, .rhs = { no, t1, t2, t3, t4, t5, t6, t7, } },

static const struct rule {
    /* left-hand side of production */
    const nt_t lhs;

    /* array of RULE_RHS_LAST + 1 terms which form the right-hand side */
    const struct term {
        /* a rule RHS term is either a terminal token or a non-terminal */
        union {
            const tk_t tk;
            const nt_t nt;
        };

        /* indicates which field of the above union to use */
        const uint8_t is_tk: 1;

        /* indicates that the non-terminal can be matched multiple times */
        const uint8_t is_mt: 1;
    } rhs[RULE_RHS_LAST + 1];
} grammar[] = {
    r3(Unit, t(FBEG), m(Stmt), t(FEND))

    r1(Stmt, n(Assn))
    r1(Stmt, n(Prnt))
    r1(Stmt, n(Ctrl))

    r4(Assn, t(NAME), t(ASSN), n(Expr), t(SCOL))
    r4(Assn, n(Aexp), t(ASSN), n(Expr), t(SCOL))

    r3(Prnt, t(PRNT), n(Expr), t(SCOL))
    r4(Prnt, t(PRNT), t(STRL), n(Expr), t(SCOL))

    r2(Ctrl, n(Cond), m(Elif))
    r3(Ctrl, n(Cond), m(Elif), n(Else))
    r1(Ctrl, n(Dowh))
    r1(Ctrl, n(Whil))

    r5(Cond, t(COND), n(Expr), t(LBRC), m(Stmt), t(RBRC))
    r5(Elif, t(ELIF), n(Expr), t(LBRC), m(Stmt), t(RBRC))
    r4(Else, t(ELSE), t(LBRC), m(Stmt), t(RBRC))

    r7(Dowh, t(DOWH), t(LBRC), m(Stmt), t(RBRC), t(WHIL), n(Expr), t(SCOL))
    r5(Whil, t(WHIL), n(Expr), t(LBRC), m(Stmt), t(RBRC))

    r1(Atom, t(NAME))
    r1(Atom, t(NMBR))

    r1(Expr, n(Atom))
    r1(Expr, n(Pexp))
    r1(Expr, n(Bexp))
    r1(Expr, n(Uexp))
    r1(Expr, n(Texp))
    r1(Expr, n(Aexp))

    r3(Pexp, t(LPAR), n(Expr), t(RPAR))

    r3(Bexp, n(Expr), t(EQUL), n(Expr))
    r3(Bexp, n(Expr), t(NEQL), n(Expr))
    r3(Bexp, n(Expr), t(LTHN), n(Expr))
    r3(Bexp, n(Expr), t(GTHN), n(Expr))
    r3(Bexp, n(Expr), t(LTEQ), n(Expr))
    r3(Bexp, n(Expr), t(GTEQ), n(Expr))
    r3(Bexp, n(Expr), t(CONJ), n(Expr))
    r3(Bexp, n(Expr), t(DISJ), n(Expr))
    r3(Bexp, n(Expr), t(PLUS), n(Expr))
    r3(Bexp, n(Expr), t(MINS), n(Expr))
    r3(Bexp, n(Expr), t(MULT), n(Expr))
    r3(Bexp, n(Expr), t(DIVI), n(Expr))
    r3(Bexp, n(Expr), t(MODU), n(Expr))

    r2(Uexp, t(PLUS), n(Expr))
    r2(Uexp, t(MINS), n(Expr))
    r2(Uexp, t(NEGA), n(Expr))

    r5(Texp, n(Expr), t(QUES), n(Expr), t(COLN), n(Expr))

    r4(Aexp, t(NAME), t(LBRA), n(Expr), t(RBRA))
};

#undef r1
#undef r2
#undef r3
#undef r4
#undef r5
#undef r6
#undef r7

#undef n
#undef m
#undef t
#undef no

static const uint8_t precedence[TK_MODU - TK_EQUL + 1] = {
    4, 4, 3, 3, 3, 3, 5, 6, 2, 2, 1, 1, 1,
};

static struct {
    size_t size, allocated;
    struct node *nodes;
} stack;

static void print_stack(void)
{
    static const char *const nts[NT_COUNT] = {
        "Unit",
        "Stmt",
        "Assn",
        "Prnt",
        "Ctrl",
        "Cond",
        "Elif",
        "Else",
        "Dowh",
        "Whil",
        "Atom",
        "Expr",
        "Pexp",
        "Bexp",
        "Uexp",
        "Texp",
        "Aexp",
    };

    for (size_t i = 0; i < stack.size; ++i) {
        const struct node *const node = &stack.nodes[i];

        if (node->nchildren) {
            printf(YELLOW("%s "), nts[node->nt]);
        } else if (node->token->tk == TK_FBEG) {
            printf(GREEN("^ "));
        } else if (node->token->tk == TK_FEND) {
            printf(GREEN("$ "));
        } else {
            const ptrdiff_t len = node->token->end - node->token->beg;
            printf(GREEN("%.*s "), (int) len, node->token->beg);
        }
    }

    puts("");
}

static void destroy_node(const struct node *const node)
{
    if (node->nchildren) {
        for (size_t child_idx = 0; child_idx < node->nchildren; ++child_idx) {
            destroy_node(node->children[child_idx]);
        }

        free(node->children[0]);
        free(node->children);
    }
}

static void deallocate_stack(void)
{
    free(stack.nodes);
    stack.nodes = NULL;
    stack.size = 0;
    stack.allocated = 0;
}

static void destroy_stack(void)
{
    for (size_t node_idx = 0; node_idx < stack.size; ++node_idx) {
        destroy_node(&stack.nodes[node_idx]);
    }

    deallocate_stack();
}

static inline int term_eq_node(
    const struct term *const term,
    const struct node *const node)
{
    const int node_is_leaf = node->nchildren == 0;

    if (term->is_tk == node_is_leaf) {
        if (node_is_leaf) {
            return term->tk == node->token->tk;
        } else {
            return term->nt == node->nt;
        }
    }

    return 0;
}

static size_t match_rule(const struct rule *const rule, size_t *const at)
{
    const struct term *prev = NULL;
    const struct term *term = &rule->rhs[RULE_RHS_LAST];
    ssize_t st_idx = stack.size - 1;

    do {
        if (term_eq_node(term, &stack.nodes[st_idx])) {
            prev = term->is_mt ?
                term : NULL;
            --term, --st_idx;
        } else if (prev && term_eq_node(prev, &stack.nodes[st_idx])) {
            --st_idx;
        } else if (term->is_mt) {
            prev = NULL;
            --term;
        } else {
            term = NULL;
            break;
        }
    } while (st_idx >= 0 && !(term->is_tk && term->tk == TK_COUNT));

    const int reached_eor = term && term->is_tk && term->tk == TK_COUNT;
    const size_t reduction_size = stack.size - st_idx - 1;

    return reached_eor && reduction_size ?
        (*at = st_idx + 1, reduction_size) : 0;
}

static inline int shift(const struct token *const token)
{
    if (stack.size >= stack.allocated) {
        stack.allocated = (stack.allocated ?: 1) * 8;

        struct node *const tmp = realloc(stack.nodes,
            stack.allocated * sizeof(struct node));

        if (!tmp) {
            return PARSE_NOMEM;
        }

        stack.nodes = tmp;
    }

    stack.nodes[stack.size++] = (struct node) {
        .nchildren = 0,
        .token = token,
    };

    return PARSE_OK;
}

static inline bool should_shift_pre(
    const struct rule *const rule,
    const struct token *const tokens,
    size_t *const token_idx)
{
    if (rule->lhs == NT_Unit) {
        return false;
    }

    while (SKIP_TOKEN(tokens[*token_idx].tk)) {
        ++*token_idx;
    }

    const struct token *const ahead = &tokens[*token_idx];

    if (rule->lhs == NT_Bexp && ahead->tk >= TK_EQUL && ahead->tk <= TK_MODU) {
        /*
            Check whether the operator ahead has a lower precedence. If it has,
            let the parser shift it before applying the Bexp reduction.
        */
        const uint8_t p1 = precedence[rule->rhs[RULE_RHS_LAST - 1].tk - TK_EQUL];
        const uint8_t p2 = precedence[ahead->tk - TK_EQUL];

        if (p2 < p1) {
            return true;
        }
    } else if (rule->lhs == NT_Atom && rule->rhs[RULE_RHS_LAST].tk == TK_NAME) {
        /*
            Do not allow the left side of an assignment or an array name to
            escalate to Expr.
        */
        if (ahead->tk == TK_ASSN || ahead->tk == TK_LBRA) {
            return true;
        }
    } else if (rule->lhs == NT_Expr && rule->rhs[RULE_RHS_LAST].nt == NT_Aexp) {
        /*
            Do not allow an Aexp on the left side of an assignment to escalate
            to Expr.
        */
        if (ahead->tk == TK_ASSN) {
            return true;
        }
    }

    return false;
}

static inline bool should_shift_post(
    const struct rule *const rule,
    const struct token *const tokens,
    size_t *const token_idx)
{
    if (rule->lhs == NT_Unit) {
        return false;
    }

    while (SKIP_TOKEN(tokens[*token_idx].tk)) {
        ++*token_idx;
    }

    const struct token *const ahead = &tokens[*token_idx];

    if (rule->lhs == NT_Cond || rule->lhs == NT_Elif) {
        /* swallow the next "elif" or "else" in order to parse the whole chain */
        if (ahead->tk == TK_ELIF || ahead->tk == TK_ELSE) {
            return true;
        }
    }

    return false;
}

static int reduce(const struct rule *const rule,
    const size_t at, const size_t size)
{
    struct node *const child_nodes = malloc(size * sizeof(struct node));

    if (!child_nodes) {
        return PARSE_NOMEM;
    }

    struct node *const reduce_at = &stack.nodes[at];
    struct node **const old_children = reduce_at->children;
    reduce_at->children = malloc(size * sizeof(struct node *)) ?: old_children;

    if (reduce_at->children == old_children) {
        return free(child_nodes), PARSE_NOMEM;
    }

    for (size_t child_idx = 0, st_idx = at;
         st_idx < stack.size;
         ++st_idx, ++child_idx) {

        child_nodes[child_idx] = stack.nodes[st_idx];
        reduce_at->children[child_idx] = &child_nodes[child_idx];
    }

    child_nodes[0].children = old_children;
    reduce_at->nchildren = size;
    reduce_at->nt = rule->lhs;
    stack.size = at + 1;
    return PARSE_OK;
}

struct node parse(const struct token *const tokens, const size_t ntokens)
{
    static const struct token
        reject = { .tk = PARSE_REJECT },
        nomem = { .tk = PARSE_NOMEM };

    static const struct node
        err_reject = { .nchildren = 0, .token = &reject },
        err_nomem = { .nchildren = 0, .token = &nomem };

    #define SHIFT_OR_NOMEM(t) \
        if (shift(t)) { \
            puts(RED("Out of memory on shift!")); \
            return destroy_stack(), err_nomem; \
        }

    #define REDUCE_OR_NOMEM(r, a, s) \
        if (reduce(r, a, s)) { \
            puts(RED("Out of memory on reduce!")); \
            return destroy_stack(), err_nomem; \
        }

    for (size_t token_idx = 0; token_idx < ntokens; ) {
        if (SKIP_TOKEN(tokens[token_idx].tk)) {
            ++token_idx;
            continue;
        }

        SHIFT_OR_NOMEM(&tokens[token_idx++]);
        printf(CYAN("Shift: ")), print_stack();

        try_reduce_again:;
        const struct rule *rule = grammar;

        do {
            size_t reduction_at, reduction_size;

            if ((reduction_size = match_rule(rule, &reduction_at))) {
                const bool do_shift = should_shift_pre(rule, tokens, &token_idx);

                if (!do_shift) {
                    REDUCE_OR_NOMEM(rule, reduction_at, reduction_size);
                    const ptrdiff_t rule_number = rule - grammar + 1;
                    printf(ORANGE("Red%02td: "), rule_number), print_stack();
                }

                if (do_shift || should_shift_post(rule, tokens, &token_idx)) {
                    SHIFT_OR_NOMEM(&tokens[token_idx++]);
                    printf(CYAN("Shift: ")), print_stack();
                }

                goto try_reduce_again;
            }
        } while (++rule != grammar + GRAMMAR_SIZE);
    }

    #undef SHIFT_OR_NOMEM
    #undef REDUCE_OR_NOMEM

    const int accepted = stack.size == 1 &&
        stack.nodes[0].nchildren && stack.nodes[0].nt == NT_Unit;

    printf(accepted ? GREEN("ACCEPT ") : RED("REJECT ")), print_stack();

    if (accepted) {
        const struct node ret = stack.nodes[0];
        return deallocate_stack(), ret;
    } else {
        return destroy_stack(), err_reject;
    }
}

void destroy_tree(const struct node root)
{
    destroy_node(&root);
}

--------------------------------------------------------------------------------
/src/parse.h:
--------------------------------------------------------------------------------
#pragma once

/* header names were lost in extraction; restored from usage (size_t, uint8_t, uint32_t) */
#include <stddef.h>
#include <stdint.h>

enum {
    NT_Unit,
    NT_Stmt,
    NT_Assn,
    NT_Prnt,
    NT_Ctrl,
    NT_Cond,
    NT_Elif,
    NT_Else,
    NT_Dowh,
    NT_Whil,
    NT_Atom,
    NT_Expr,
    NT_Pexp,
    NT_Bexp,
    NT_Uexp,
    NT_Texp,
    NT_Aexp,
    NT_COUNT
};

typedef uint8_t nt_t;

struct token;
struct node {
    /* use "token" if nchildren == 0, "nt" and "children" otherwise */
    uint32_t nchildren;

    union {
        const struct token *token;

        struct {
            nt_t nt;
            struct node **children;
        };
    };
};

struct node parse(const struct token *, size_t);

enum {
    PARSE_OK,
    PARSE_REJECT,
    PARSE_NOMEM,
};

#define parse_error(root) ({ \
    struct node root_once = (root); \
    root_once.nchildren ? \
        PARSE_OK : root_once.token->tk; \
})

void destroy_tree(struct node);

--------------------------------------------------------------------------------
/src/run.c:
--------------------------------------------------------------------------------
#include "lex.h"
#include "parse.h"

/*
    header names were lost in extraction; restored from usage:
    printf/fprintf/perror, malloc/realloc/free/abort, memcmp
*/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

static void run_stmt(const struct node *const);
static void run_assn(const struct node *const);
static void run_prnt(const struct node *const);
static void run_ctrl(const struct node *const);
static int eval_atom(const struct node *const);
static int eval_expr(const struct node *const);
static int eval_pexp(const struct node *const);
static int eval_bexp(const struct node *const);
static int eval_uexp(const struct node *const);
static int eval_texp(const struct node *const);
static int eval_aexp(const struct node *const);

#define VARSTORE_CAPACITY 128

static struct {
    size_t size;

    struct {
        const uint8_t *beg;
        ptrdiff_t len;
        size_t array_size;
        int *values;
    } vars[VARSTORE_CAPACITY];
} varstore;

void run(const struct node *const unit)
{
    for (size_t stmt_idx = 1; stmt_idx < unit->nchildren - 1; ++stmt_idx) {
        run_stmt(unit->children[stmt_idx]);
    }

    for (size_t var_idx = 0; var_idx < varstore.size; ++var_idx) {
        free(varstore.vars[var_idx].values);
    }

    varstore.size = 0;
}

static void run_stmt(const struct node *const stmt)
{
    switch (stmt->children[0]->nt) {
    case NT_Assn:
        run_assn(stmt->children[0]);
        break;

    case NT_Prnt:
        run_prnt(stmt->children[0]);
        break;

    case NT_Ctrl:
        run_ctrl(stmt->children[0]);
        break;

    default:
        abort();
    }
}

static void run_assn(const struct node *const assn)
{
    const int lhs_is_aexp =
        assn->children[0]->nchildren;

    const int array_idx = lhs_is_aexp ?
        eval_expr(assn->children[0]->children[2]) : 0;

    const uint8_t *const beg = lhs_is_aexp ?
        assn->children[0]->children[0]->token->beg :
        assn->children[0]->token->beg;

    const ptrdiff_t len = lhs_is_aexp ?
        assn->children[0]->children[0]->token->end - beg :
        assn->children[0]->token->end - beg;

    size_t var_idx;

    for (var_idx = 0; var_idx < varstore.size; ++var_idx) {
        if (varstore.vars[var_idx].len == len &&
            !memcmp(varstore.vars[var_idx].beg, beg, len)) {

            const size_t array_size = varstore.vars[var_idx].array_size;

            if (!array_size) {
                fprintf(stderr, "warn: a previous reallocation has failed, "
                    "assignment has no effect\n");

                return;
            }

            if (array_idx >= 0 && array_idx < array_size) {
                varstore.vars[var_idx].values[array_idx] =
                    eval_expr(assn->children[2]);

                return;
            } else if (array_idx >= 0) {
                const size_t new_size = (array_idx + 1) * 2;

                int *const tmp = realloc(
                    varstore.vars[var_idx].values, new_size * sizeof(int));

                if (!tmp) {
                    free(varstore.vars[var_idx].values);
                    varstore.vars[var_idx].array_size = 0;
                    varstore.vars[var_idx].values = NULL;
                    perror("realloc");
                    return;
                }

                /* zero the newly grown tail so unassigned slots read as 0 */
                memset(tmp + array_size, 0,
                    (new_size - array_size) * sizeof(int));

                varstore.vars[var_idx].values = tmp;
                varstore.vars[var_idx].array_size = new_size;
                varstore.vars[var_idx].values[array_idx] =
                    eval_expr(assn->children[2]);

                return;
            } else {
                fprintf(stderr, "warn: negative array offset\n");
                return;
            }
        }
    }

    if (var_idx < VARSTORE_CAPACITY) {
        if (array_idx < 0) {
            fprintf(stderr, "warn: negative array offset\n");
            return;
        }

        varstore.vars[var_idx].beg = beg;
        varstore.vars[var_idx].len = len;
        varstore.vars[var_idx].values = malloc((array_idx + 1) *
            sizeof(int));
        varstore.vars[var_idx].array_size = 0;

        if (!varstore.vars[var_idx].values) {
            perror("malloc");
            return;
        }

        /* zero every slot so elements below array_idx read as 0 */
        memset(varstore.vars[var_idx].values, 0,
            ((size_t) array_idx + 1) * sizeof(int));

        varstore.vars[var_idx].array_size = array_idx + 1;
        varstore.vars[var_idx].values[array_idx] = eval_expr(assn->children[2]);
        varstore.size++;
    } else {
        fprintf(stderr, "warn: varstore exhausted, assignment has no effect\n");
    }
}

static void run_prnt(const struct node *const prnt)
{
    if (prnt->nchildren == 3) {
        printf("%d\n", eval_expr(prnt->children[1]));
    } else if (prnt->nchildren == 4) {
        const struct node *const strl = prnt->children[1];

        const uint8_t *const beg = strl->token->beg + 1;
        const uint8_t *const end = strl->token->end - 1;
        const ptrdiff_t len = end - beg;

        printf("%.*s%d\n", (int) len, beg, eval_expr(prnt->children[2]));
    }
}

static void run_ctrl(const struct node *const ctrl)
{
    switch (ctrl->children[0]->nt) {
    case NT_Cond: {
        const struct node *const cond = ctrl->children[0];

        if (eval_expr(cond->children[1])) {
            const struct node *stmt = cond->children[3];

            while (stmt->nchildren) {
                run_stmt(stmt++);
            }
        } else if (ctrl->nchildren >= 2) {
            size_t child_idx = 1;

            do {
                if (ctrl->children[child_idx]->nt == NT_Elif) {
                    const struct node *const elif = ctrl->children[child_idx];

                    if (eval_expr(elif->children[1])) {
                        const struct node *stmt = elif->children[3];

                        while (stmt->nchildren) {
                            run_stmt(stmt++);
                        }

                        break;
                    }
                } else {
                    const struct node *const els = ctrl->children[child_idx];
                    const struct node *stmt = els->children[2];

                    while (stmt->nchildren) {
                        run_stmt(stmt++);
                    }
                }
            } while (++child_idx < ctrl->nchildren);
        }
    } break;

    case NT_Dowh: {
        const struct node *const dowh = ctrl->children[0];
        const struct node *const expr = dowh->children[dowh->nchildren - 2];

        do {
            const struct node *stmt = dowh->children[2];

            while (stmt->nchildren) {
                run_stmt(stmt++);
            }
        } while (eval_expr(expr));
    } break;

    case NT_Whil: {
        const struct node *const whil = ctrl->children[0];

        while (eval_expr(whil->children[1])) {
            const struct node *stmt = whil->children[3];

            while (stmt->nchildren) {
                run_stmt(stmt++);
            }
        }
    } break;

    default:
        abort();
    }
}

static int eval_atom(const struct node *const atom)
{
    switch (atom->children[0]->token->tk) {
    case TK_NAME: {
        const uint8_t *const beg = atom->children[0]->token->beg;
        const ptrdiff_t len = atom->children[0]->token->end - beg;

        for (size_t idx = 0; idx < varstore.size; ++idx) {
            if (varstore.vars[idx].len == len &&
                !memcmp(varstore.vars[idx].beg, beg, len)) {

                if (varstore.vars[idx].array_size) {
                    return varstore.vars[idx].values[0];
                } else {
                    return 0;
                }
            }
        }

        return fprintf(stderr, "warn: access to undefined variable\n"), 0;
    }

    case TK_NMBR: {
        const uint8_t *const beg = atom->children[0]->token->beg;
        const uint8_t *const end = atom->children[0]->token->end;
        int result = 0, mult = 1;

        for (ssize_t idx = end - beg - 1; idx >= 0; --idx, mult *= 10) {
            result += mult * (beg[idx] - '0');
        }

        return result;
    }

    default:
        abort();
    }
}

static int eval_expr(const struct node *const expr)
{
    switch (expr->children[0]->nt) {
    case NT_Atom:
        return eval_atom(expr->children[0]);

    case NT_Pexp:
        return eval_pexp(expr->children[0]);

    case NT_Bexp:
        return
            eval_bexp(expr->children[0]);

    case NT_Uexp:
        return eval_uexp(expr->children[0]);

    case NT_Texp:
        return eval_texp(expr->children[0]);

    case NT_Aexp:
        return eval_aexp(expr->children[0]);

    default:
        abort();
    }
}

static int eval_pexp(const struct node *const pexp)
{
    return eval_expr(pexp->children[1]);
}

static int eval_bexp(const struct node *const bexp)
{
    switch (bexp->children[1]->token->tk) {
    case TK_PLUS:
        return eval_expr(bexp->children[0]) + eval_expr(bexp->children[2]);

    case TK_MINS:
        return eval_expr(bexp->children[0]) - eval_expr(bexp->children[2]);

    case TK_MULT:
        return eval_expr(bexp->children[0]) * eval_expr(bexp->children[2]);

    case TK_DIVI: {
        const int dividend = eval_expr(bexp->children[0]);
        const int divisor = eval_expr(bexp->children[2]);

        if (divisor) {
            return dividend / divisor;
        } else {
            fprintf(stderr, "warn: prevented attempt to divide by zero\n");
            return 0;
        }
    }

    case TK_MODU: {
        /* guard the remainder like the division above: x % 0 is undefined */
        const int dividend = eval_expr(bexp->children[0]);
        const int divisor = eval_expr(bexp->children[2]);

        if (divisor) {
            return dividend % divisor;
        } else {
            fprintf(stderr, "warn: prevented attempt to divide by zero\n");
            return 0;
        }
    }

    case TK_EQUL:
        return eval_expr(bexp->children[0]) == eval_expr(bexp->children[2]);

    case TK_NEQL:
        return eval_expr(bexp->children[0]) != eval_expr(bexp->children[2]);

    case TK_LTHN:
        return eval_expr(bexp->children[0]) < eval_expr(bexp->children[2]);

    case TK_GTHN:
        return eval_expr(bexp->children[0]) > eval_expr(bexp->children[2]);

    case TK_LTEQ:
        return eval_expr(bexp->children[0]) <= eval_expr(bexp->children[2]);

    case TK_GTEQ:
        return eval_expr(bexp->children[0]) >= eval_expr(bexp->children[2]);

    case TK_CONJ:
        return eval_expr(bexp->children[0]) && eval_expr(bexp->children[2]);

    case TK_DISJ:
        return
            eval_expr(bexp->children[0]) || eval_expr(bexp->children[2]);

    default:
        abort();
    }
}

static int eval_uexp(const struct node *const uexp)
{
    switch (uexp->children[0]->token->tk) {
    case TK_PLUS:
        return eval_expr(uexp->children[1]);

    case TK_MINS:
        return -eval_expr(uexp->children[1]);

    case TK_NEGA:
        return !eval_expr(uexp->children[1]);

    default:
        abort();
    }
}

static int eval_texp(const struct node *const texp)
{
    return eval_expr(texp->children[0]) ?
        eval_expr(texp->children[2]) : eval_expr(texp->children[4]);
}

static int eval_aexp(const struct node *const aexp)
{
    const uint8_t *const beg = aexp->children[0]->token->beg;
    const ptrdiff_t len = aexp->children[0]->token->end - beg;
    const int array_idx = eval_expr(aexp->children[2]);

    if (array_idx < 0) {
        return fprintf(stderr, "warn: negative array offset\n"), 0;
    }

    for (size_t idx = 0; idx < varstore.size; ++idx) {
        if (varstore.vars[idx].len == len &&
            !memcmp(varstore.vars[idx].beg, beg, len)) {

            if (array_idx < varstore.vars[idx].array_size) {
                return varstore.vars[idx].values[array_idx];
            } else {
                return fprintf(stderr, "warn: out of bounds array access\n"), 0;
            }
        }
    }

    return fprintf(stderr, "warn: access to undefined array\n"), 0;
}

--------------------------------------------------------------------------------
/src/run.h:
--------------------------------------------------------------------------------
#pragma once

struct node;

void run(const struct node *);

--------------------------------------------------------------------------------
/tests/4096.txt:
--------------------------------------------------------------------------------
1 | 
/*aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 2 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 3 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 4 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 5 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 6 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 7 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 8 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 9 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 10 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 11 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 12 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 13 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 14 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 15 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 16 | 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 17 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 18 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 19 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 20 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 21 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 22 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 23 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 24 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 25 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 26 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 27 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 28 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 29 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 30 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 31 | 
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa 32 | aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa*/ 33 | -------------------------------------------------------------------------------- /tests/arraysum.txt: -------------------------------------------------------------------------------- 1 | i = 0; 2 | iterations = 20; 3 | 4 | while (i < iterations) { 5 | sums[i] = i + i; 6 | products[i] = i * i; 7 | sums_and_products[i] = sums[i] + products[i]; 8 | i = i + 1; 9 | } 10 | 11 | i = 0; 12 | 13 | while (i < iterations) { 14 | print "Row: " i; 15 | print " Sum: " sums[i]; 16 | print " Product: " products[i]; 17 | print " Sum + Product: " sums_and_products[i]; 18 | i = i + 1; 19 | } 20 | -------------------------------------------------------------------------------- /tests/deep.txt: -------------------------------------------------------------------------------- 1 | print ---------------------------------+++5; 2 | print ((((((((3 + 2) * ((((((2))))))))))))); 3 | 4 | if (1) { if (1) { if (1) { if (1) { if (1) { print 15; } } } } } 5 | -------------------------------------------------------------------------------- /tests/fizzbuzz.txt: -------------------------------------------------------------------------------- 1 | number = 1; 2 | 3 | do { 4 | if (number % 3 == 0 && number % 5 == 0) { 5 | print "FizzBuzz " number; 6 | } elif (number % 5 == 0) { 7 | print "Fizz " number; 8 | } elif (number % 3 == 0) { 9 | print "Buzz " number; 10 | } else { 11 | print number; 12 | } 13 | 14 | number = number + 1; 15 | } while (number <= 100); 16 | --------------------------------------------------------------------------------