├── BUGS ├── DIALECT ├── LICENSE ├── README ├── a.out.h ├── makefile ├── nas ├── input.c ├── insn.c ├── makefile ├── name.c ├── nas.c ├── nas.h ├── output.c └── pseudo.c ├── ncc.c ├── ncc1 ├── block.c ├── block.h ├── decl.c ├── gen.c ├── init.c ├── lex.c ├── makefile ├── ncc1.c ├── ncc1.h ├── opt.c ├── output.c ├── reg.c ├── reg.h ├── stmt.c ├── symbol.c ├── symbol.h ├── token.h ├── tree.c ├── tree.h ├── type.c └── type.h ├── ncpp ├── EXAMPLE3 ├── EXAMPLE4 ├── directive.c ├── input.c ├── macro.c ├── makefile ├── ncpp.c ├── ncpp.h ├── token.c └── vstring.c ├── nld.c ├── nobj.c └── obj.h /BUGS: -------------------------------------------------------------------------------- 1 | There is only one compiler bug that I'm aware of and haven't fixed: a function 2 | pointer must be dereferenced implicitly. That is, 3 | 4 | int (*f)(); 5 | 6 | f(); /* works as expected */ 7 | (*f)(); /* error */ 8 | 9 | The logic in generate_call() (ncc1/gen.c) botches the job. 10 | Easy enough to fix, just have to get around to it. 11 | 12 | Charles Youse 13 | December 26, 2018 14 | 15 | -------------------------------------------------------------------------------- /DIALECT: -------------------------------------------------------------------------------- 1 | The compiler accepts "classic" C as described in K&R (1978), with 2 | a few changes as detailed below. 3 | 4 | ========== 5 | EXTENSIONS 6 | ========== 7 | 8 | Most of these were common extensions that made it into the ANSI standard. 9 | 10 | 11 | * 'unsigned' is extended to all integer types. 12 | 13 | K&R only mentions 'unsigned int'. NCC extends this qualifier to all 14 | integral types, including 'char', which is signed by default. 15 | 16 | 17 | * Separate member namespaces. 18 | 19 | 'struct' and 'union' types each enclose a unique namespace, so members of 20 | different aggregate types need not have different names. 21 | 22 | 23 | * Separate tag namespaces. 
24 | 25 | 'struct foo' and 'union foo' refer to different types, even in the same scope. 26 | There's really no reason to have these namespaces overlap, and I suspect it was 27 | merely an artifact of dmr's original C compiler that they do (the same could 28 | be said about shared member namespaces). Shared tag namespaces are a nuisance 29 | to enforce, and so the compiler doesn't bother. 30 | 31 | 32 | * Arbitrarily-typed bitfields. 33 | 34 | The compiler allows bit fields to have any integral type: the size of the type 35 | limits the number of bits that are permitted and sets the alignment requirement 36 | of the word that contains the field. Also, the signedness of the field is honored. 37 | 38 | 39 | ======= 40 | CHANGES 41 | ======= 42 | 43 | These are intentional changes to the syntax and semantics of K&R C. These 44 | are fairly minor but make the language more consistent. (ANSI attempts to 45 | address some of the issues addressed here, but they were more constrained 46 | by backwards-compatibility concerns.) 47 | 48 | 49 | * Floating-point types 50 | 51 | K&R basically treats 'float' as a kind of short 'double': 'float' types are 52 | promoted to 'double' before just about any operation, in the same way that a 53 | 'short' is promoted to 'int'. This behavior is suboptimal on AMD64. 54 | 55 | NCC doesn't have a 'double' type; its double-precision type is instead called 56 | 'long float', and its relationship to 'float' is the same as that between 57 | 'long' and 'int'. A 'float' is coerced to a 'long float' only during the 58 | "usual conversions": when one of the operands to a binary operator is 'long 59 | float', the other is promoted to 'long float'. Note that this means that 60 | function arguments and return values of type 'float' are distinct from their 61 | 'long float' analogs. 62 | 63 | All floating-point constants are assumed to be 'float' unless suffixed with 64 | 'L' or 'l', in which case they are 'long float'.
65 | 66 | * Integer constants 67 | 68 | 'long' integer constants must be explicitly suffixed with 'L' or 'l'. This 69 | is for parity with the floating-point types and to avoid inadvertent long 70 | constants. Both K&R and ANSI rules for classifying constants are dependent 71 | on the value and base of the constant, as well as the implementation-specific 72 | ranges of the integral types. 73 | 74 | 75 | ======= 76 | DEFECTS 77 | ======= 78 | 79 | These are minor issues that could be fixed pretty easily. I'm just lazy. 80 | 81 | 82 | * 'extern' objects must be declared at file scope. 83 | 84 | K&R and ANSI both allow 'extern' objects to appear in local scopes. This 85 | results in a lot of extra bookkeeping in the front end because: 86 | 87 | 1. all such objects must have compatible declarations, and 88 | 2. such declarations can't be exported to the global scope. 89 | 90 | This means that "invisible" entries need to hang around the symbol table, 91 | and that just makes a mess, for no good reason that I can determine. 92 | 93 | 94 | * Aggregate initializer braces can't be elided. 95 | 96 | int a[2][2] = { { 1, 2 }, { 3, 4 } }; 97 | 98 | and 99 | 100 | int a[2][2] = { 1, 2, 3, 4 }; 101 | 102 | are identical by both K&R and ANSI, but NCC only accepts the first. 103 | 104 | 105 | Charles Youse 106 | December 27, 2018 107 | 108 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 
9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | -------------------------------------------------------------------------------- /README: -------------------------------------------------------------------------------- 1 | This is NCC, the "new" C compiler. It is intended to be used as the system 2 | compiler for BSD/64 (my port of pre-Reno 4.3BSD to Intel/AMD 64-bit desktops). 3 | 4 | The dialect of C accepted by the compiler is basically pre-ANSI (K&R 1978) 5 | with some common extensions and a few minor "fixes". (See the DIALECT file for 6 | specifics.) This is both a function of the compiler's purpose -- to operate on 7 | a mid-80s codebase -- and, admittedly, personal taste. 8 | 9 | The binary tools work on a proprietary object file format and produce a.out- 10 | format executables. These are documented in obj.h and a.out.h respectively. 11 | 12 | The compiler and its tools are fully functional and have been fairly well- 13 | tested, though they are works in progress. 
In particular, the optimizer is 14 | quite minimal: the framework for a more aggressive optimizer is there, but 15 | for the moment only rudimentary data-flow analysis is done to aid the register 16 | allocator and clean up the more egregious output from the code generator. 17 | 18 | NCC includes: 19 | 20 | ncc: compiler driver. 21 | ncpp: an ANSI C89 compliant C preprocessor. 22 | ncc1: the C compiler proper, produces assembly output 23 | nas: accepts 16/32/64-bit Intel syntax assembly and produces .o object. 24 | nld: the object linker - combines .o files into a.out executables. 25 | nobj: object/executable inspector. 26 | 27 | These are all original works and are BSD-licensed. See LICENSE and comments. 28 | 29 | Charles Youse 30 | December 27, 2018 31 | 32 | -------------------------------------------------------------------------------- /a.out.h: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | /* a bsd/64 executable is a fairly conventional a.out format. 26 | 27 | the text segment comes first, which begins with the exec header. 28 | the header is part of the text segment; thus the first user code 29 | in text segment is at offset 0x20 == sizeof(struct exec). the 30 | text segment is padded to a page boundary, and the data section 31 | follows. the data section is padded to an 8-byte boundary. 32 | 33 | the symbol table is last. each symbol consists of a NUL-terminated 34 | identifier, padded to an 8-byte boundary (with NULs), followed by 35 | an 8-byte address. 36 | 37 | if the text only makes RIP-relative references, it is position- 38 | independent (the compiler generates code this way already). if we 39 | wished to add shared library support, we'd simply need to fix up 40 | the initialized data area if it references any symbol addresses. 41 | extending a.out to hold a fixup table and modifying the linker to 42 | generate such a table would be straightforward. 
*/ 43 | 44 | struct exec 45 | { 46 | unsigned a_magic; 47 | unsigned a_text; 48 | unsigned a_data; 49 | unsigned a_bss; 50 | unsigned a_entry; 51 | unsigned a_reserved[2]; 52 | unsigned a_syms; 53 | }; 54 | 55 | #define A_MAGIC 0x87CD1EEB /* $87CD SYNC: homage to OS-9/6809 */ 56 | 57 | -------------------------------------------------------------------------------- /makefile: -------------------------------------------------------------------------------- 1 | CC=gcc 2 | CFLAGS=-Wno-implicit-int -Wno-implicit-function-declaration 3 | 4 | all:: ncc nld nobj 5 | make CC="$(CC)" CFLAGS="$(CFLAGS)" -C ncpp 6 | make CC="$(CC)" CFLAGS="$(CFLAGS)" -C ncc1 7 | make CC="$(CC)" CFLAGS="$(CFLAGS)" -C nas 8 | 9 | ncc: ncc.c 10 | nld: nld.c 11 | nobj: nobj.c 12 | 13 | install:: all 14 | mkdir -p ~/bin 15 | cp ncc nld nobj ncpp/ncpp ncc1/ncc1 nas/nas ~/bin 16 | 17 | clean:: 18 | rm -f nld ncc nobj 19 | make -C ncpp clean 20 | make -C ncc1 clean 21 | make -C nas clean 22 | 23 | -------------------------------------------------------------------------------- /nas/makefile: -------------------------------------------------------------------------------- 1 | OBJS=nas.o name.o input.o output.o pseudo.o insn.o 2 | 3 | nas: $(OBJS) 4 | $(CC) $(CFLAGS) -o nas $(OBJS) 5 | 6 | clean:: 7 | rm -f *.o nas 8 | -------------------------------------------------------------------------------- /nas/name.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 
9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include 26 | #include 27 | #include 28 | #include "nas.h" 29 | 30 | /* guaranteed-not-to-fail allocation, sort of */ 31 | 32 | static char * 33 | allocate(bytes) 34 | int bytes; 35 | { 36 | char * p = malloc(bytes); 37 | 38 | if (p == NULL) error("out of memory"); 39 | return p; 40 | } 41 | 42 | static struct name * buckets[NR_NAME_BUCKETS]; 43 | 44 | /* return the name entry for the string given, creating one if necessary. 45 | this is basically a copy of stringize() from symbol.c in the C compiler. 
*/ 46 | 47 | struct name * 48 | lookup_name(data, length) 49 | char * data; 50 | int length; 51 | { 52 | struct name ** namep; 53 | struct name * name; 54 | unsigned hash; 55 | int i; 56 | 57 | for (i = 0, hash = 0; i < length; i++) { 58 | hash <<= 4; 59 | hash ^= (data[i] & 0xFF); 60 | } 61 | 62 | i = hash % NR_NAME_BUCKETS; 63 | for (namep = &(buckets[i]); (name = *namep); namep = &((*namep)->link)) { 64 | if (name->length != length) continue; 65 | if (name->hash != hash) continue; 66 | if (memcmp(name->data, data, length)) continue; 67 | 68 | *namep = name->link; /* move to top for LRU */ 69 | name->link = buckets[i]; 70 | buckets[i] = name; 71 | 72 | return name; 73 | } 74 | 75 | name = (struct name *) allocate(sizeof(struct name)); 76 | name->link = buckets[i]; 77 | buckets[i] = name; 78 | name->hash = hash; 79 | name->length = length; 80 | name->symbol = NULL; 81 | name->insn_entries = NULL; 82 | name->pseudo = NULL; 83 | name->token = 0; 84 | name->data = allocate(length + 1); 85 | memcpy(name->data, data, length); 86 | name->data[length] = 0; 87 | 88 | return name; 89 | } 90 | 91 | /* this function is called when a symbol is referenced, and 92 | does the necessary housekeeping. on return, name->symbol 93 | has valid data for the caller. 94 | 95 | NOTE: inside the assembler, obj_symbol.index refers to the 96 | symbol index that will be assigned to the symbol, NOT the 97 | index into the name table as it means on disk! 
*/ 98 | 99 | reference(name) 100 | struct name * name; 101 | { 102 | int position; 103 | 104 | if (pass == FIRST_PASS) { 105 | if (name->symbol == NULL) { 106 | name->symbol = (struct obj_symbol *) allocate(sizeof(struct obj_symbol)); 107 | name->symbol->index = nr_symbols; 108 | name->symbol->flags = 0; 109 | name->symbol->value = 0; 110 | } 111 | } else if (pass == FINAL_PASS) { 112 | /* on first encounter, write the symbol and its name to the output file */ 113 | 114 | if (name->symbol->index == nr_symbols) { 115 | position = OBJ_SYMBOLS_OFFSET(header); 116 | position += name->symbol->index * sizeof(struct obj_symbol); 117 | name->symbol->index = name_bytes; 118 | output(position, name->symbol, sizeof(struct obj_symbol)); 119 | name->symbol->index = nr_symbols; 120 | 121 | position = OBJ_NAMES_OFFSET(header); 122 | position += name_bytes; 123 | output(position, name->data, name->length + 1); 124 | name_bytes += name->length + 1; 125 | } 126 | } else { 127 | /* the first intermediate pass catches undefined symbols */ 128 | 129 | if (!(name->symbol->flags & (OBJ_SYMBOL_GLOBAL | OBJ_SYMBOL_DEFINED))) 130 | error("undefined symbol"); 131 | } 132 | 133 | /* regardless of which pass, bump the symbol counter after the 134 | first appearance of any symbol. */ 135 | 136 | if (name->symbol->index == nr_symbols) nr_symbols++; 137 | } 138 | 139 | /* assign the given symbol the specified value, and mark it defined. 140 | like reference(), there's a fair amount of bookkeeping done here.
*/ 141 | 142 | define(name, value) 143 | struct name * name; 144 | long value; 145 | { 146 | reference(name); 147 | 148 | if (pass == FIRST_PASS) { 149 | if (name->symbol->flags & OBJ_SYMBOL_DEFINED) error("multiple definition"); 150 | } else if (pass == FINAL_PASS) { 151 | if (name->symbol->value != value) error("internal error: symbol changed"); 152 | } else /* intermediate pass */ { 153 | if (name->symbol->value != value) 154 | nr_symbol_changes++; 155 | } 156 | 157 | name->symbol->flags |= OBJ_SYMBOL_DEFINED; 158 | name->symbol->value = value; 159 | } 160 | 161 | /* at start up, names are loaded into the name table. most of the pre-load tables 162 | are here, but mnemonics from the instruction table are also installed. [given 163 | the sheer number of mnemonics for AMD64, these should probably be precomputed.] */ 164 | 165 | static struct 166 | { 167 | char * text; 168 | int token; 169 | } tokens[] = { 170 | { "byte", BYTE }, { "cs", REG_CS }, { "ss", REG_SS }, { "fs", REG_FS }, 171 | { "word", WORD }, { "ds", REG_DS }, { "es", REG_ES }, { "gs", REG_GS }, 172 | 173 | { "dword", DWORD }, 174 | { "qword", QWORD }, 175 | { "rip", REG_RIP }, 176 | 177 | { "cr0", REG_CR0 }, { "cr1", REG_CR1 }, { "cr2", REG_CR2 }, { "cr3", REG_CR3 }, 178 | { "cr4", REG_CR4 }, { "cr5", REG_CR5 }, { "cr6", REG_CR6 }, { "cr7", REG_CR7 }, 179 | 180 | { "rax", REG_RAX }, { "rbx", REG_RBX }, { "rcx", REG_RCX }, { "rdx", REG_RDX }, 181 | { "al", REG_AL }, { "ah", REG_AH }, { "ax", REG_AX }, { "eax", REG_EAX }, 182 | { "bl", REG_BL }, { "bh", REG_BH }, { "bx", REG_BX }, { "ebx", REG_EBX }, 183 | { "cl", REG_CL }, { "ch", REG_CH }, { "cx", REG_CX }, { "ecx", REG_ECX }, 184 | { "dl", REG_DL }, { "dh", REG_DH }, { "dx", REG_DX }, { "edx", REG_EDX }, 185 | { "bpl", REG_BPL }, { "bp", REG_BP }, { "ebp", REG_EBP }, { "rbp", REG_RBP }, 186 | { "spl", REG_SPL }, { "sp", REG_SP }, { "esp", REG_ESP }, { "rsp", REG_RSP }, 187 | { "sil", REG_SIL }, { "si", REG_SI }, { "esi", REG_ESI }, { "rsi", REG_RSI 
}, 188 | { "dil", REG_DIL }, { "di", REG_DI }, { "edi", REG_EDI }, { "rdi", REG_RDI }, 189 | { "r8b", REG_R8B }, { "r8w", REG_R8W }, { "r8d", REG_R8D }, { "r8", REG_R8 }, 190 | { "r9b", REG_R9B }, { "r9w", REG_R9W }, { "r9d", REG_R9D }, { "r9", REG_R9 }, 191 | { "r10b", REG_R10B }, { "r10w", REG_R10W }, { "r10d", REG_R10D }, { "r10", REG_R10 }, 192 | { "r11b", REG_R11B }, { "r11w", REG_R11W }, { "r11d", REG_R11D }, { "r11", REG_R11 }, 193 | { "r12b", REG_R12B }, { "r12w", REG_R12W }, { "r12d", REG_R12D }, { "r12", REG_R12 }, 194 | { "r13b", REG_R13B }, { "r13w", REG_R13W }, { "r13d", REG_R13D }, { "r13", REG_R13 }, 195 | { "r14b", REG_R14B }, { "r14w", REG_R14W }, { "r14d", REG_R14D }, { "r14", REG_R14 }, 196 | { "r15b", REG_R15B }, { "r15w", REG_R15W }, { "r15d", REG_R15D }, { "r15", REG_R15 }, 197 | 198 | { "xmm0", REG_XMM0 }, { "xmm1", REG_XMM1 }, { "xmm2", REG_XMM2 }, { "xmm3", REG_XMM3 }, 199 | { "xmm4", REG_XMM4 }, { "xmm5", REG_XMM5 }, { "xmm6", REG_XMM6 }, { "xmm7", REG_XMM7 }, 200 | { "xmm8", REG_XMM8 }, { "xmm9", REG_XMM9 }, { "xmm10", REG_XMM10 }, { "xmm11", REG_XMM11 }, 201 | { "xmm12", REG_XMM12 }, { "xmm13", REG_XMM13 }, { "xmm14", REG_XMM14 }, { "xmm15", REG_XMM15 } 202 | }; 203 | 204 | #define NR_TOKENS (sizeof(tokens)/sizeof(*tokens)) 205 | 206 | struct { 207 | char * text; 208 | int ( * handler )(); 209 | } pseudos[] = { 210 | { "byte", pseudo_byte }, 211 | { "word", pseudo_word }, 212 | { "dword", pseudo_dword }, 213 | { "qword", pseudo_qword }, 214 | { "ascii", pseudo_ascii }, 215 | { "global", pseudo_global }, 216 | { "align", pseudo_align }, 217 | { "skip", pseudo_skip }, 218 | { "fill", pseudo_fill }, 219 | { "text", pseudo_text }, 220 | { "data", pseudo_data }, 221 | { "bss", pseudo_bss }, 222 | { "org", pseudo_org }, 223 | { "bits", pseudo_bits } 224 | }; 225 | 226 | #define NR_PSEUDOS (sizeof(pseudos)/sizeof(*pseudos)) 227 | 228 | load_names() 229 | { 230 | struct name * name; 231 | int i; 232 | 233 | for (i = 0; i < NR_TOKENS; i++) { 234 
| name = lookup_name(tokens[i].text, strlen(tokens[i].text)); 235 | name->token = tokens[i].token; 236 | } 237 | 238 | for (i = 0; insns[i].mnemonic; i++) { 239 | name = lookup_name(insns[i].mnemonic, strlen(insns[i].mnemonic)); 240 | if (!name->insn_entries) name->insn_entries = &insns[i]; 241 | insns[i].name = name; 242 | } 243 | 244 | for (i = 0; i < NR_PSEUDOS; i++) { 245 | name = lookup_name(pseudos[i].text, strlen(pseudos[i].text)); 246 | name->pseudo = pseudos[i].handler; 247 | } 248 | } 249 | -------------------------------------------------------------------------------- /nas/nas.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include "nas.h" 30 | 31 | int bits = 64; /* .bits */ 32 | int segment = OBJ_SYMBOL_SEG_TEXT; /* .text or .data? */ 33 | int text_bytes; /* current text segment position */ 34 | int data_bytes; /* current data segment position */ 35 | int name_bytes; /* current name bytes (last pass only) */ 36 | int nr_relocs; /* number of relocations */ 37 | int nr_symbols; /* number of symbols */ 38 | int pass; /* between FIRST_PASS .. FINAL_PASS */ 39 | char ** input_paths; /* array of input path names */ 40 | int input_index = -1; /* current index, -1 means "the beginning" */ 41 | FILE * input_file; /* current input file */ 42 | char input_line[MAX_INPUT_LINE]; /* current input line */ 43 | char * input_pos = input_line; /* position on that line */ 44 | int line_number; /* which line number input_line is */ 45 | char * output_path; /* output path ... */ 46 | FILE * output_file; /* ... and file */ 47 | char * list_path; /* these are NULL unless the ... */ 48 | FILE * list_file; /* ... 
user requested a listing file */ 49 | int token; /* current token */ 50 | struct name * name_token; /* name of most recent NAME or PSEUDO */ 51 | long number_token; /* value of most recent numeric token */ 52 | int nr_symbol_changes; /* symbols defined/changed this pass */ 53 | struct obj_header header; /* header to use for final pass output */ 54 | struct insn * insn; /* instruction being encoded */ 55 | int nr_operands; /* number of operands to current instruction */ 56 | struct operand operands[MAX_OPERANDS]; /* the operands themselves */ 57 | 58 | /* report an error, clean up the output(s), and exit */ 59 | 60 | #ifdef __STDC__ 61 | void 62 | error(char * fmt, ...) 63 | #else 64 | error(fmt) 65 | char * fmt; 66 | #endif 67 | { 68 | va_list args; 69 | 70 | fprintf(stderr, "as: "); 71 | 72 | if (input_index >= 0) { 73 | fprintf(stderr, "'%s' ", input_paths[input_index]); 74 | if (line_number) fprintf(stderr, "(%d) ", line_number); 75 | } 76 | 77 | fprintf(stderr,"ERROR: "); 78 | va_start(args, fmt); 79 | vfprintf(stderr, fmt, args); 80 | va_end(args); 81 | fputc('\n', stderr); 82 | 83 | if (output_file) { 84 | fclose(output_file); 85 | unlink(output_path); 86 | } 87 | 88 | if (list_file) { 89 | fclose(list_file); 90 | unlink(list_path); 91 | } 92 | 93 | exit(1); 94 | } 95 | 96 | 97 | assemble() 98 | { 99 | struct name * name; 100 | int i; 101 | 102 | input_index = -1; 103 | nr_symbol_changes = 0; 104 | nr_symbols = 0; 105 | nr_relocs = 0; 106 | text_bytes = 0; 107 | data_bytes = 0; 108 | segment = OBJ_SYMBOL_SEG_TEXT; 109 | 110 | scan(); 111 | while (token != NONE) { 112 | if (token == '\n') goto end_of_line; 113 | 114 | if (token == NAME && (*input_pos == '=')) { 115 | name = name_token; 116 | scan(); 117 | scan(); 118 | define(name, constant_expression()); 119 | OBJ_SYMBOL_SET_SEG(*(name->symbol), OBJ_SYMBOL_SEG_ABS); 120 | goto end_of_line; 121 | } 122 | 123 | if (token == NAME && (*input_pos == ':')) { 124 | name = name_token; 125 | scan(); 126 | scan(); 127 
| define(name, (segment == OBJ_SYMBOL_SEG_TEXT) ? text_bytes : data_bytes); 128 | OBJ_SYMBOL_SET_SEG(*(name->symbol), segment); 129 | if (token == '\n') goto end_of_line; 130 | } 131 | 132 | if (token == PSEUDO) { 133 | name = name_token; 134 | scan(); 135 | 136 | if (name->pseudo == NULL) 137 | error("unknown pseudo-op"); 138 | else { 139 | name->pseudo(); 140 | goto end_of_line; 141 | } 142 | } 143 | 144 | if ((token != NAME) || !(name_token->insn_entries)) error("instruction or pseudo-op expected"); 145 | name = name_token; 146 | scan(); 147 | nr_operands = 0; 148 | 149 | if (token != '\n') { 150 | for (;;) { 151 | if (nr_operands == MAX_OPERANDS) error("too many operands"); 152 | operand(nr_operands); 153 | nr_operands++; 154 | 155 | if (token == ',') 156 | scan(); 157 | else 158 | break; 159 | } 160 | } 161 | 162 | for (insn = name->insn_entries; insn->name == name; insn++) { 163 | if (insn->nr_operands != nr_operands) goto mismatch; 164 | if ((bits == 16) && (insn->insn_flags & I_NO_BITS_16)) goto mismatch; 165 | if ((bits == 32) && (insn->insn_flags & I_NO_BITS_32)) goto mismatch; 166 | if ((bits == 64) && (insn->insn_flags & I_NO_BITS_64)) goto mismatch; 167 | 168 | for (i = 0; i < insn->nr_operands; i++) 169 | if (!(insn->operand_flags[i] & operands[i].flags)) goto mismatch; 170 | 171 | break; 172 | 173 | mismatch: ; 174 | } 175 | 176 | if (insn->name == name) 177 | encode(insn); 178 | else 179 | error("invalid instruction/operand combination"); 180 | 181 | end_of_line: 182 | if (token != '\n') error("trailing garbage - end of line expected"); 183 | scan(); 184 | } 185 | } 186 | 187 | main(argc, argv) 188 | char * argv[]; 189 | { 190 | int opt; 191 | 192 | while ((opt = getopt(argc, argv, "o:l:")) != -1) { 193 | switch (opt) 194 | { 195 | case 'o': 196 | output_path = optarg; 197 | break; 198 | 199 | case 'l': 200 | list_path = optarg; 201 | break; 202 | 203 | default: 204 | exit(1); 205 | } 206 | } 207 | 208 | input_paths = &argv[optind]; 209 | if 
(*input_paths == NULL) error("no input file(s) specified"); 210 | 211 | if (output_path == NULL) error("no output file (-o) specified"); 212 | output_file = fopen(output_path, "w"); 213 | if (output_file == NULL) error("can't open output file '%s'", output_path); 214 | 215 | if (list_path) { 216 | list_file = fopen(list_path, "w"); 217 | if (list_file == NULL) error("can't open list file '%s'", list_path); 218 | } 219 | 220 | /* assembly is in minimum three passes: 221 | FIRST_PASS: collect symbol data 222 | FIRST_PASS + n: repeat until symbols stabilize 223 | FINAL_PASS: assemble to output file */ 224 | 225 | load_names(); 226 | pass = FIRST_PASS; 227 | assemble(); 228 | 229 | do { 230 | pass++; 231 | if (pass == FINAL_PASS) error("too many passes"); 232 | assemble(); 233 | } while (nr_symbol_changes); 234 | 235 | header.magic = OBJ_MAGIC; 236 | header.text_bytes = text_bytes; 237 | header.data_bytes = data_bytes; 238 | header.nr_symbols = nr_symbols; 239 | header.nr_relocs = nr_relocs; 240 | 241 | pass = FINAL_PASS; 242 | assemble(); 243 | 244 | header.name_bytes = name_bytes; 245 | output(0, &header, sizeof(header)); 246 | 247 | fclose(output_file); 248 | if (list_file) fclose(list_file); 249 | return 0; 250 | } 251 | -------------------------------------------------------------------------------- /nas/pseudo.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 
9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include "nas.h" 26 | 27 | /* .byte [ , ...] 28 | .word [ , ...] 29 | .dword [ , ...] 30 | .qword [ , ...] 
*/ 31 | 32 | static 33 | pseudo_bwdq(flags) 34 | long flags; 35 | { 36 | for (;;) { 37 | operand(0); 38 | 39 | if (operands[0].kind != OPERAND_IMM) 40 | error("invalid operand"); 41 | 42 | if ((operands[0].flags & flags) == 0) 43 | error("out of range"); 44 | 45 | reloc(0, flags, 0); 46 | 47 | if (token == ',') 48 | scan(); 49 | else 50 | break; 51 | } 52 | } 53 | 54 | pseudo_byte() { pseudo_bwdq(O_IMM_8); } 55 | pseudo_word() { pseudo_bwdq(O_IMM_16); } 56 | pseudo_dword() { pseudo_bwdq(O_IMM_32); } 57 | pseudo_qword() { pseudo_bwdq(O_IMM_64); } 58 | 59 | /* .align <#> */ 60 | 61 | pseudo_align() 62 | { 63 | long boundary; 64 | char fill = 0; 65 | int bytes; 66 | 67 | if (segment == OBJ_SYMBOL_SEG_TEXT) { 68 | fill = 0x90; /* NOP */ 69 | bytes = text_bytes; 70 | } else 71 | bytes = data_bytes; 72 | 73 | boundary = constant_expression(); 74 | 75 | switch (boundary) 76 | { 77 | case 1: 78 | case 2: 79 | case 4: 80 | case 8: 81 | while (bytes % boundary) { 82 | emit(fill, 1); 83 | bytes++; 84 | } 85 | break; 86 | 87 | default: 88 | error("bogus alignment"); 89 | } 90 | } 91 | 92 | pseudo_skip() 93 | { 94 | long length; 95 | char fill = 0; 96 | 97 | length = constant_expression(); 98 | if (segment == OBJ_SYMBOL_SEG_TEXT) fill = 0x90; 99 | while (length--) emit(fill, 1); 100 | } 101 | 102 | pseudo_fill() 103 | { 104 | long length; 105 | char fill = 0; 106 | 107 | length = constant_expression(); 108 | if (token != ',') error("missing fill value"); 109 | scan(); 110 | fill = constant_expression(); 111 | if ((fill < -127) || (fill > 255)) error("invalid fill byte"); 112 | while (length--) emit(fill, 1); 113 | } 114 | 115 | /* .org
*/ 116 | 117 | pseudo_org() 118 | { 119 | long target; 120 | char fill = 0; 121 | int bytes; 122 | 123 | target = constant_expression(); 124 | 125 | if (segment == OBJ_SYMBOL_SEG_TEXT) { 126 | fill = 0x90; /* NOP */ 127 | bytes = text_bytes; 128 | } else 129 | bytes = data_bytes; 130 | 131 | target -= bytes; 132 | if (target < 0) error("origin goes backwards"); 133 | while (target--) emit(fill, 1); 134 | } 135 | 136 | /* .text 137 | .data 138 | 139 | select output segment */ 140 | 141 | pseudo_text() 142 | { 143 | segment = OBJ_SYMBOL_SEG_TEXT; 144 | } 145 | 146 | pseudo_data() 147 | { 148 | segment = OBJ_SYMBOL_SEG_DATA; 149 | } 150 | 151 | /* .bss , [ , ] */ 152 | 153 | pseudo_bss() 154 | { 155 | struct name * name; 156 | long number; 157 | int bit; 158 | int log2 = 0; 159 | 160 | if (token != NAME) error("expected symbol name"); 161 | name = name_token; 162 | scan(); 163 | 164 | if (token != ',') error("missing size"); 165 | scan(); 166 | number = constant_expression(); 167 | define(name, number); 168 | OBJ_SYMBOL_SET_SEG(*name->symbol, OBJ_SYMBOL_SEG_BSS); 169 | 170 | if (token == ',') { 171 | scan(); 172 | number = constant_expression(); 173 | bit = 1; 174 | 175 | while ((number & bit) != number) { 176 | log2++; 177 | bit <<= 1; 178 | if (!OBJ_SYMBOL_VALID_ALIGN(log2)) error("invalid alignment"); 179 | } 180 | } 181 | 182 | OBJ_SYMBOL_SET_ALIGN(*name->symbol, log2); 183 | } 184 | 185 | /* .ascii 186 | 187 | a string is any collection of characters (except newline) delimited 188 | by single or double quote. the scanner is bypassed since no other part 189 | of the assembler cares about strings. 
*/ 190 | 191 | pseudo_ascii() 192 | { 193 | int delimiter; 194 | 195 | operands[0].kind = OPERAND_IMM; 196 | operands[0].symbol = NULL; 197 | operands[0].flags = O_IMM_8; 198 | 199 | delimiter = token; 200 | if ((delimiter != '\'') && (delimiter != '\"')) error("string expected"); 201 | 202 | while ((*input_pos != '\n') && (*input_pos != delimiter)) { 203 | operands[0].offset = *input_pos; 204 | reloc(0, O_IMM_8, 0); 205 | input_pos++; 206 | } 207 | 208 | if (*input_pos == delimiter) 209 | input_pos++; 210 | else 211 | error("missing closing delimiter"); 212 | 213 | scan(); 214 | } 215 | 216 | /* .global - mark a symbol for external linkage */ 217 | 218 | pseudo_global() 219 | { 220 | if (token != NAME) error("name expected"); 221 | reference(name_token); 222 | name_token->symbol->flags |= OBJ_SYMBOL_GLOBAL; 223 | scan(); 224 | } 225 | 226 | /* .bits <#> - set assembly mode */ 227 | 228 | pseudo_bits() 229 | { 230 | if (token != NUMBER) error(".bits takes a numeric operand"); 231 | 232 | switch (number_token) 233 | { 234 | case 16: 235 | case 32: 236 | case 64: 237 | bits = number_token; 238 | break; 239 | default: 240 | error("must be 16, 32 or 64"); 241 | } 242 | 243 | scan(); 244 | } 245 | -------------------------------------------------------------------------------- /ncc.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 
13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include <stdio.h> 26 | #include <stdlib.h> 27 | #include <string.h> 28 | #include <stdarg.h> 29 | #include <errno.h> 30 | #include <unistd.h> 31 | #include <sys/types.h> 32 | #include <sys/wait.h> 33 | 34 | /* this list will shrink once we implement archive support in the linker .. */ 35 | 36 | char * libs[] = 37 | { 38 | "/lib/libc.a" 39 | }; 40 | 41 | #define NR_LIBS (sizeof(libs)/sizeof(*libs)) 42 | 43 | char * mem(); 44 | 45 | /* these lists hold the arguments used to invoke external programs */ 46 | 47 | #define LIST_INC 10 48 | 49 | struct list 50 | { 51 | int cap; 52 | int len; 53 | char ** s; 54 | }; 55 | 56 | struct list cpp; /* cpp */ 57 | struct list cc1; /* compiler */ 58 | struct list as; /* assembler */ 59 | struct list ld; /* linker */ 60 | struct list args; /* current commands */ 61 | struct list temps; /* list of temporary files (delete before exit) */ 62 | 63 | #define EXEC_FILE 0 64 | #define C_FILE 'c' 65 | #define CC1_FILE 'i' 66 | #define ASM_FILE 's' 67 | #define OBJ_FILE 'o' 68 | 69 | int goal = EXEC_FILE; 70 | char * ld_out; 71 | 72 | /* print an error message and abort */ 73 | 74 | #ifdef __STDC__ 75 | void 76 | error(char * fmt, ...) 
77 | #else 78 | error(fmt) 79 | char * fmt; 80 | #endif 81 | { 82 | va_list args; 83 | 84 | if (fmt) { 85 | va_start(args, fmt); 86 | fprintf(stderr, "cc: "); 87 | vfprintf(stderr, fmt, args); 88 | va_end(args); 89 | fputc('\n', stderr); 90 | } 91 | 92 | rmtemps(); 93 | exit(1); 94 | } 95 | 96 | /* safe malloc */ 97 | 98 | char * 99 | mem(sz) 100 | { 101 | char * p = malloc(sz); 102 | 103 | if (!p) error("out of memory"); 104 | return p; 105 | } 106 | 107 | 108 | #define LIST_INC 10 /* increment of list element allocations */ 109 | 110 | #ifdef __STDC__ 111 | void 112 | add(struct list * list, ...) 113 | #else 114 | add(list) 115 | struct list * list; 116 | #endif 117 | { 118 | va_list ss; 119 | char **new; 120 | char *s; 121 | 122 | va_start(ss, list); 123 | while (s = va_arg(ss, char *)) { 124 | if (list->len == list->cap) { 125 | new = (char **) mem(sizeof(char *) * (list->cap + LIST_INC + 1)); 126 | if (new == NULL) error("out of memory"); 127 | if (list->s) { 128 | memcpy(new, list->s, sizeof(char *) * (list->cap)); 129 | free(list->s); 130 | } 131 | list->s = new; 132 | list->cap += LIST_INC; 133 | } 134 | list->s[list->len++] = s; 135 | list->s[list->len] = NULL; 136 | } 137 | } 138 | 139 | /* copy all the elements from 'src' to 'dst', replacing the previous contents of 'dst' */ 140 | 141 | copy(dst, src) 142 | struct list * dst; 143 | struct list * src; 144 | { 145 | int i; 146 | 147 | dst->len = 0; 148 | if (dst->cap) dst->s[0] = NULL; 149 | for (i = 0; i < src->len; i++) add(dst, src->s[i], NULL); 150 | } 151 | 152 | /* remove all temporary files */ 153 | 154 | rmtemps() 155 | { 156 | int i; 157 | 158 | for (i = 0; i < temps.len; i++) 159 | unlink(temps.s[i]); 160 | } 161 | 162 | 163 | /* return the type of a file based on its extension */ 164 | 165 | type(name) 166 | char * name; 167 | { 168 | char *dot; 169 | 170 | dot = strrchr(name, '.'); 171 | 172 | if (dot) { 173 | dot++; 174 | switch (*dot) { 175 | case C_FILE: 176 | case CC1_FILE: 177 | case ASM_FILE: 178 | case OBJ_FILE: 179 | return 
*dot; 180 | } 181 | } 182 | 183 | error("'%s': unknown file type", name); 184 | } 185 | 186 | /* return a copy of the name in question with its 187 | extension changed to 'ext' */ 188 | 189 | char * 190 | morph(name, ext) 191 | char * name; 192 | { 193 | char * dot; 194 | char * new; 195 | 196 | new = mem(strlen(name) + 1); 197 | strcpy(new, name); 198 | dot = strrchr(new, '.'); 199 | dot++; 200 | *dot = ext; 201 | 202 | return new; 203 | } 204 | 205 | /* run the command indicated by 'args', which will 206 | output to 'out'. 'out' will be removed if the program 207 | returns an error. */ 208 | 209 | run(args, out) 210 | struct list * args; 211 | char * out; 212 | { 213 | pid_t pid; 214 | int status; 215 | 216 | if ((pid = fork()) == 0) { 217 | execvp(args->s[0], args->s); 218 | error("can't exec '%s': %s", args->s[0], strerror(errno)); 219 | } 220 | 221 | if (pid == -1) error("can't fork: %s", strerror(errno)); 222 | 223 | while (pid != wait(&status)) ; 224 | 225 | if (status != 0) { 226 | fprintf(stderr, "cc: compilation terminated abnormally\n"); 227 | add(&temps, out, NULL); /* must precede error(), which removes temps and exits */ 228 | error(NULL); 229 | } 230 | } 231 | 232 | main(argc, argv) 233 | char * argv[]; 234 | { 235 | char * new; 236 | char * src; 237 | int i; 238 | 239 | add(&cpp, "ncpp", NULL); 240 | add(&cc1, "ncc1", NULL); 241 | add(&as, "nas", "-o", NULL); 242 | add(&ld, "nld", "-b", "0xFFFFFF8000000000", "-e", "cstart", "-o", NULL); 243 | 244 | ++argv; 245 | 246 | while (*argv && (*argv[0] == '-')) { 247 | switch ((*argv)[1]) { 248 | case 'D': 249 | case 'I': 250 | add(&cpp, *argv, NULL); 251 | break; 252 | 253 | case 'g': 254 | case 'O': 255 | add(&cc1, *argv, NULL); 256 | break; 257 | 258 | case 'S': 259 | case 'P': 260 | case 'c': 261 | if ((*argv)[2]) error("malformed goal option"); 262 | if (goal != EXEC_FILE) error("conflicting goal options"); 263 | 264 | goal = ((*argv)[1] == 'S') ? 265 | ASM_FILE 266 | : (((*argv)[1] == 'P') ? 
CC1_FILE : OBJ_FILE); 267 | 268 | break; 269 | 270 | case 'o': 271 | if ((*argv)[2] || !argv[1]) error("malformed output option"); 272 | if (ld_out) error("duplicate output option"); 273 | ++argv; 274 | ld_out = *argv; 275 | break; 276 | 277 | default: 278 | error("unrecognized option: %c\n", (*argv)[1]); 279 | } 280 | ++argv; 281 | } 282 | 283 | if (ld_out == NULL) ld_out = "a.out"; 284 | add(&ld, ld_out, "/lib/cstart.o", NULL); 285 | 286 | if (*argv == NULL) error("no input files"); 287 | 288 | while (*argv) { 289 | src = *argv; 290 | switch (type(src)) { 291 | case C_FILE: 292 | new = morph(src, CC1_FILE); 293 | copy(&args, &cpp); 294 | add(&args, src, NULL); 295 | add(&args, new, NULL); 296 | run(&args, new); 297 | if (goal == CC1_FILE) break; 298 | add(&temps, new, NULL); 299 | src = new; 300 | 301 | case CC1_FILE: 302 | new = morph(src, ASM_FILE); 303 | copy(&args, &cc1); 304 | add(&args, src, NULL); 305 | add(&args, new, NULL); 306 | run(&args, new); 307 | if (goal == ASM_FILE) break; 308 | src = new; 309 | add(&temps, new, NULL); 310 | 311 | case ASM_FILE: 312 | new = morph(src, OBJ_FILE); 313 | copy(&args, &as); 314 | add(&args, new, NULL); 315 | add(&args, src, NULL); 316 | run(&args, new); 317 | if (goal == OBJ_FILE) break; 318 | add(&temps, new, NULL); 319 | src = new; 320 | 321 | case OBJ_FILE: 322 | add(&ld, src, NULL); 323 | } 324 | argv++; 325 | } 326 | 327 | if (goal == EXEC_FILE) { 328 | for (i = 0; i < NR_LIBS; ++i) add(&ld, libs[i], NULL); 329 | run(&ld, ld_out); 330 | } 331 | 332 | rmtemps(); 333 | exit(0); 334 | } 335 | -------------------------------------------------------------------------------- /ncc1/block.h: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 
3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/ 24 | 25 | #define B_SEQ 0x00000001 /* sequenced */ 26 | #define B_REG 0x00000002 /* registers allocated */ 27 | #define B_RECON 0x00000004 /* reconciliation block */ 28 | 29 | struct block 30 | { 31 | int asm_label; 32 | int bs; 33 | int loop_level; 34 | int nr_insns; 35 | struct block * previous; 36 | struct block * next; 37 | struct insn * first_insn; 38 | struct insn * last_insn; 39 | struct block_list * successors; 40 | struct block_list * predecessors; 41 | int nr_successors; 42 | int nr_predecessors; 43 | struct defuse * defuses; 44 | struct symbol * iregs[NR_REGS]; 45 | struct symbol * fregs[NR_REGS]; 46 | 47 | /* the prohibit_* fields are (real) register bitmasks (1 << R_IDX(x)) 48 | determined in analyze_block() that tell the allocator which registers 49 | aren't available in this block. similarly, the temponly_* bitmasks 50 | indicate which registers are only available for temporaries (DU_TEMP). */ 51 | 52 | int prohibit_iregs; 53 | int prohibit_fregs; 54 | int temponly_iregs; 55 | int temponly_fregs; 56 | }; 57 | 58 | /* for successors, 'cc' is the branch condition that leads to 59 | the successor. (predecessors have 'cc' = CC_NONE.) 60 | 61 | rules regarding block successors: 62 | 63 | 0 successors: only the exit_block has no successors. 64 | 1 successor: 'cc' for the one successor is CC_ALWAYS. 65 | 2 successors: the 'cc's are opposite conditions. */ 66 | 67 | struct block_list 68 | { 69 | int cc; 70 | struct block * block; 71 | struct block_list * link; 72 | }; 73 | 74 | /* CC_* represent AMD64 branch conditions. their values are 75 | important for three reasons: 76 | 77 | 1. they're used as indices to select instructions 78 | (e.g., I_SETZ + CC_LE = I_SETLE, etc.), 79 | 2. they're used as table indexes in output.c, and 80 | 3. opposite conditions differ in the LSB only, so 81 | CC_INVERT can toggle between them easily. 82 | 83 | CC_NEVER obviously never makes sense for an actual 84 | successor. 
*/ 85 | 86 | 87 | #define CC_INVERT(x) ((x) ^ 1) 88 | 89 | #define CC_Z 0 90 | #define CC_NZ 1 91 | #define CC_G 2 92 | #define CC_LE 3 93 | #define CC_GE 4 94 | #define CC_L 5 95 | #define CC_A 6 96 | #define CC_BE 7 97 | #define CC_AE 8 98 | #define CC_B 9 99 | #define CC_ALWAYS 10 100 | #define CC_NEVER 11 101 | 102 | #define CC_NONE 12 103 | 104 | /* def/use, live variable information tracking and other sundries. */ 105 | 106 | struct defuse 107 | { 108 | struct symbol * symbol; 109 | struct defuse * link; 110 | int dus; /* DU_* */ 111 | int reg; 112 | int cache; /* DU_CACHE_* */ 113 | int con; /* if DU_CON; see con_prop() [opt.c] */ 114 | 115 | /* first_n and last_n give the insn indexes (insn->n) of the first 116 | and last appearances of the symbol in the block. if the symbol 117 | doesn't actually appear in the block, these are 0. */ 118 | 119 | int first_n, last_n; 120 | 121 | /* for transit variables, 'distance' indicates how far to the 122 | next reference of the variable. it's a heuristic value, but 123 | smaller means closer. */ 124 | 125 | int distance; 126 | }; 127 | 128 | #define DU_IN 0x00000001 /* live in and/or out */ 129 | #define DU_OUT 0x00000002 130 | #define DU_DEF 0x00000004 /* def or use */ 131 | #define DU_USE 0x00000008 132 | #define DU_CON 0x00000010 /* for const_prop() [opt.c] */ 133 | 134 | /* the register allocator uses these values to track the coherency 135 | between ->reg and memory for aliased (i.e., non-S_REGISTER) variables. */ 136 | 137 | #define DU_CACHE_CLEAN 0 /* register and memory agree */ 138 | #define DU_CACHE_DIRTY 1 /* register valid, memory out-of-date */ 139 | #define DU_CACHE_INVALID 2 /* register invalid, memory up-to-date */ 140 | 141 | /* a variable is deemed a 'temporary' if it is neither live in nor 142 | live out, and its storage class is S_REGISTER. its entire lifespan 143 | is [first_n..last_n] in this block. 
*/ 144 | 145 | #define DU_TEMP(x) (!(((x).dus) & (DU_IN | DU_OUT)) && ((x).symbol->ss & S_REGISTER)) 146 | 147 | /* a transit variable is not referenced in this block. */ 148 | 149 | #define DU_TRANSIT(x) (!(((x).dus) & (DU_DEF | DU_USE))) 150 | 151 | /* an instruction is an opcode with up to 3 operands, which are 152 | expression trees, but limited to E_REG, E_MEM, E_IMM, E_CON. */ 153 | 154 | #define NR_INSN_OPERANDS 3 155 | #define NR_INSN_REGS (NR_INSN_OPERANDS * 2) /* maximum # of registers referenced in one insn */ 156 | 157 | #define INSN_FLAG_CC 0x00000001 /* condition codes from this insn used */ 158 | 159 | struct insn 160 | { 161 | struct insn * previous; 162 | struct insn * next; 163 | 164 | /* the opcode and operands are sufficient to describe the 165 | instruction for output to the assembler. */ 166 | 167 | int opcode; /* I_* */ 168 | struct tree * operand[NR_INSN_OPERANDS]; 169 | 170 | /* these fields aren't valid until we begin analysis 171 | before optimization and register allocation */ 172 | 173 | int flags; /* INSN_FLAG_* */ 174 | int n; /* insn # in block (1..2..3..) */ 175 | char mem_used; /* memory read? */ 176 | char mem_defd; /* memory written? 
*/ 177 | int regs_used[NR_INSN_REGS]; /* USEd regs */ 178 | int regs_defd[NR_INSN_REGS]; /* DEFd regs */ 179 | }; 180 | 181 | /* I[7:0] are unique, and serve as 182 | indices into insns[] in output.c */ 183 | 184 | #define I_IDX(x) ((x) & 0xFF) 185 | 186 | /* I[9:8] encode the number of operands */ 187 | 188 | #define I_NR_OPERANDS(x) (((x) >> 8) & 0x03) 189 | #define I_0_OPERANDS (0 << 8) 190 | #define I_1_OPERANDS (1 << 8) 191 | #define I_2_OPERANDS (2 << 8) 192 | #define I_3_OPERANDS (3 << 8) /* none yet */ 193 | 194 | /* I[12:10] are the def bits, I[15:13] are the use bits */ 195 | 196 | #define I_DEF(i) (1 << (10 + (i))) 197 | #define I_USE(i) (1 << (13 + (i))) 198 | 199 | /* I[23:16] are for implicit def/use */ 200 | 201 | #define I_DEF_AX (1 << 16) 202 | #define I_DEF_DX (1 << 17) 203 | #define I_DEF_CX (1 << 18) 204 | #define I_DEF_XMM0 (1 << 19) 205 | #define I_USE_AX (1 << 20) 206 | #define I_USE_DX (1 << 21) 207 | #define I_DEF_MEM (1 << 22) 208 | #define I_USE_MEM (1 << 23) 209 | 210 | /* I[24] indicates if this instruction modifies CCs */ 211 | /* I[25] indicates if this instruction uses CCs */ 212 | 213 | #define I_DEF_CC (1 << 24) 214 | #define I_USE_CC (1 << 25) 215 | 216 | /* the instructions */ 217 | 218 | #define I_NOP ( 0 | I_0_OPERANDS ) 219 | #define I_MOV ( 1 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 220 | #define I_MOVSX ( 2 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 221 | #define I_MOVZX ( 3 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 222 | #define I_MOVSS ( 4 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 223 | #define I_MOVSD ( 5 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 224 | #define I_LEA ( 6 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 225 | #define I_CMP ( 7 | I_2_OPERANDS | I_USE(0) | I_USE(1) | I_DEF_CC ) 226 | #define I_UCOMISS ( 8 | I_2_OPERANDS | I_USE(0) | I_USE(1) | I_DEF_CC ) 227 | #define I_UCOMISD ( 9 | I_2_OPERANDS | I_USE(0) | I_USE(1) | I_DEF_CC ) 228 | #define I_PXOR ( 10 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) ) 229 | #define I_CVTSS2SI ( 
11 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 230 | #define I_CVTSD2SI ( 12 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 231 | #define I_CVTSI2SS ( 13 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 232 | #define I_CVTSI2SD ( 14 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 233 | #define I_CVTSS2SD ( 15 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 234 | #define I_CVTSD2SS ( 16 | I_2_OPERANDS | I_DEF(0) | I_USE(1) ) 235 | #define I_SHL ( 17 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) | I_DEF_CC ) 236 | #define I_SHR ( 18 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) | I_DEF_CC ) 237 | #define I_SAR ( 19 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) | I_DEF_CC ) 238 | #define I_ADD ( 20 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) | I_DEF_CC ) 239 | #define I_ADDSS ( 21 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) ) 240 | #define I_ADDSD ( 22 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) ) 241 | #define I_SUB ( 23 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) | I_DEF_CC ) 242 | #define I_SUBSS ( 24 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) ) 243 | #define I_SUBSD ( 25 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) ) 244 | #define I_IMUL ( 26 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) | I_DEF_CC ) 245 | #define I_MULSS ( 27 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) ) 246 | #define I_MULSD ( 28 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) ) 247 | #define I_OR ( 29 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) | I_DEF_CC ) 248 | #define I_XOR ( 30 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) | I_DEF_CC ) 249 | #define I_AND ( 31 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) | I_DEF_CC ) 250 | #define I_CDQ ( 32 | I_0_OPERANDS | I_USE_AX | I_DEF_DX ) 251 | #define I_CQO ( 33 | I_0_OPERANDS | I_USE_AX | I_DEF_DX ) 252 | #define I_DIV ( 34 | I_1_OPERANDS | I_USE(0) | I_USE_AX | I_DEF_AX | I_USE_DX | I_DEF_DX | I_DEF_CC ) 253 | #define I_IDIV ( 35 | I_1_OPERANDS | I_USE(0) | I_USE_AX | I_DEF_AX | I_USE_DX | I_DEF_DX | I_DEF_CC ) 254 | #define I_DIVSS ( 36 | 
I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) ) 255 | #define I_DIVSD ( 37 | I_2_OPERANDS | I_DEF(0) | I_USE(0) | I_USE(1) ) 256 | #define I_CBW ( 38 | I_0_OPERANDS | I_USE_AX | I_DEF_AX ) 257 | #define I_CWD ( 39 | I_0_OPERANDS | I_USE_AX | I_DEF_AX | I_DEF_DX ) 258 | #define I_SETZ ( 40 | I_1_OPERANDS | I_DEF(0) | I_USE_CC ) 259 | #define I_SETNZ ( 41 | I_1_OPERANDS | I_DEF(0) | I_USE_CC ) 260 | #define I_SETG ( 42 | I_1_OPERANDS | I_DEF(0) | I_USE_CC ) 261 | #define I_SETLE ( 43 | I_1_OPERANDS | I_DEF(0) | I_USE_CC ) 262 | #define I_SETGE ( 44 | I_1_OPERANDS | I_DEF(0) | I_USE_CC ) 263 | #define I_SETL ( 45 | I_1_OPERANDS | I_DEF(0) | I_USE_CC ) 264 | #define I_SETA ( 46 | I_1_OPERANDS | I_DEF(0) | I_USE_CC ) 265 | #define I_SETBE ( 47 | I_1_OPERANDS | I_DEF(0) | I_USE_CC ) 266 | #define I_SETAE ( 48 | I_1_OPERANDS | I_DEF(0) | I_USE_CC ) 267 | #define I_SETB ( 49 | I_1_OPERANDS | I_DEF(0) | I_USE_CC ) 268 | #define I_NOT ( 50 | I_1_OPERANDS | I_DEF(0) | I_USE(0) ) 269 | #define I_NEG ( 51 | I_1_OPERANDS | I_DEF(0) | I_USE(0) | I_DEF_CC ) 270 | #define I_PUSH ( 52 | I_1_OPERANDS | I_USE(0) ) 271 | #define I_POP ( 53 | I_1_OPERANDS | I_DEF(0) ) 272 | #define I_CALL ( 54 | I_1_OPERANDS | I_USE(0) | I_DEF_AX | I_DEF_CX | I_DEF_DX | I_DEF_XMM0 | I_DEF_CC ) 273 | #define I_TEST ( 55 | I_2_OPERANDS | I_USE(0) | I_USE(1) | I_DEF_CC ) 274 | #define I_RET ( 56 | I_0_OPERANDS ) 275 | #define I_INC ( 57 | I_1_OPERANDS | I_DEF(0) | I_USE(0) | I_DEF_CC ) 276 | #define I_DEC ( 58 | I_1_OPERANDS | I_DEF(0) | I_USE(0) | I_DEF_CC ) 277 | -------------------------------------------------------------------------------- /ncc1/init.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 
3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include "ncc1.h" 26 | 27 | /* convert a floating-point E_CON into an E_MEM node referencing 28 | an anonymous, static symbol in the text segment. */ 29 | 30 | struct tree * 31 | float_literal(tree) 32 | struct tree * tree; 33 | { 34 | struct symbol * symbol; 35 | 36 | symbol = new_symbol(NULL, S_STATIC, copy_type(tree->type)); 37 | symbol->i = next_asm_label++; 38 | put_symbol(symbol, SCOPE_RETIRED); 39 | segment(SEGMENT_TEXT); 40 | output("%G: %s %O\n", symbol, (tree->type->ts & T_LFLOAT) ? 
".qword" : ".dword", tree); 41 | free_tree(tree); 42 | return memory_tree(symbol); 43 | } 44 | 45 | /* this function tracks state to achieve bit granularity in initializer output. */ 46 | 47 | static 48 | initialize_bits(i, n) 49 | long i; 50 | { 51 | static char buf; 52 | static char pos; 53 | 54 | while (n--) { 55 | buf >>= 1; 56 | 57 | if (i & 1) 58 | buf |= 0x80; 59 | else 60 | buf &= 0x7F; 61 | 62 | i >>= 1; 63 | pos++; 64 | if ((pos % 8) == 0) output(" .byte %d\n", buf & 255); 65 | } 66 | } 67 | 68 | static initialize(); 69 | 70 | /* read one scalar value and output it to the current 71 | position in the data segment. 'type' is NOT consumed. */ 72 | 73 | static 74 | initialize_scalar(type) 75 | struct type * type; 76 | { 77 | struct symbol * symbol; 78 | struct tree * tree; 79 | 80 | /* fake an assignment to a fake symbol of the right type, then discard 81 | everything but the right side of the assignment expression. */ 82 | 83 | symbol = new_symbol(NULL, S_AUTO, copy_type(type)); 84 | tree = symbol_tree(symbol); 85 | tree = assignment_expression(tree); 86 | decap_tree(tree, NULL, NULL, &tree, NULL); 87 | tree = generate(tree, GOAL_VALUE, NULL); 88 | free_symbol(symbol); 89 | 90 | if ((tree->op != E_CON) && (tree->op != E_IMM)) error(ERROR_BADINIT); 91 | 92 | /* static/extern address have RIP set - lose it */ 93 | 94 | if (tree->op == E_IMM) { 95 | tree->u.mi.rip = 0; 96 | if ((tree->u.mi.b != R_NONE) || (tree->u.mi.i != R_NONE)) error(ERROR_BADINIT); 97 | } 98 | 99 | if (type->ts & T_FIELD) { 100 | if (tree->op != E_CON) error(ERROR_BADINIT); 101 | initialize_bits(tree->u.con.i, T_GET_SIZE(type->ts)); 102 | } else { 103 | if (type->ts & T_IS_BYTE) output(" .byte "); 104 | if (type->ts & T_IS_WORD) output(" .word "); 105 | if (type->ts & T_IS_DWORD) output(" .dword "); 106 | if (type->ts & T_IS_QWORD) output(" .qword "); 107 | output("%O\n", tree); 108 | } 109 | 110 | free_tree(tree); 111 | } 112 | 113 | static 114 | initialize_array(type) 115 | struct type * 
type; 116 | { 117 | struct type * element_type; 118 | int nr_elements = 0; 119 | 120 | element_type = type->next; 121 | 122 | if ((element_type->ts & T_IS_CHAR) && (token.kk == KK_STRLIT)) { 123 | nr_elements = token.u.text->length; 124 | if (nr_elements != type->nr_elements) nr_elements++; 125 | output_string(token.u.text, nr_elements); 126 | lex(); 127 | } else { 128 | match(KK_LBRACE); 129 | while (token.kk != KK_RBRACE) { 130 | initialize(element_type); 131 | nr_elements++; 132 | 133 | if (token.kk == KK_COMMA) { 134 | lex(); 135 | prohibit(KK_RBRACE); 136 | } else 137 | break; 138 | } 139 | match(KK_RBRACE); 140 | } 141 | 142 | if (type->nr_elements == 0) type->nr_elements = nr_elements; 143 | if ((type->nr_elements == 0) || (nr_elements > type->nr_elements)) error(ERROR_BADINIT); 144 | 145 | if (nr_elements < type->nr_elements) 146 | output(" .fill %d,0\n", (type->nr_elements - nr_elements) * size_of(element_type)); 147 | } 148 | 149 | static 150 | initialize_struct(type) 151 | struct type * type; 152 | { 153 | struct symbol * member; 154 | int offset_bits = 0; 155 | int adjust_bits; 156 | 157 | match(KK_LBRACE); 158 | member = type->tag->list; 159 | if (type->tag->ss & S_UNION) error(ERROR_BADINIT); 160 | 161 | while (token.kk != KK_RBRACE) { 162 | if (member == NULL) error(ERROR_BADINIT); 163 | adjust_bits = (member->i * BITS) + T_GET_SHIFT(member->type->ts) - offset_bits; 164 | initialize_bits(0, adjust_bits % 8); 165 | if (adjust_bits / BITS) output(" .fill %d,0\n", adjust_bits / 8); 166 | offset_bits += adjust_bits; 167 | initialize(member->type); 168 | 169 | if (member->type->ts & T_FIELD) 170 | offset_bits += T_GET_SIZE(member->type->ts); 171 | else 172 | offset_bits += size_of(member->type) * 8; 173 | 174 | if (token.kk == KK_COMMA) { 175 | lex(); 176 | prohibit(KK_RBRACE); 177 | } else 178 | break; 179 | 180 | member = member->list; 181 | } 182 | 183 | match(KK_RBRACE); 184 | adjust_bits = (size_of(type) * BITS) - offset_bits; 185 | 
initialize_bits(0, adjust_bits % BITS); 186 | if (adjust_bits / BITS) output(" .fill %d,0\n", adjust_bits / 8); 187 | } 188 | 189 | static 190 | initialize(type) 191 | struct type * type; 192 | { 193 | if (type->ts & T_IS_SCALAR) 194 | initialize_scalar(type); 195 | else if (type->ts & T_ARRAY) 196 | initialize_array(type); 197 | else if (type->ts & T_TAG) 198 | initialize_struct(type); 199 | else 200 | error(ERROR_INTERNAL); 201 | } 202 | 203 | /* just declared the 'symbol' with the explicit storage class 'ss'. 204 | (we only care about 'ss' to distinguish between explicit and 205 | implicit 'extern'). the job of initializer() is to process an 206 | initializer, if present, or reserve uninitialized storage instead. */ 207 | 208 | initializer(symbol, ss) 209 | struct symbol * symbol; 210 | { 211 | struct tree * tree; 212 | struct block * saved_block; 213 | 214 | if (symbol->ss & S_BLOCK) size_of(symbol->type); 215 | 216 | if ((symbol->ss & (S_STATIC | S_EXTERN)) && !(ss & S_EXTERN)) { 217 | if (symbol->ss & S_DEFINED) error(ERROR_DUPDEF); 218 | if (symbol->ss & S_STATIC) symbol->i = next_asm_label++; 219 | if (symbol->ss & S_EXTERN) output(".global %G\n", symbol); 220 | symbol->ss |= S_DEFINED; 221 | 222 | if (token.kk == KK_EQ) { 223 | lex(); 224 | saved_block = current_block; /* don't allow code generation */ 225 | current_block = NULL; 226 | 227 | segment(SEGMENT_DATA); 228 | output(".align %d\n", align_of(symbol->type)); 229 | output("%G:", symbol); 230 | initialize(symbol->type); 231 | symbol->ss |= S_DEFINED; 232 | current_block = saved_block; 233 | } else 234 | output(".bss %G,%d,%d\n", symbol, size_of(symbol->type), align_of(symbol->type)); 235 | } else { 236 | if (token.kk == KK_EQ) { 237 | lex(); 238 | if (symbol->ss & S_TYPEDEF) error(ERROR_BADINIT); 239 | if (ss & S_EXTERN) error(ERROR_BADINIT); 240 | if (!(symbol->type->ts & T_IS_SCALAR)) error(ERROR_BADINIT); 241 | tree = symbol_tree(symbol); 242 | tree = assignment_expression(tree); 243 | 
generate(tree, GOAL_EFFECT, NULL); 244 | } 245 | } 246 | } 247 | -------------------------------------------------------------------------------- /ncc1/makefile: -------------------------------------------------------------------------------- 1 | HDRS=ncc1.h token.h symbol.h type.h tree.h block.h reg.h 2 | OBJS=ncc1.o lex.o symbol.o type.o decl.o init.o stmt.o block.o \ 3 | opt.o reg.o tree.o output.o gen.o 4 | 5 | ncc1: $(OBJS) 6 | $(CC) $(CFLAGS) -o ncc1 $(OBJS) 7 | 8 | clean:: 9 | rm -f *.o ncc1 10 | -------------------------------------------------------------------------------- /ncc1/ncc1.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include "ncc1.h" 31 | 32 | int g_flag; /* -g: produce debug info */ 33 | int O_flag; /* -O: enable optimizations */ 34 | FILE * yyin; /* lexical input */ 35 | struct token token; 36 | struct string * input_name; /* input file name and line number ... */ 37 | int line_number; /* ... subject to change from # line */ 38 | FILE * output_file; 39 | struct string * output_name; 40 | int next_asm_label = 1; 41 | int next_iregister = R_IPSEUDO; 42 | int next_fregister = R_FPSEUDO; 43 | int current_scope = SCOPE_GLOBAL; 44 | struct symbol * current_function; 45 | int frame_offset; 46 | int save_iregs; /* bitsets (1 << R_IDX(x)) of registers .. */ 47 | int save_fregs; /* .. used in this function */ 48 | int loop_level; 49 | struct block * first_block; 50 | struct block * last_block; 51 | struct block * current_block; 52 | struct block * entry_block; 53 | struct block * exit_block; 54 | 55 | /* report an error to the user, clean up, and abort. 56 | error messages must match the indices (ERROR_*) in ncc1.h.
*/ 57 | 58 | static char *errors[] = 59 | { 60 | "command line syntax", /* ERROR_CMDLINE */ 61 | "can't read input file", /* ERROR_INPUT */ 62 | "syntax", /* ERROR_SYNTAX */ 63 | "out of memory", /* ERROR_MEMORY */ 64 | "malformed directive", /* ERROR_DIRECTIVE */ 65 | "lexical failure", /* ERROR_LEXICAL */ 66 | "malformed integral constant", /* ERROR_BADICON */ 67 | "integral constant out of range", /* ERROR_IRANGE */ 68 | "malformed floating constant", /* ERROR_BADFCON */ 69 | "floating constant out of range", /* ERROR_FRANGE */ 70 | "invalid octal escape sequence", /* ERROR_ESCAPE */ 71 | "unterminated string literal", /* ERROR_UNTERM */ 72 | "invalid character constant", /* ERROR_BADCCON */ 73 | "overflow in multi-character constant", /* ERROR_CRANGE */ 74 | "can't open output file", /* ERROR_OUTPUT */ 75 | "nesting level too deep", /* ERROR_NESTING */ 76 | "type too big", /* ERROR_TYPESIZE */ 77 | "incomplete type", /* ERROR_INCOMPLT */ 78 | "illegal use of function type", /* ERROR_ILLFUNC */ 79 | "illegal array specification", /* ERROR_ILLARRAY */ 80 | "illegal function return type", /* ERROR_RETURN */ 81 | "illegal use of struct/union type", /* ERROR_STRUCT */ 82 | "misplaced formal arguments", /* ERROR_NOARGS */ 83 | "bit field not permitted", /* ERROR_ILLFIELD */ 84 | "illegal type for bit field", /* ERROR_FIELDTY */ 85 | "invalid bit field size", /* ERROR_FIELDSZ */ 86 | "storage class not permitted", /* ERROR_SCLASS */ 87 | "struct/union already defined", /* ERROR_TAGREDEF */ 88 | "empty struct/union definition", /* ERROR_EMPTY */ 89 | "duplicate member declaration", /* ERROR_REMEMBER */ 90 | "illegal/incompatible redeclaration", /* ERROR_REDECL */ 91 | "unknown argument identifier", /* ERROR_NOTARG */ 92 | "declare functions at file scope", /* ERROR_NOFUNC */ 93 | "can't do that with typedef", /* ERROR_TYPEDEF */ 94 | "unknown identifier", /* ERROR_UNKNOWN */ 95 | "illegal operand(s)", /* ERROR_OPERANDS */ 96 | "incompatible type", /* ERROR_INCOMPAT
*/ 97 | "not an lvalue", /* ERROR_LVALUE */ 98 | "abstract type required", /* ERROR_ABSTRACT */ 99 | "declaration missing identifier", /* ERROR_MISSING */ 100 | "bad type cast", /* ERROR_BADCAST */ 101 | "illegal indirection", /* ERROR_INDIR */ 102 | "struct or union required", /* ERROR_NOTSTRUCT */ 103 | "not a member of that struct/union", /* ERROR_NOTMEMBER */ 104 | "can only call functions", /* ERROR_NEEDFUNC */ 105 | "compiler bug", /* ERROR_INTERNAL */ 106 | "can't take address of register", /* ERROR_REGISTER */ 107 | "constant expression required", /* ERROR_CONEXPR */ 108 | "division by 0", /* ERROR_DIV0 */ 109 | "illegal constant expression", /* ERROR_BADEXPR */ 110 | "wrong segment for function call", /* ERROR_SEGMENT */ 111 | "bad initializer", /* ERROR_BADINIT */ 112 | "duplicate definition", /* ERROR_DUPDEF */ 113 | "misplaced break, continue or case", /* ERROR_MISPLACED */ 114 | "dangling goto (undefined label)", /* ERROR_DANGLING */ 115 | "duplicate case label", /* ERROR_DUPCASE */ 116 | "switch/case expression not integral" /* ERROR_CASE */ 117 | }; 118 | 119 | error(code) 120 | { 121 | fprintf(stderr, "cc1: "); 122 | 123 | if (input_name) { 124 | fprintf(stderr, "'%s' ", input_name->data); 125 | if (line_number) fprintf(stderr, "(%d) ", line_number); 126 | } 127 | 128 | fprintf(stderr, "ERROR: %s\n", errors[code]); 129 | 130 | if (output_file) { 131 | fclose(output_file); 132 | unlink(output_name->data); 133 | } 134 | 135 | exit(1); 136 | } 137 | 138 | /* a general-purpose allocation function. guarantees success. 
*/ 139 | 140 | char * 141 | allocate(bytes) 142 | int bytes; 143 | { 144 | char * p = malloc(bytes); 145 | 146 | if (p == NULL) error(ERROR_MEMORY); 147 | return p; 148 | } 149 | 150 | 151 | main(argc, argv) 152 | char *argv[]; 153 | { 154 | int opt; 155 | 156 | while ((opt = getopt(argc, argv, "gO")) != -1) 157 | { 158 | switch (opt) 159 | { 160 | case 'O': 161 | ++O_flag; 162 | break; 163 | case 'g': 164 | ++g_flag; 165 | break; 166 | default: 167 | exit(1); 168 | } 169 | } 170 | 171 | argc -= optind; 172 | argv = &argv[optind]; 173 | 174 | if (argc != 2) error(ERROR_CMDLINE); 175 | 176 | output_name = stringize(argv[1], strlen(argv[1])); 177 | input_name = output_name; /* trick error() for a sec */ 178 | output_file = fopen(argv[1], "w"); 179 | if (!output_file) error(ERROR_OUTPUT); 180 | 181 | input_name = stringize(argv[0], strlen(argv[0])); 182 | yyin = fopen(argv[0], "r"); 183 | if (!yyin) error(ERROR_INPUT); 184 | 185 | yyinit(); 186 | translation_unit(); 187 | literals(); 188 | externs(); 189 | 190 | fclose(output_file); 191 | exit(0); 192 | } 193 | -------------------------------------------------------------------------------- /ncc1/ncc1.h: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 
13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #ifndef __STDC__ 26 | typedef long float double; 27 | #define strtod strtolf 28 | #else 29 | extern void output(char * fmt, ...); 30 | #endif 31 | 32 | /* let's start off with the basics. */ 33 | 34 | #define BITS 8 35 | 36 | /* some useful generic macros. */ 37 | 38 | #define MIN(a,b) (((a) < (b)) ? (a) : (b)) 39 | #define MAX(a,b) (((a) > (b)) ? (a) : (b)) 40 | #define ROUND_UP(a,b) (((a) % (b)) ? ((a) + ((b) - ((a) % (b)))) : (a)) 41 | #define ROUND_DOWN(a,b) ((a) - ((a) % (b))) 42 | 43 | /* some basic data about the activation records. probably 44 | shouldn't be changed, at least not without serious consideration. */ 45 | 46 | #define FRAME_ARGUMENTS 16 /* start of arguments in frame */ 47 | #define FRAME_ALIGN 8 /* always 8-byte aligned */ 48 | 49 | /* number of buckets in the hash tables. a power of two is preferable. 50 | more buckets can improve performance, but with NR_SYMBOL_BUCKETS in 51 | particular, larger numbers can have a negative impact, as every bucket 52 | must be scanned when exiting a scope. */ 53 | 54 | #define NR_STRING_BUCKETS 64 55 | #define NR_SYMBOL_BUCKETS 32 56 | 57 | /* limits the level of block nesting.
the number is arbitrary, but 58 | it must be at most "a few less" than INT_MAX. */ 59 | 60 | #define SCOPE_MAX 1000 61 | 62 | /* 63 | * MAX_SIZE limits the number of bytes specified by a type. 256MB - 1 64 | * is currently the largest safe value, due to the use of 'int' to 65 | * store type sizes and the expansion from byte counts to bit counts 66 | * in certain places (notably the struct type code). increasing this 67 | * maximum doesn't seem to be worth the penalty in speed and complexity. 68 | */ 69 | 70 | #define MAX_SIZE ((256 * 1024 * 1024) - 1) 71 | 72 | #include 73 | #include 74 | #include 75 | #include "token.h" 76 | #include "tree.h" 77 | #include "symbol.h" 78 | #include "type.h" 79 | #include "reg.h" 80 | #include "block.h" 81 | 82 | extern int g_flag; 83 | extern int O_flag; 84 | extern FILE * yyin; 85 | extern struct token token; 86 | extern int line_number; 87 | extern struct string * input_name; 88 | extern struct string * output_name; 89 | extern FILE * output_file; 90 | extern int current_scope; 91 | extern int next_asm_label; 92 | extern int next_iregister; 93 | extern int next_fregister; 94 | extern int loop_level; 95 | extern struct symbol * current_function; 96 | extern int frame_offset; 97 | extern int save_iregs; 98 | extern int save_fregs; 99 | extern struct block * entry_block; 100 | extern struct block * current_block; 101 | extern struct block * exit_block; 102 | extern struct block * first_block; 103 | extern struct block * last_block; 104 | 105 | extern char * allocate(); 106 | extern struct string * stringize(); 107 | extern struct symbol * string_symbol(); 108 | extern struct symbol * new_symbol(); 109 | extern struct symbol * find_symbol(); 110 | extern struct symbol * find_symbol_by_reg(); 111 | extern struct symbol * find_typedef(); 112 | extern struct symbol * find_symbol_list(); 113 | extern struct symbol * temporary_symbol(); 114 | extern struct symbol * find_label(); 115 | extern struct type * new_type(); 116 | extern
struct type * copy_type(); 117 | extern struct type * splice_types(); 118 | extern struct type * argument_type(); 119 | extern struct tree * new_tree(); 120 | extern struct tree * copy_tree(); 121 | extern struct tree * expression(); 122 | extern struct tree * assignment_expression(); 123 | extern struct tree * conditional_expression(); 124 | extern struct tree * scalar_expression(); 125 | extern struct tree * null_pointer(); 126 | extern struct tree * reg_tree(); 127 | extern struct tree * stack_tree(); 128 | extern struct tree * memory_tree(); 129 | extern struct tree * symbol_tree(); 130 | extern struct tree * int_tree(); 131 | extern struct tree * float_tree(); 132 | extern struct tree * addr_tree(); 133 | extern struct type * abstract_type(); 134 | extern struct block * new_block(); 135 | extern struct block * block_successor(); 136 | extern struct block * block_predecessor(); 137 | extern struct insn * new_insn(); 138 | extern struct tree * generate(); 139 | extern struct tree * float_literal(); 140 | extern struct defuse * find_defuse(); 141 | extern struct defuse * find_defuse_by_symbol(); 142 | 143 | /* goals for generate() */ 144 | 145 | #define GOAL_EFFECT 0 146 | #define GOAL_CC 1 147 | #define GOAL_VALUE 2 148 | 149 | /* modes for find_defuse() */ 150 | 151 | #define FIND_DEFUSE_NORMAL 0 152 | #define FIND_DEFUSE_CREATE 1 153 | 154 | /* flags for generate() */ 155 | 156 | #define GENERATE_LVALUE 0x00000001 157 | 158 | /* flags for declarations() */ 159 | 160 | #define DECLARATIONS_ARGS 0x00000001 161 | #define DECLARATIONS_INTS 0x00000002 162 | #define DECLARATIONS_FIELDS 0x00000004 163 | 164 | /* flags for check_types() */ 165 | 166 | #define CHECK_TYPES_COMPOSE 0x00000001 167 | 168 | /* output segments */ 169 | 170 | #define SEGMENT_TEXT 0 /* code */ 171 | #define SEGMENT_DATA 1 /* initialized data */ 172 | 173 | /* these codes must match the indices of errors[] in ncc1.c */ 174 | 175 | #define ERROR_CMDLINE 0 /* bad command line */ 176 | #define
ERROR_INPUT 1 /* input file error */ 177 | #define ERROR_SYNTAX 2 /* syntax error */ 178 | #define ERROR_MEMORY 3 /* out of memory */ 179 | #define ERROR_DIRECTIVE 4 /* bad # directive */ 180 | #define ERROR_LEXICAL 5 /* invalid character in input */ 181 | #define ERROR_BADICON 6 /* malformed integral constant */ 182 | #define ERROR_IRANGE 7 /* integral constant out of range */ 183 | #define ERROR_BADFCON 8 /* malformed floating constant */ 184 | #define ERROR_FRANGE 9 /* floating constant out of range */ 185 | #define ERROR_ESCAPE 10 /* invalid octal escape sequence */ 186 | #define ERROR_UNTERM 11 /* unterminated string literal */ 187 | #define ERROR_BADCCON 12 /* invalid character constant */ 188 | #define ERROR_CRANGE 13 /* multi-character constant out of range */ 189 | #define ERROR_OUTPUT 14 /* output file error */ 190 | #define ERROR_NESTING 15 /* nesting level too deep */ 191 | #define ERROR_TYPESIZE 16 /* type too big */ 192 | #define ERROR_INCOMPLT 17 /* incomplete type */ 193 | #define ERROR_ILLFUNC 18 /* illegal use of function type */ 194 | #define ERROR_ILLARRAY 19 /* illegal array specification */ 195 | #define ERROR_RETURN 20 /* illegal function return type */ 196 | #define ERROR_STRUCT 21 /* illegal use of struct/union type */ 197 | #define ERROR_NOARGS 22 /* misplaced formal arguments */ 198 | #define ERROR_ILLFIELD 23 /* illegal use of bit field */ 199 | #define ERROR_FIELDTY 24 /* illegal bit field type */ 200 | #define ERROR_FIELDSZ 25 /* invalid bit field size */ 201 | #define ERROR_SCLASS 26 /* storage class not permitted */ 202 | #define ERROR_TAGREDEF 27 /* struct/union already defined */ 203 | #define ERROR_EMPTY 28 /* empty struct or union */ 204 | #define ERROR_REMEMBER 29 /* duplicate member declaration */ 205 | #define ERROR_REDECL 30 /* illegal redeclaration */ 206 | #define ERROR_NOTARG 31 /* unknown argument identifier */ 207 | #define ERROR_NOFUNC 32 /* functions must be globals */ 208 | #define ERROR_TYPEDEF 33 /* can't do that 
with typedef */ 209 | #define ERROR_UNKNOWN 34 /* unknown identifier */ 210 | #define ERROR_OPERANDS 35 /* illegal operands */ 211 | #define ERROR_INCOMPAT 36 /* incompatible operands */ 212 | #define ERROR_LVALUE 37 /* not an lvalue */ 213 | #define ERROR_ABSTRACT 38 /* abstract declarator required */ 214 | #define ERROR_MISSING 39 /* declarator missing identifier */ 215 | #define ERROR_BADCAST 40 /* bad typecast */ 216 | #define ERROR_INDIR 41 /* illegal indirection */ 217 | #define ERROR_NOTSTRUCT 42 /* left side must be struct */ 218 | #define ERROR_NOTMEMBER 43 /* not a member of that struct/union */ 219 | #define ERROR_NEEDFUNC 44 /* function type required */ 220 | #define ERROR_INTERNAL 45 /* compiler internal error */ 221 | #define ERROR_REGISTER 46 /* can't take address of register */ 222 | #define ERROR_CONEXPR 47 /* constant expression required */ 223 | #define ERROR_DIV0 48 /* division by zero */ 224 | #define ERROR_BADEXPR 49 /* illegal constant expression */ 225 | #define ERROR_SEGMENT 50 /* wrong segment */ 226 | #define ERROR_BADINIT 51 /* bad initializer */ 227 | #define ERROR_DUPDEF 52 /* duplicate definition */ 228 | #define ERROR_MISPLACED 53 /* misplaced break, continue or case */ 229 | #define ERROR_DANGLING 54 /* undefined label */ 230 | #define ERROR_DUPCASE 55 /* duplicate case label */ 231 | #define ERROR_CASE 56 /* switch/case must be integral */ 232 | -------------------------------------------------------------------------------- /ncc1/output.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 
9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include "ncc1.h" 26 | #include 27 | 28 | /* write a register name to the output file. the T_* type bits 29 | are used solely as a hint to the proper size for integer registers. 
*/ 30 | 31 | static 32 | output_reg(reg, ts) 33 | { 34 | static char *iregs[NR_REGS][4] = { 35 | { "al", "ax", "eax", "rax" }, 36 | { "dl", "dx", "edx", "rdx" }, 37 | { "cl", "cx", "ecx", "rcx" }, 38 | { "bl", "bx", "ebx", "rbx" }, 39 | { "sil", "si", "esi", "rsi" }, 40 | { "dil", "di", "edi", "rdi" }, 41 | { "bpl", "bp", "ebp", "rbp" }, 42 | { "spl", "sp", "esp", "rsp" }, 43 | { "r8b", "r8w", "r8d", "r8" }, 44 | { "r9b", "r9w", "r9d", "r9" }, 45 | { "r10b", "r10w", "r10d", "r10" }, 46 | { "r11b", "r11w", "r11d", "r11" }, 47 | { "r12b", "r12w", "r12d", "r12" }, 48 | { "r13b", "r13w", "r13d", "r13" }, 49 | { "r14b", "r14w", "r14d", "r14" }, 50 | { "r15b", "r15w", "r15d", "r15" } 51 | }; 52 | 53 | int i; 54 | 55 | if (R_IS_PSEUDO(reg)) { 56 | fprintf(output_file, "%c#%d", (reg & R_IS_FLOAT) ? 'f' : 'i', R_IDX(reg)); 57 | if (reg & R_IS_INTEGRAL) { 58 | if (ts & T_IS_BYTE) fputc('b', output_file); 59 | if (ts & T_IS_WORD) fputc('w', output_file); 60 | if (ts & T_IS_DWORD) fputc('d', output_file); 61 | if (ts & T_IS_QWORD) fputc('q', output_file); 62 | } 63 | } else { 64 | if (reg & R_IS_FLOAT) 65 | fprintf(output_file, "xmm%d", R_IDX(reg)); 66 | else { 67 | if (ts & T_IS_BYTE) i = 0; 68 | if (ts & T_IS_WORD) i = 1; 69 | if (ts & T_IS_DWORD) i = 2; 70 | if (ts & T_IS_QWORD) i = 3; 71 | fprintf(output_file, "%s", iregs[R_IDX(reg)][i]); 72 | } 73 | } 74 | } 75 | 76 | /* outputting operands is an easy, if messy, business. 
*/ 77 | 78 | static 79 | output_operand(tree) 80 | struct tree * tree; 81 | { 82 | double lf; 83 | float f; 84 | int indexed; 85 | 86 | switch (tree->op) { 87 | case E_CON: 88 | if (tree->type->ts & T_FLOAT) { 89 | f = tree->u.con.f; 90 | fprintf(output_file, "0x%x", *((unsigned *) &f)); 91 | } else if (tree->type->ts & T_LFLOAT) { 92 | lf = tree->u.con.f; 93 | fprintf(output_file, "0x%lx", *((unsigned long *) &lf)); 94 | } else 95 | fprintf(output_file, "%ld", tree->u.con.i); 96 | 97 | break; 98 | case E_REG: 99 | output_reg(tree->u.reg, tree->type->ts); 100 | break; 101 | case E_IMM: 102 | case E_MEM: 103 | if (tree->op == E_MEM) { 104 | if (tree->type->ts & T_IS_BYTE) fprintf(output_file, "byte "); 105 | if (tree->type->ts & T_IS_WORD) fprintf(output_file, "word "); 106 | if (tree->type->ts & T_IS_DWORD) fprintf(output_file, "dword "); 107 | if (tree->type->ts & T_IS_QWORD) fprintf(output_file, "qword "); 108 | fputc('[', output_file); 109 | } 110 | 111 | if (tree->u.mi.rip) { 112 | fprintf(output_file, "rip "); 113 | output("%G", tree->u.mi.glob); 114 | if (tree->u.mi.ofs) { 115 | if (tree->u.mi.ofs > 0) fputc('+', output_file); 116 | fprintf(output_file, "%ld", tree->u.mi.ofs); 117 | } 118 | } else { 119 | indexed = 0; 120 | 121 | if (tree->u.mi.b != R_NONE) { 122 | output_reg(tree->u.mi.b, T_PTR); 123 | indexed++; 124 | } 125 | 126 | if (tree->u.mi.i != R_NONE) { 127 | fputc(',', output_file); 128 | output_reg(tree->u.mi.i, T_PTR); 129 | if (tree->u.mi.s > 1) fprintf(output_file, "*%d", tree->u.mi.s); 130 | indexed++; 131 | } 132 | 133 | if (tree->u.mi.glob) { 134 | if (indexed) fputc(',', output_file); 135 | output("%G", tree->u.mi.glob); 136 | } 137 | 138 | if (tree->u.mi.ofs) { 139 | if (tree->u.mi.glob && (tree->u.mi.ofs > 0)) 140 | fputc('+', output_file); 141 | else if (indexed) 142 | fputc(',', output_file); 143 | 144 | fprintf(output_file, "%ld", tree->u.mi.ofs); 145 | } 146 | } 147 | 148 | if (tree->op == E_MEM) fputc(']', output_file); 149 | break; 
150 | 151 | default: error(ERROR_INTERNAL); 152 | } 153 | } 154 | 155 | /* write to the output file. the recognized specifiers are: 156 | 157 | %% (no argument) the literal '%' 158 | %L (int) an asm label 159 | %R (int, int) a register (with type bits) 160 | %G (struct symbol *) the assembler name of a global 161 | %O (struct tree *) an operand expression 162 | 163 | %s, %d, %x like printf() 164 | %X like %x, but for long 165 | 166 | any unrecognized specifiers will bomb */ 167 | 168 | #ifdef __STDC__ 169 | void 170 | output(char * fmt, ...) 171 | #else 172 | output(fmt) 173 | char * fmt; 174 | #endif 175 | { 176 | va_list args; 177 | struct symbol * symbol; 178 | int reg; 179 | int ts; 180 | 181 | va_start(args, fmt); 182 | 183 | while (*fmt) { 184 | if (*fmt == '%') { 185 | fmt++; 186 | switch (*fmt) { 187 | case '%': 188 | fputc('%', output_file); 189 | break; 190 | case 'd': 191 | fprintf(output_file, "%d", va_arg(args, int)); 192 | break; 193 | case 'L': 194 | fprintf(output_file, "L%d", va_arg(args, int)); 195 | break; 196 | case 's': 197 | fprintf(output_file, "%s", va_arg(args, char *)); 198 | break; 199 | case 'x': 200 | fprintf(output_file, "%x", va_arg(args, int)); 201 | break; 202 | case 'X': 203 | fprintf(output_file, "%lx", va_arg(args, long)); 204 | break; 205 | case 'G': 206 | symbol = va_arg(args, struct symbol *); 207 | 208 | if ((symbol->ss & (S_EXTERN | S_STATIC)) && (symbol->scope == SCOPE_GLOBAL) && symbol->id) 209 | fprintf(output_file, "_%s", symbol->id->data); 210 | else if (symbol->ss & (S_STATIC | S_EXTERN)) 211 | fprintf(output_file, "L%d", symbol->i); 212 | else 213 | error(ERROR_INTERNAL); 214 | 215 | break; 216 | case 'O': 217 | output_operand(va_arg(args, struct tree *)); 218 | break; 219 | case 'R': 220 | reg = va_arg(args, int); 221 | ts = va_arg(args, int); 222 | output_reg(reg, ts); 223 | break; 224 | default: 225 | error(ERROR_INTERNAL); 226 | } 227 | } else 228 | fputc(*fmt, output_file); 229 | 230 | fmt++; 231 | } 232 | 233 | 
va_end(args); 234 | } 235 | 236 | /* emit assembler directive to select the appropriate 237 | SEGMENT_*, if not already selected */ 238 | 239 | segment(new) 240 | { 241 | static int current = -1; 242 | 243 | if (new != current) { 244 | output("%s\n", (new == SEGMENT_TEXT) ? ".text" : ".data"); 245 | current = new; 246 | } 247 | } 248 | 249 | /* output 'length' bytes of 'string' to the assembler output. 250 | the caller is assumed to have selected the appropriate segment 251 | and emitted a label, if necessary. if 'length' exceeds the 252 | length of the string, the output is padded with zeroes. */ 253 | 254 | output_string(string, length) 255 | struct string * string; 256 | { 257 | int i = 0; 258 | 259 | while (i < length) { 260 | if (i % 16) 261 | output(","); 262 | else { 263 | if (i) output("\n"); 264 | output(" .byte "); 265 | } 266 | output("%d", (i >= string->length) ? 0 : string->data[i]); 267 | i++; 268 | } 269 | output("\n"); 270 | } 271 | 272 | /* conditional jump mnemonics. these must match the CC_* values in block.h. */ 273 | 274 | static char * jmps[] = { 275 | "jz", "jnz", "jg", "jle", "jge", "jl", "ja", 276 | "jbe", "jae", "jb", "jmp", "NEVER" 277 | }; 278 | 279 | /* instruction mnemonics. 
keyed to I_IDX() from insn.h */ 280 | 281 | static char *insns[] = 282 | { 283 | /* 0 */ "nop", "mov", "movsx", "movzx", "movss", 284 | /* 5 */ "movsd", "lea", "cmp", "ucomiss", "ucomisd", 285 | /* 10 */ "pxor", "cvtss2si", "cvtsd2si", "cvtsi2ss", "cvtsi2sd", 286 | /* 15 */ "cvtss2sd", "cvtsd2ss", "shl", "shr", "sar", 287 | /* 20 */ "add", "addss", "addsd", "sub", "subss", 288 | /* 25 */ "subsd", "imul", "mulss", "mulsd", "or", 289 | /* 30 */ "xor", "and", "cdq", "cqo", "div", 290 | /* 35 */ "idiv", "divss", "divsd", "cbw", "cwd", 291 | /* 40 */ "setz", "setnz", "setg", "setle", "setge", 292 | /* 45 */ "setl", "seta", "setbe", "setae", "setb", 293 | /* 50 */ "not", "neg", "push", "pop", "call", 294 | /* 55 */ "test", "ret", "inc", "dec" 295 | }; 296 | 297 | /* output a block. the main task of this function is to output the 298 | instructions -- a simple task. the debugging data is most of the work! */ 299 | 300 | static 301 | output_block1(block, du) 302 | struct block * block; 303 | { 304 | struct defuse * defuse; 305 | 306 | output("\n; "); 307 | 308 | switch (du) 309 | { 310 | case DU_USE: output("USE: "); break; 311 | case DU_DEF: output("DEF: "); break; 312 | case DU_IN: output(" IN: "); break; 313 | case DU_OUT: output("OUT: "); break; 314 | } 315 | 316 | for (defuse = block->defuses; defuse; defuse = defuse->link) { 317 | if (defuse->dus & du) { 318 | output("%R", defuse->symbol->reg, defuse->symbol->type->ts); 319 | if (defuse->reg != R_NONE) output("=%R", defuse->reg, defuse->symbol->type->ts); 320 | if (defuse->symbol->id) output("(%s)", defuse->symbol->id->data); 321 | if ((du == DU_OUT) && DU_TRANSIT(*defuse)) output("[dist %d]", defuse->distance); 322 | output(" "); 323 | } 324 | } 325 | } 326 | 327 | output_block(block) 328 | struct block * block; 329 | { 330 | struct insn * insn; 331 | int i; 332 | struct block * cessor; 333 | int n; 334 | 335 | if (g_flag) { 336 | output("\n; block %d", block->asm_label); 337 | if (block == entry_block) output(" 
ENTRY"); 338 | if (block == exit_block) output(" EXIT"); 339 | 340 | if (block->bs & B_RECON) 341 | output(" RECON"); 342 | else { 343 | output_block1(block, DU_IN); 344 | output_block1(block, DU_USE); 345 | output_block1(block, DU_DEF); 346 | output_block1(block, DU_OUT); 347 | } 348 | 349 | output("\n; %d predecessors:", block->nr_predecessors); 350 | for (n = 0; cessor = block_predecessor(block, n); ++n) 351 | output(" %d", cessor->asm_label); 352 | 353 | output("\n; %d successors:", block->nr_successors); 354 | 355 | for (n = 0; cessor = block_successor(block, n); ++n) 356 | output(" %s=%d", 357 | jmps[block_successor_cc(block, n)], 358 | cessor->asm_label); 359 | } 360 | 361 | output("\n%L:\n", block->asm_label); 362 | 363 | for (insn = block->first_insn; insn; insn = insn->next) { 364 | output(" %s ", insns[I_IDX(insn->opcode)]); 365 | for (i = 0; i < I_NR_OPERANDS(insn->opcode); i++) { 366 | if (i) output(","); 367 | output("%O", insn->operand[i]); 368 | } 369 | if ((insn->flags & INSN_FLAG_CC) && g_flag) output(" ; FLAG_CC"); 370 | output("\n"); 371 | } 372 | } 373 | 374 | /* called after the code generator is complete, to output all the function blocks. 375 | the main task of this function is to glue the successive blocks together with 376 | appropriate jump instructions, which is surprisingly tedious. 
*/ 377 | 378 | output_function() 379 | { 380 | struct block * block; 381 | struct block * successor1; 382 | struct block * successor2; 383 | int cc1; 384 | 385 | block = first_block; 386 | segment(SEGMENT_TEXT); 387 | if (current_function->ss & S_EXTERN) output(".global %G\n", current_function); 388 | output("%G:\n", current_function); 389 | 390 | for (block = first_block; block; block = block->next) { 391 | output_block(block); 392 | 393 | successor1 = block_successor(block, 0); 394 | if (successor1) cc1 = block_successor_cc(block, 0); 395 | 396 | /* there's no glue if there aren't any successors (exit block) */ 397 | 398 | if (!successor1) continue; 399 | 400 | /* if there's only one successor, it should be unconditional, 401 | so emit a jump unless the target is being output next. */ 402 | 403 | if (block->nr_successors == 1) { 404 | if (cc1 != CC_ALWAYS) error(ERROR_INTERNAL); 405 | if (block->next != successor1) output(" jmp %L\n", successor1->asm_label); 406 | continue; 407 | } 408 | 409 | /* the only remaining case is that of two successors, which must have opposite 410 | condition codes. we make an extra effort here to emit unconditional branches, 411 | rather than conditional ones, to minimize the impact on the branch predictor. */ 412 | 413 | successor2 = block_successor(block, 1); 414 | if (block_successor_cc(block, 1) != CC_INVERT(cc1)) error(ERROR_INTERNAL); 415 | 416 | if (successor1 == block->next) 417 | output(" %s %L\n", jmps[CC_INVERT(cc1)], successor2->asm_label); 418 | else { 419 | output(" %s %L\n", jmps[cc1], successor1->asm_label); 420 | if (successor2 != block->next) output(" jmp %L\n", successor2->asm_label); 421 | } 422 | } 423 | } 424 | -------------------------------------------------------------------------------- /ncc1/reg.h: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 
3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | /* registers are divided into two classes, integral and float. 26 | in those classes each register has an index, starting from 0. 27 | the first 16 registers in each class are the real registers, 28 | and the remainder are pseudo registers. 
*/ 29 | 30 | #define NR_REGS 16 /* in each class */ 31 | 32 | #define R_IDX(x) ((x) & 0x0FFFFFFF) 33 | #define R_IS_PSEUDO(x) (R_IDX(x) >= NR_REGS) 34 | 35 | #define R_NONE 0 36 | #define R_IS_INTEGRAL 0x10000000 37 | #define R_IS_FLOAT 0x20000000 38 | 39 | #define R_AX (R_IS_INTEGRAL | 0) 40 | #define R_DX (R_IS_INTEGRAL | 1) 41 | #define R_CX (R_IS_INTEGRAL | 2) 42 | #define R_BX (R_IS_INTEGRAL | 3) 43 | #define R_SI (R_IS_INTEGRAL | 4) 44 | #define R_DI (R_IS_INTEGRAL | 5) 45 | #define R_BP (R_IS_INTEGRAL | 6) 46 | #define R_SP (R_IS_INTEGRAL | 7) 47 | #define R_8 (R_IS_INTEGRAL | 8) 48 | #define R_9 (R_IS_INTEGRAL | 9) 49 | #define R_10 (R_IS_INTEGRAL | 10) 50 | #define R_11 (R_IS_INTEGRAL | 11) 51 | #define R_12 (R_IS_INTEGRAL | 12) 52 | #define R_13 (R_IS_INTEGRAL | 13) 53 | #define R_14 (R_IS_INTEGRAL | 14) 54 | #define R_15 (R_IS_INTEGRAL | 15) 55 | #define R_IPSEUDO (R_IS_INTEGRAL | NR_REGS) 56 | 57 | #define R_XMM0 (R_IS_FLOAT | 0) 58 | #define R_XMM1 (R_IS_FLOAT | 1) 59 | #define R_XMM2 (R_IS_FLOAT | 2) 60 | #define R_XMM3 (R_IS_FLOAT | 3) 61 | #define R_XMM4 (R_IS_FLOAT | 4) 62 | #define R_XMM5 (R_IS_FLOAT | 5) 63 | #define R_XMM6 (R_IS_FLOAT | 6) 64 | #define R_XMM7 (R_IS_FLOAT | 7) 65 | #define R_XMM8 (R_IS_FLOAT | 8) 66 | #define R_XMM9 (R_IS_FLOAT | 9) 67 | #define R_XMM10 (R_IS_FLOAT | 10) 68 | #define R_XMM11 (R_IS_FLOAT | 11) 69 | #define R_XMM12 (R_IS_FLOAT | 12) 70 | #define R_XMM13 (R_IS_FLOAT | 13) 71 | #define R_XMM14 (R_IS_FLOAT | 14) 72 | #define R_XMM15 (R_IS_FLOAT | 15) 73 | #define R_FPSEUDO (R_IS_FLOAT | NR_REGS) 74 | -------------------------------------------------------------------------------- /ncc1/stmt.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are met:

   * Redistributions of source code must retain the above copyright notice, this
     list of conditions and the following disclaimer.

   * Redistributions in binary form must reproduce the above copyright notice,
     this list of conditions and the following disclaimer in the documentation
     and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
   FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
   DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
   SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
   CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */

#include <stdlib.h>     /* assumed header: provides free() and NULL */
#include "ncc1.h"

/* switch cases are kept in a linked list */

struct switchcase
{
    struct tree * value;
    struct block * target;
    struct switchcase * next;
};

/* these variables are associated with the closest-enclosing
   block of appropriate type. the parser routines deal with
   saving/restoring their contents in their activation records.
*/

static struct switchcase * switchcases;
static struct type * switch_type;
static struct block * break_block;
static struct block * continue_block;
static struct block * default_block;

static statement();

static
if_statement()
{
    struct block * true_block;
    struct block * else_block;
    struct block * join_block;
    struct tree * test;
    int cc;

    true_block = new_block();
    else_block = new_block();
    join_block = new_block();

    lex();
    match(KK_LPAREN);
    test = expression();
    test = scalar_expression(test);
    match(KK_RPAREN);
    generate(test, GOAL_CC, &cc);
    succeed_block(current_block, cc, true_block);
    succeed_block(current_block, CC_INVERT(cc), else_block);

    current_block = true_block;
    statement();
    true_block = current_block;
    succeed_block(true_block, CC_ALWAYS, join_block);

    if (token.kk == KK_ELSE) {
        lex();
        current_block = else_block;
        statement();
        else_block = current_block;
    }

    succeed_block(else_block, CC_ALWAYS, join_block);
    current_block = join_block;
}

static
while_statement()
{
    struct block * test_block;
    struct block * body_block;
    struct block * saved_break_block;
    struct block * saved_continue_block;
    struct tree * test;
    int cc;

    saved_continue_block = continue_block;
    saved_break_block = break_block;

    test_block = new_block();
    body_block = new_block();
    break_block = new_block();
    continue_block = test_block;
    succeed_block(current_block, CC_ALWAYS, test_block);
    current_block = test_block;

    lex();
    match(KK_LPAREN);
    test = expression();
    test = scalar_expression(test);
    match(KK_RPAREN);
    generate(test, GOAL_CC, &cc);
    succeed_block(current_block, cc, body_block);
    succeed_block(current_block, CC_INVERT(cc),
        break_block);

    current_block = body_block;
    statement();
    body_block = current_block;
    succeed_block(body_block, CC_ALWAYS, test_block);

    current_block = break_block;
    continue_block = saved_continue_block;
    break_block = saved_break_block;
}

static
do_statement()
{
    struct block * saved_continue_block;
    struct block * saved_break_block;
    struct block * body_block;
    struct tree * test;
    int cc;

    saved_continue_block = continue_block;
    saved_break_block = break_block;
    continue_block = new_block();
    break_block = new_block();
    body_block = new_block();
    succeed_block(current_block, CC_ALWAYS, body_block);
    current_block = body_block;

    lex();
    statement();
    match(KK_WHILE);
    match(KK_LPAREN);
    test = expression();
    test = scalar_expression(test);
    match(KK_RPAREN);
    match(KK_SEMI);
    succeed_block(current_block, CC_ALWAYS, continue_block);
    current_block = continue_block;
    generate(test, GOAL_CC, &cc);
    succeed_block(current_block, cc, body_block);
    succeed_block(current_block, CC_INVERT(cc), break_block);

    current_block = break_block;
    continue_block = saved_continue_block;
    break_block = saved_break_block;
}

static
return_statement()
{
    struct type * return_type;
    struct tree * tree;

    match(KK_RETURN);
    return_type = current_function->type->next;

    if (token.kk != KK_SEMI) {
        if (return_type->ts & T_IS_FLOAT)
            tree = reg_tree(R_XMM0, copy_type(return_type));
        else
            tree = reg_tree(R_AX, copy_type(return_type));

        tree = assignment_expression(tree);
        generate(tree, GOAL_EFFECT, NULL);
    }

    succeed_block(current_block, CC_ALWAYS, exit_block);
    current_block = new_block();
    match(KK_SEMI);
}

static
for_statement()
{
    struct block * saved_continue_block;
    struct block * saved_break_block;
    struct block * test_block;
    struct block * body_block;
    struct tree * initial = NULL;
    struct tree * test = NULL;
    struct tree * step = NULL;
    int cc;

    saved_continue_block = continue_block;
    saved_break_block = break_block;
    test_block = new_block();
    body_block = new_block();
    continue_block = new_block();
    break_block = new_block();

    lex();
    match(KK_LPAREN);
    if (token.kk != KK_SEMI) initial = expression();
    match(KK_SEMI);
    if (token.kk != KK_SEMI) test = expression();
    match(KK_SEMI);
    if (token.kk != KK_RPAREN) step = expression();
    match(KK_RPAREN);

    if (initial) generate(initial, GOAL_EFFECT, NULL);
    succeed_block(current_block, CC_ALWAYS, test_block);
    current_block = test_block;

    if (test) {
        generate(test, GOAL_CC, &cc);
        succeed_block(current_block, cc, body_block);
        succeed_block(current_block, CC_INVERT(cc), break_block);
    } else
        succeed_block(current_block, CC_ALWAYS, body_block);

    current_block = body_block;
    statement();
    succeed_block(current_block, CC_ALWAYS, continue_block);

    current_block = continue_block;
    if (step) generate(step, GOAL_EFFECT, NULL);
    succeed_block(current_block, CC_ALWAYS, test_block);

    current_block = break_block;
    continue_block = saved_continue_block;
    break_block = saved_break_block;
}

compound()
{
    enter_scope();
    match(KK_LBRACE);
    local_declarations();
    while (token.kk != KK_RBRACE) statement();
    match(KK_RBRACE);
    exit_scope();
}

static
loop_control(block)
    struct block * block;
{
    lex();
    if (block == NULL) error(ERROR_MISPLACED);
    succeed_block(current_block,
        CC_ALWAYS, block);
    current_block = new_block();
    match(KK_SEMI);
}

static
label_statement()
{
    struct symbol * label;

    label = find_label(token.u.text);
    if (label->ss & S_DEFINED) error(ERROR_DUPDEF);
    label->ss |= S_DEFINED;

    lex();
    match(KK_COLON);

    succeed_block(current_block, CC_ALWAYS, label->target);
    current_block = label->target;

    statement();
}

static
goto_statement()
{
    struct symbol * label;

    match(KK_GOTO);
    expect(KK_IDENT);
    label = find_label(token.u.text);
    match(KK_IDENT);
    match(KK_SEMI);

    succeed_block(current_block, CC_ALWAYS, label->target);
    current_block = new_block();
}

static
switch_statement()
{
    struct switchcase * saved_switchcases;
    struct block * saved_default_block;
    struct block * saved_break_block;
    struct type * saved_switch_type;
    struct tree * tree;
    struct tree * reg_ax;
    struct block * control_block;

    saved_switchcases = switchcases;
    saved_default_block = default_block;
    saved_break_block = break_block;
    saved_switch_type = switch_type;
    switchcases = NULL;
    default_block = NULL;
    break_block = new_block();

    lex();
    match(KK_LPAREN);
    tree = expression();
    tree = generate(tree, GOAL_VALUE, NULL);
    if (!(tree->type->ts & T_IS_INTEGRAL)) error(ERROR_CASE);
    switch_type = copy_type(tree->type);
    reg_ax = reg_tree(R_AX, copy_type(switch_type));
    choose(E_ASSIGN, copy_tree(reg_ax), tree);
    match(KK_RPAREN);
    control_block = current_block;

    current_block = new_block();
    statement();
    succeed_block(current_block, CC_ALWAYS, break_block);
    if (default_block == NULL) default_block = break_block;

    current_block = control_block;

    while
        (switchcases) {
        struct switchcase * switchcase;
        struct block * tmp;

        switchcase = switchcases;
        switchcases = switchcases->next;

        put_insn(current_block, new_insn(I_CMP, copy_tree(reg_ax), switchcase->value), NULL);
        succeed_block(current_block, CC_Z, switchcase->target);
        succeed_block(current_block, CC_NZ, tmp = new_block());
        current_block = tmp;

        free(switchcase);
    }

    succeed_block(current_block, CC_ALWAYS, default_block);

    free_tree(reg_ax);
    current_block = break_block;
    switchcases = saved_switchcases;
    default_block = saved_default_block;
    break_block = saved_break_block;
    switch_type = saved_switch_type;
}

static
case_statement()
{
    struct switchcase * switchcase;
    struct tree * value;

    if (switch_type == NULL) error(ERROR_MISPLACED);

    if (token.kk == KK_DEFAULT) {
        lex();
        if (default_block) error(ERROR_DUPCASE);
        default_block = new_block();
        succeed_block(current_block, CC_ALWAYS, default_block);
        current_block = default_block;
    } else {
        lex();

        value = expression();
        if (!(value->type->ts & T_IS_INTEGRAL)) error(ERROR_CASE);
        value = new_tree(E_CAST, copy_type(switch_type), value);
        value = generate(value, GOAL_VALUE, NULL);
        if (value->op != E_CON) error(ERROR_CONEXPR);

        for (switchcase = switchcases; switchcase; switchcase = switchcase->next)
            if (switchcase->value->u.con.i == value->u.con.i) error(ERROR_DUPCASE);

        switchcase = (struct switchcase *) allocate(sizeof(struct switchcase));
        switchcase->value = value;
        switchcase->target = new_block();
        switchcase->next = switchcases;
        switchcases = switchcase;

        succeed_block(current_block, CC_ALWAYS, switchcase->target);
        current_block = switchcase->target;
    }

    match(KK_COLON);
    statement();
}

static
statement()
{
    struct tree * tree;

    switch (token.kk) {
    case KK_LBRACE:
        compound();
        break;
    case KK_BREAK:
        loop_control(break_block);
        break;
    case KK_CONTINUE:
        loop_control(continue_block);
        break;
    case KK_CASE:
    case KK_DEFAULT:
        case_statement();
        break;
    case KK_DO:
        do_statement();
        break;
    case KK_IF:
        if_statement();
        break;
    case KK_FOR:
        for_statement();
        break;
    case KK_GOTO:
        goto_statement();
        break;
    case KK_SWITCH:
        switch_statement();
        break;
    case KK_WHILE:
        while_statement();
        break;
    case KK_RETURN:
        return_statement();
        break;
    case KK_IDENT:
        if (peek(NULL) == KK_COLON) {
            label_statement();
            break;
        }
        /* fall through */
    default:
        tree = expression();
        tree = generate(tree, GOAL_EFFECT, NULL);
    case KK_SEMI:
        match(KK_SEMI);
    }
}

--------------------------------------------------------------------------------
/ncc1/symbol.c:
--------------------------------------------------------------------------------
/* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org).
   All rights reserved.

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are met:

   * Redistributions of source code must retain the above copyright notice, this
     list of conditions and the following disclaimer.

   * Redistributions in binary form must reproduce the above copyright notice,
     this list of conditions and the following disclaimer in the documentation
     and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
   FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
   DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
   SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
   CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */

#include <stdlib.h>     /* assumed header: provides free() */
#include <string.h>     /* assumed header: provides memcmp(), memcpy() */
#include "ncc1.h"

/* return the string table entry associated with a string
   containing 'length' bytes at 'data', creating one if necessary.
*/

static struct string * string_buckets[NR_STRING_BUCKETS];

struct string *
stringize(data, length)
    char * data;
    int length;
{
    struct string * string;
    struct string ** stringp;
    unsigned hash;
    int i;

    for (i = 0, hash = 0; i < length; i++) {
        hash <<= 4;
        hash ^= (data[i] & 0xff);
    }

    i = hash % NR_STRING_BUCKETS;
    for (stringp = &(string_buckets[i]); (string = *stringp); stringp = &((*stringp)->link)) {
        if (string->length != length) continue;
        if (string->hash != hash) continue;
        if (memcmp(string->data, data, length)) continue;

        *stringp = string->link;
        string->link = string_buckets[i];
        string_buckets[i] = string;

        return string;
    }

    string = (struct string *) allocate(sizeof(struct string));
    string->link = string_buckets[i];
    string_buckets[i] = string;
    string->hash = hash;
    string->length = length;
    string->asm_label = 0;
    string->token = KK_IDENT;

    string->data = allocate(length + 1);
    memcpy(string->data, data, length);
    string->data[length] = 0;

    return string;
}

/* walk the string table and output all the pending string literals.
*/

literals()
{
    struct string * string;
    int i;

    for (i = 0; i < NR_STRING_BUCKETS; i++)
        for (string = string_buckets[i]; string; string = string->link)
            if (string->asm_label) {
                segment(SEGMENT_TEXT);
                output("%L:\n", string->asm_label);
                output_string(string, string->length + 1);
            }
}

/* walk the symbol table and output directives for all undefined externs */

externs1(symbol)
    struct symbol * symbol;
{
    if ((symbol->ss & S_EXTERN) && !(symbol->ss & S_DEFINED) && (symbol->ss & S_REFERENCED))
        output(".global %G\n", symbol);
}

externs()
{
    walk_symbols(SCOPE_GLOBAL, SCOPE_GLOBAL, externs1);
}

/* return the symbol table entry associated with a string literal. */

struct symbol *
string_symbol(string)
    struct string * string;
{
    struct symbol * symbol;
    struct type * type;

    type = splice_types(new_type(T_ARRAY), new_type(T_CHAR));
    type->nr_elements = string->length + 1;
    if (string->asm_label == 0) string->asm_label = next_asm_label++;
    symbol = new_symbol(NULL, S_STATIC, type);
    symbol->i = string->asm_label;
    put_symbol(symbol, current_scope);
    return symbol;
}

/* the symbol table borrows the hash from the symbol identifiers
   to use for its own purposes. since anonymous symbols are possible,
   there's an extra bucket just for them - putting them in the main
   table would serve no purpose, since we never find them by name. */

static struct symbol * symbol_buckets[NR_SYMBOL_BUCKETS + 1];

#define EXTRA_BUCKET        NR_SYMBOL_BUCKETS
#define SYMBOL_BUCKET(id)   ((id) ? ((id)->hash % NR_SYMBOL_BUCKETS) : EXTRA_BUCKET)

/* allocate a new symbol. if 'type' is supplied,
   the caller yields ownership.
*/

struct symbol *
new_symbol(id, ss, type)
    struct string * id;
    struct type * type;
{
    struct symbol * symbol;

    symbol = (struct symbol *) allocate(sizeof(struct symbol));

    symbol->id = id;
    symbol->ss = ss;
    symbol->type = type;
    symbol->scope = SCOPE_NONE;
    symbol->reg = R_NONE;
    symbol->align = 0;
    symbol->target = NULL;
    symbol->link = NULL;
    symbol->i = 0;
    symbol->list = NULL;

    return symbol;
}

/* put the symbol in the symbol table at the specified scope level. */

put_symbol(symbol, scope)
    struct symbol * symbol;
{
    struct symbol ** bucketp;

    symbol->scope = scope;
    bucketp = &symbol_buckets[SYMBOL_BUCKET(symbol->id)];

    while (*bucketp && ((*bucketp)->scope > symbol->scope))
        bucketp = &((*bucketp)->link);

    symbol->link = *bucketp;
    *bucketp = symbol;
}

/* remove the symbol from the symbol table. */

get_symbol(symbol)
    struct symbol * symbol;
{
    struct symbol ** bucketp;

    bucketp = &symbol_buckets[SYMBOL_BUCKET(symbol->id)];
    while (*bucketp != symbol) bucketp = &((*bucketp)->link);
    *bucketp = symbol->link;
}

/* put the symbol on the end of the list. */

put_symbol_list(symbol, list)
    struct symbol * symbol;
    struct symbol ** list;
{
    while (*list) list = &((*list)->list);
    *list = symbol;
}

/* find a symbol on the list. returns NULL if not found.
*/

struct symbol *
find_symbol_list(id, list)
    struct string * id;
    struct symbol ** list;
{
    struct symbol * symbol = *list;

    while (symbol && (symbol->id != id)) symbol = symbol->list;
    return symbol;
}

/* if 'id' is a typedef'd name visible in the current scope,
   return its symbol, otherwise NULL. note we can't just look
   for S_TYPEDEF directly, because that would skip names that
   hide the typedef. */

struct symbol *
find_typedef(id)
    struct string * id;
{
    struct symbol * symbol;

    symbol = find_symbol(id, S_NORMAL, SCOPE_GLOBAL, current_scope);

    if (symbol && (symbol->ss & S_TYPEDEF))
        return symbol;
    else
        return NULL;
}

/* find, or create, the label symbol */

struct symbol *
find_label(id)
    struct string * id;
{
    struct symbol * symbol;

    symbol = find_symbol(id, S_LABEL, SCOPE_FUNCTION, SCOPE_FUNCTION);
    if (symbol == NULL) {
        symbol = new_symbol(id, S_LABEL, NULL);
        symbol->target = new_block();
        put_symbol(symbol, SCOPE_FUNCTION);
    }
    return symbol;
}

/* find a symbol with the given 'id' in the namespace 'ss'
   between scopes 'start' and 'end' (inclusive). returns NULL
   if nothing found. */

struct symbol *
find_symbol(id, ss, start, end)
    struct string * id;
{
    struct symbol * symbol;
    int i;

    i = SYMBOL_BUCKET(id);
    for (symbol = symbol_buckets[i]; symbol; symbol = symbol->link) {
        if (symbol->scope < start) break;
        if (symbol->scope > end) continue;
        if (symbol->id != id) continue;
        if ((symbol->ss & ss) == 0) continue;
        return symbol;
    }

    return NULL;
}

/* find a symbol by (pseudo) register. return NULL if not found.
   this is very slow, because the symbol table isn't indexed by register. */

struct symbol *
find_symbol_by_reg(reg)
{
    struct symbol * symbol;
    int i;

    for (i = 0; i <= EXTRA_BUCKET; i++)
        for (symbol = symbol_buckets[i]; symbol; symbol = symbol->link)
            if (symbol->reg == reg) return symbol;

    return NULL;
}

/* someone's interested in the memory allocated to this symbol,
   so make sure it's allocated. */

store_symbol(symbol)
    struct symbol * symbol;
{
    if ((symbol->ss & S_BLOCK) && (symbol->i == 0)) {
        frame_offset += size_of(symbol->type);
        frame_offset = ROUND_UP(frame_offset, align_of(symbol->type));
        symbol->i = -frame_offset;
    }
}

/* return the pseudo register associated with the symbol, allocating
   one of appropriate type, if necessary. */

symbol_reg(symbol)
    struct symbol * symbol;
{
    if (symbol->reg != R_NONE) return symbol->reg;

    if (symbol->type->ts & (T_IS_INTEGRAL | T_PTR))
        symbol->reg = next_iregister++;
    else if (symbol->type->ts & T_IS_FLOAT)
        symbol->reg = next_fregister++;
    else error(ERROR_INTERNAL);

    return symbol->reg;
}

/* create an anonymous temporary symbol of the given type.
   these are always S_REGISTER (because the compiler uses them
   internally and will never take their addresses) and always
   SCOPE_RETIRED (because they were never in scope to begin with).

   caller yields ownership of 'type'. */

struct symbol *
temporary_symbol(type)
    struct type * type;
{
    struct symbol * symbol;

    symbol = new_symbol(NULL, S_REGISTER, type);
    put_symbol(symbol, SCOPE_RETIRED);
    return symbol;
}

/* walk the symbol table between scopes 'start' and 'end', inclusive,
   calling f() on each one.
   the EXTRA_BUCKET is included in the traversal. */

walk_symbols(start, end, f)
    int f();
{
    struct symbol * symbol;
    struct symbol * link;
    int i;

    for (i = 0; i <= EXTRA_BUCKET; i++) {
        for (symbol = symbol_buckets[i]; symbol; symbol = link) {
            link = symbol->link;
            if (symbol->scope < start) break;
            if (symbol->scope > end) continue;
            f(symbol);
        }
    }
}

/* entering a scope is trivial. */

enter_scope()
{
    ++current_scope;
    if (current_scope > SCOPE_MAX) error(ERROR_NESTING);
}

/* exiting a scope just moves all symbols in the current
   scope into SCOPE_RETIRED, so they can't be seen. */

static
exit1(symbol)
    struct symbol * symbol;
{
    get_symbol(symbol);
    put_symbol(symbol, SCOPE_RETIRED);
}

exit_scope()
{
    walk_symbols(current_scope, SCOPE_MAX, exit1);
    --current_scope;
}

/* after a function definition is completed, call free_symbols()
   to finally clear out all the out-of-scope symbols. */

free_symbol(symbol)
    struct symbol * symbol;
{
    free_type(symbol->type);
    free(symbol);
}

static
free_symbols1(symbol)
    struct symbol * symbol;
{
    get_symbol(symbol);
    free_symbol(symbol);
}

free_symbols()
{
    walk_symbols(SCOPE_FUNCTION, SCOPE_RETIRED, free_symbols1);
}

--------------------------------------------------------------------------------
/ncc1/symbol.h:
--------------------------------------------------------------------------------
/* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org).
   All rights reserved.

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are met:

   * Redistributions of source code must retain the above copyright notice, this
     list of conditions and the following disclaimer.

   * Redistributions in binary form must reproduce the above copyright notice,
     this list of conditions and the following disclaimer in the documentation
     and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
   FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
   DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
   SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
   CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
   OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */

/* all lexical strings encountered (both identifiers and string literals)
   are kept in a hash table for the lifetime of the compilation. this simplifies
   memory management and speeds comparisons. technique from Fraser and Hanson's LCC.

   the buckets are kept in LRU order to mitigate the overhead associated with
   passing every identifier through the string table.

   if 'asm_label' is non-zero, then this is a string literal that must be output
   to the text section at the end of compilation - see literals().

   if 'token' is not KK_IDENT, then this is a keyword with that token code.

   'data' is guaranteed to be NUL-terminated, so can be used as a C string.
   just keep in mind that string literals may have embedded NULs. */

struct string
{
    unsigned hash;
    int length;
    char * data;
    int asm_label;
    int token;
    struct string * link;
};

/* the symbol table is a hash table where each bucket is ordered
   (in decreasing order) by 'scope'.

   SCOPE_NONE is just an initial value. no one should end up in the
   symbol table at this scope.

   SCOPE_GLOBAL is file scope.

   SCOPE_FUNCTION ... SCOPE_MAX are the (nesting) function block scopes.

   SCOPE_RETIRED is a temporary home for non-globals that have gone out of
   scope. we keep them until finished generating the current function. */

#define SCOPE_NONE      0
#define SCOPE_GLOBAL    1
#define SCOPE_FUNCTION  2
#define SCOPE_RETIRED   (SCOPE_MAX + 1)

/* the S_* flags encompass "storage classes" and a few other things. the 'ss' field of
   a symbol is a basic indicator of its kind. S_* are defined as bitfields so that a
   simple bitwise-and can be used to determine if two symbols are in the same namespace.

   some explanation of block-level storage classes is in order:

   S_LOCAL is the storage class assigned to block-level symbols without an explicit
   storage class. during the processing of the function, if the address is taken of an
   S_LOCAL, it is converted to S_AUTO. at the end of a function, before the optimizer
   is invoked, all S_LOCALs are converted to S_REGISTER. thus the optimizer can assume
   that an S_AUTO is aliased, and S_REGISTER is not.
*/

#define S_NONE          0

#define S_STRUCT        0x00000001
#define S_UNION         0x00000002

/* tags live in their own namespace */

#define S_TAG           (S_STRUCT | S_UNION)

#define S_TYPEDEF       0x00000004
#define S_STATIC        0x00000008
#define S_EXTERN        0x00000010
#define S_AUTO          0x00000020
#define S_REGISTER      0x00000040
#define S_LOCAL         0x00000080

/* S_BLOCK is shorthand for all block-level symbols */

#define S_BLOCK         (S_AUTO | S_REGISTER | S_LOCAL)

/* S_NORMAL covers all symbols in the "normal" namespace */

#define S_NORMAL        (S_LOCAL | S_REGISTER | S_AUTO | S_TYPEDEF | S_STATIC | S_EXTERN)

#define S_MEMBER        0x00000100
#define S_LABEL         0x00000200

/* formal arguments are put in this storage class until
   their types have been declared in the function header */

#define S_ARG           0x00000400

/* the S_REFERENCED flag is set on a symbol once it's
   actually used in an expression. we only care about this
   on S_EXTERNs, and that's just to keep from emitting
   unnecessary directives at the end of compilation. */

#define S_REFERENCED    0x40000000

/* the S_DEFINED flag is used by labels, struct/union tags,
   and extern/static objects/functions to indicate a definition
   has already been seen. */

#define S_DEFINED       0x80000000

struct symbol
{
    struct string * id;         /* can be NULL (anonymous symbols) */
    int ss;                     /* S_* */
    int scope;                  /* SCOPE_* */
    struct type * type;         /* NULL if not applicable (e.g., S_TAG) */
    int reg;                    /* assigned pseudo-register (or R_NONE) */
    int align;                  /* S_TAG only */
    struct block * target;      /* S_LABEL only */
    struct symbol * link;       /* table bucket link */

    /* 'i' is vaguely named because it's a general purpose holder.

       S_STATIC: for non-global static objects, this is the
       assembler label assigned to the storage.

       S_MEMBER: offset from the struct/union base.

       S_AUTO/S_REGISTER/S_LOCAL: offset from the frame pointer.
       note that if this is 0, the address hasn't been assigned yet!
       (assignments for S_REGISTER might be deferred until spilled.)

       S_STRUCT/S_UNION: when S_DEFINED, the size is in BYTES.
       When !S_DEFINED, size is a running counter IN BITS. */

    int i;

    /* S_MEMBERs live in lists threaded through the symbol table: the
       S_STRUCT or S_UNION container holds the head of the list. formal
       arguments also use this field, but the head is maintained externally. */

    struct symbol * list;
};

--------------------------------------------------------------------------------
/ncc1/token.h:
--------------------------------------------------------------------------------
/* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org).
   All rights reserved.

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are met:

   * Redistributions of source code must retain the above copyright notice, this
     list of conditions and the following disclaimer.

   * Redistributions in binary form must reproduce the above copyright notice,
     this list of conditions and the following disclaimer in the documentation
     and/or other materials provided with the distribution.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
   AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
   IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
   DISCLAIMED.
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | /* a token is comprised of a class and possibly some additional data 26 | in 'u' (see KK_* comments for details). */ 27 | 28 | struct token 29 | { 30 | int kk; 31 | 32 | union 33 | { 34 | long i; 35 | double f; 36 | struct string * text; 37 | } u; 38 | }; 39 | 40 | /* token classes are for the most part arbitrary values, 41 | except that binary operators that are parsed by binary() 42 | have an embedded precedence level. the pseudo tokens are 43 | used internally by the lexer and never seen by the parser. 
*/ 44 | 45 | #define KK_PREC(x) (0x00000100 << (x)) 46 | 47 | #define KK_PREC_MUL 0 48 | #define KK_PREC_ADD 1 49 | #define KK_PREC_SHIFT 2 50 | #define KK_PREC_REL 3 51 | #define KK_PREC_EQ 4 52 | #define KK_PREC_AND 5 53 | #define KK_PREC_XOR 6 54 | #define KK_PREC_OR 7 55 | #define KK_PREC_LAND 8 56 | #define KK_PREC_LOR 9 57 | 58 | #define KK_NONE 0 59 | 60 | #define KK_IDENT 1 /* identifier: u.text */ 61 | #define KK_STRLIT 2 /* string literal: u.text */ 62 | #define KK_FCON 3 /* float constant: u.f */ 63 | #define KK_LFCON 4 /* long float constant: u.f */ 64 | #define KK_ICON 5 /* int constant: u.i */ 65 | #define KK_LCON 6 /* long constant: u.i */ 66 | 67 | #define KK_NL 10 /* pseudo tokens */ 68 | #define KK_HASH 11 69 | 70 | #define KK_LPAREN /* ( */ 20 /* operators/punctuators */ 71 | #define KK_RPAREN /* ) */ 21 72 | #define KK_LBRACK /* [ */ 22 73 | #define KK_RBRACK /* ] */ 23 74 | #define KK_LBRACE /* { */ 24 75 | #define KK_RBRACE /* } */ 25 76 | #define KK_DOT /* . */ 26 77 | #define KK_XOR /* ^ */ (27 | KK_PREC(KK_PREC_XOR)) 78 | #define KK_COMMA /* , */ 28 79 | #define KK_COLON /* : */ 29 80 | #define KK_SEMI /* ; */ 30 81 | #define KK_QUEST /* ? */ 31 82 | #define KK_TILDE /* ~ */ 32 83 | #define KK_ARROW /* -> */ 33 84 | #define KK_INC /* ++ */ 34 85 | #define KK_DEC /* -- */ 35 86 | #define KK_BANG /* ! 
*/ 36 87 | #define KK_DIV /* / */ (37 | KK_PREC(KK_PREC_MUL)) 88 | #define KK_STAR /* * */ (38 | KK_PREC(KK_PREC_MUL)) 89 | #define KK_PLUS /* + */ (39 | KK_PREC(KK_PREC_ADD)) 90 | #define KK_MINUS /* - */ (40 | KK_PREC(KK_PREC_ADD)) 91 | #define KK_GT /* > */ (41 | KK_PREC(KK_PREC_REL)) 92 | #define KK_SHR /* >> */ (42 | KK_PREC(KK_PREC_SHIFT)) 93 | #define KK_GTEQ /* >= */ (43 | KK_PREC(KK_PREC_REL)) 94 | #define KK_SHREQ /* >>= */ 44 95 | #define KK_LT /* < */ (45 | KK_PREC(KK_PREC_REL)) 96 | #define KK_SHL /* << */ (46 | KK_PREC(KK_PREC_SHIFT)) 97 | #define KK_LTEQ /* <= */ (47 | KK_PREC(KK_PREC_REL)) 98 | #define KK_SHLEQ /* <<= */ 48 99 | #define KK_AND /* & */ (49 | KK_PREC(KK_PREC_AND)) 100 | #define KK_ANDAND /* && */ (50 | KK_PREC(KK_PREC_LAND)) 101 | #define KK_ANDEQ /* &= */ 51 102 | #define KK_BAR /* | */ (52 | KK_PREC(KK_PREC_OR)) 103 | #define KK_BARBAR /* || */ (53 | KK_PREC(KK_PREC_LOR)) 104 | #define KK_BAREQ /* |= */ 54 105 | #define KK_MINUSEQ /* -= */ 55 106 | #define KK_PLUSEQ /* += */ 56 107 | #define KK_STAREQ /* *= */ 57 108 | #define KK_DIVEQ /* /= */ 58 109 | #define KK_EQEQ /* == */ (59 | KK_PREC(KK_PREC_EQ)) 110 | #define KK_BANGEQ /* != */ (60 | KK_PREC(KK_PREC_EQ)) 111 | #define KK_MOD /* % */ (61 | KK_PREC(KK_PREC_MUL)) 112 | #define KK_MODEQ /* %= */ 62 113 | #define KK_XOREQ /* ^= */ 63 114 | #define KK_EQ /* = */ 64 115 | 116 | /* the keywords should be sequentially numbered 117 | in alphabetical order to match keyword[] in lex.c */ 118 | 119 | #define KK_AUTO 70 /* keywords */ 120 | #define KK_BREAK 71 121 | #define KK_CASE 72 122 | #define KK_CHAR 73 123 | #define KK_CONTINUE 74 124 | #define KK_DEFAULT 75 125 | #define KK_DO 76 126 | #define KK_ELSE 77 127 | #define KK_EXTERN 78 128 | #define KK_FLOAT 79 129 | #define KK_FOR 80 130 | #define KK_GOTO 81 131 | #define KK_IF 82 132 | #define KK_INT 83 133 | #define KK_LONG 84 134 | #define KK_REGISTER 85 135 | #define KK_RETURN 86 136 | #define KK_SHORT 87 137 | #define KK_SIZEOF 
88 138 | #define KK_STATIC 89 139 | #define KK_STRUCT 90 140 | #define KK_SWITCH 91 141 | #define KK_TYPEDEF 92 142 | #define KK_UNION 93 143 | #define KK_UNSIGNED 94 144 | #define KK_WHILE 95 145 | -------------------------------------------------------------------------------- /ncc1/tree.h: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | /* a conventional tree representation is used for expressions. a node 26 | may be a leaf, or have up to three children. 
on occasion (that is, 27 | for function arguments), a forest of trees is created via 'list'. 28 | 29 | the members of this struct have purposely terse names, as this 30 | (counter-intuitively) makes the code that manipulates trees more 31 | digestible, once the shorthand is grokked. */ 32 | 33 | #define NR_TREE_CH 3 34 | 35 | struct tree 36 | { 37 | int op; /* E_* */ 38 | struct tree * list; 39 | struct type * type; 40 | 41 | union 42 | { 43 | /* E_SYM represents the value referred to by the symbol. 44 | these are used during tree construction - early in code 45 | generation these turn into E_REG or E_MEM nodes. */ 46 | 47 | struct symbol * sym; 48 | 49 | /* E_REG represents a value held in a register. */ 50 | 51 | int reg; 52 | 53 | /* E_CON represents a constant. all floating point constants and 54 | arithmetic in the compiler are performed in double precision. 55 | integers of all sizes are kept normalized (that is, sign- or 56 | zero-extended to long) to simplify constant folding. */ 57 | 58 | union 59 | { 60 | long i; 61 | double f; 62 | } con; 63 | 64 | /* E_MEM represents a value held in memory, at [b+i*s+glob+ofs] or 65 | [rip glob+ofs]. E_IMM represents the address of that value. 'glob' 66 | must be a symbol known to the assembler (S_EXTERN or S_STATIC). */ 67 | 68 | struct /* E_IMM or E_MEM */ 69 | { 70 | struct symbol * glob; /* global */ 71 | long ofs; /* offset */ 72 | int b, i; /* base and index registers */ 73 | int s; /* 1, 2, 4 or 8: scale for index reg */ 74 | int rip; /* rIP-relative (flag) */ 75 | } mi; 76 | 77 | struct tree * ch[NR_TREE_CH]; 78 | } u; 79 | }; 80 | 81 | /* the E_* numbers aren't arbitrarily assigned. for starters, they're ordered and 82 | grouped by the number of children they have. in addition, binary operators are 83 | a fixed numeric distance from their assignment counterparts (if they exist).
84 | finally, the numbers must match the indices into the table for debug_tree() */ 85 | 86 | #define E_IS_LEAF(x) ((x) < E_FETCH) 87 | #define E_HAS_CH0(x) ((x) >= E_FETCH) 88 | #define E_HAS_CH1(x) ((x) >= E_ADD) 89 | #define E_HAS_CH2(x) ((x) == E_TERN) 90 | 91 | #define E_TO_ASSIGN(x) ((x) + (E_ADDASS - E_ADD)) 92 | #define E_FROM_ASSIGN(x) ((x) - (E_ADDASS - E_ADD)) 93 | 94 | /* leaves */ 95 | 96 | #define E_NOP 0 97 | #define E_SYM 1 /* tree.u.sym */ 98 | #define E_CON 2 /* tree.u.con */ 99 | #define E_IMM 3 /* tree.u.mi */ 100 | #define E_MEM 4 /* tree.u.mi */ 101 | #define E_REG 5 /* tree.u.reg */ 102 | 103 | /* unary operators */ 104 | 105 | #define E_FETCH 6 /* * */ 106 | #define E_ADDR 7 /* & */ 107 | #define E_NEG 8 /* - */ 108 | #define E_COM 9 /* ~ */ 109 | #define E_CAST 10 /* (cast) */ 110 | #define E_NOT 11 /* ! */ 111 | 112 | /* binary operators */ 113 | 114 | #define E_ADD 12 /* + */ 115 | #define E_SUB 13 /* - */ 116 | #define E_MUL 14 /* * */ 117 | #define E_DIV 15 /* / */ 118 | #define E_MOD 16 /* % */ 119 | #define E_SHL 17 /* << */ 120 | #define E_SHR 18 /* >> */ 121 | #define E_AND 19 /* & */ 122 | #define E_OR 20 /* | */ 123 | #define E_XOR 21 /* ^ */ 124 | 125 | #define E_ADDASS 22 /* += */ 126 | #define E_SUBASS 23 /* -= */ 127 | #define E_MULASS 24 /* *= */ 128 | #define E_DIVASS 25 /* /= */ 129 | #define E_MODASS 26 /* %= */ 130 | #define E_SHLASS 27 /* <<= */ 131 | #define E_SHRASS 28 /* >>= */ 132 | #define E_ANDASS 29 /* &= */ 133 | #define E_ORASS 30 /* |= */ 134 | #define E_XORASS 31 /* ^= */ 135 | 136 | #define E_LAND 32 /* && */ 137 | #define E_LOR 33 /* || */ 138 | #define E_EQ 34 /* == */ 139 | #define E_NEQ 35 /* != */ 140 | #define E_GT 36 /* > */ 141 | #define E_LT 37 /* < */ 142 | #define E_GTEQ 38 /* >= */ 143 | #define E_LTEQ 39 /* <= */ 144 | #define E_COMMA 40 /* , */ 145 | #define E_ASSIGN 41 /* = */ 146 | 147 | /* the increment/decrement operators are unary in C, but binary in 148 | the trees. 
the second operand specifies the amount to add/subtract, 149 | so the front end does the scaling. */ 150 | 151 | #define E_PRE 42 /* prefix -- or ++ */ 152 | #define E_POST 43 /* postfix -- or ++ */ 153 | 154 | /* function call is considered binary, with ch[0] 155 | being the function to call, and ch[1] being the 156 | head of the argument expression forest. */ 157 | 158 | #define E_CALL 44 159 | 160 | /* the ternary operator. note that the children aren't 161 | ordered as one might expect. the condition is last. 162 | "a ? b : c", ch[0] = b, ch[1] = c, ch[2] = a. */ 163 | 164 | #define E_TERN 45 165 | -------------------------------------------------------------------------------- /ncc1/type.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include 26 | #include "ncc1.h" 27 | 28 | /* create a new type node and give it sane defaults */ 29 | 30 | struct type * 31 | new_type(ts) 32 | { 33 | struct type * type; 34 | 35 | type = (struct type *) allocate(sizeof(struct type)); 36 | type->ts = ts; 37 | type->nr_elements = 0; 38 | type->tag = NULL; 39 | type->next = NULL; 40 | return type; 41 | } 42 | 43 | /* free a type - the whole type, not just the node. 44 | safe to call with NULL 'type'. */ 45 | 46 | free_type(type) 47 | struct type * type; 48 | { 49 | struct type * tmp; 50 | 51 | while (type) { 52 | tmp = type->next; 53 | free(type); 54 | type = tmp; 55 | } 56 | } 57 | 58 | /* return a copy of the type */ 59 | 60 | struct type * 61 | copy_type(type) 62 | struct type * type; 63 | { 64 | struct type * copy = NULL; 65 | struct type ** typep = &copy; 66 | 67 | while (type) { 68 | *typep = new_type(type->ts); 69 | (*typep)->nr_elements = type->nr_elements; 70 | (*typep)->tag = type->tag; 71 | type = type->next; 72 | typep = &((*typep)->next); 73 | } 74 | 75 | return copy; 76 | } 77 | 78 | /* glue two types into one by appending 'type2' on the end of 'type1'.
*/ 79 | 80 | struct type * 81 | splice_types(type1, type2) 82 | struct type * type1; 83 | struct type * type2; 84 | { 85 | struct type * tmp; 86 | 87 | if (type1 == NULL) return type2; 88 | for (tmp = type1; tmp->next; tmp = tmp->next) ; 89 | tmp->next = type2; 90 | 91 | return type1; 92 | } 93 | 94 | /* check for compatibility between two types. and merge any 95 | additional information from 'type2' into 'type1'. */ 96 | 97 | check_types(type1, type2, mode) 98 | struct type * type1; 99 | struct type * type2; 100 | { 101 | while (type1 && type2) { 102 | if ((type1->ts & T_BASE) != (type2->ts & T_BASE)) break; 103 | if (type1->tag != type2->tag) break; 104 | if ((type1->nr_elements && type2->nr_elements) && (type1->nr_elements != type2->nr_elements)) break; 105 | if ((mode == CHECK_TYPES_COMPOSE) && (type2->nr_elements)) type1->nr_elements = type2->nr_elements; 106 | type1 = type1->next; 107 | type2 = type2->next; 108 | } 109 | 110 | if (type1 || type2) error(ERROR_INCOMPAT); 111 | } 112 | 113 | /* return the size or alignment of the type in bytes. 114 | abort with an error if the value isn't, or can't be, known. 
*/ 115 | 116 | size_of(type) 117 | struct type * type; 118 | { 119 | long size = 1; 120 | 121 | while (type) { 122 | if (type->ts & (T_CHAR | T_UCHAR)) 123 | /* size *= 1 */ ; 124 | else if (type->ts & (T_SHORT | T_USHORT)) 125 | size *= 2; 126 | else if (type->ts & (T_INT | T_UINT | T_FLOAT)) 127 | size *= 4; 128 | else if (type->ts & (T_LONG | T_ULONG | T_LFLOAT | T_PTR)) 129 | size *= 8; 130 | else if (type->ts & T_FUNC) 131 | error(ERROR_ILLFUNC); 132 | else if (type->ts & T_TAG) { 133 | if (type->tag->ss & S_DEFINED) 134 | size *= type->tag->i; 135 | else 136 | error(ERROR_INCOMPLT); 137 | } else if (type->ts & T_ARRAY) { 138 | if (type->nr_elements == 0) 139 | error(ERROR_INCOMPLT); 140 | else 141 | size *= type->nr_elements; 142 | } 143 | 144 | if (size > MAX_SIZE) error(ERROR_TYPESIZE); 145 | if (type->ts & T_PTR) break; 146 | type = type->next; 147 | } 148 | 149 | return size; 150 | } 151 | 152 | align_of(type) 153 | struct type * type; 154 | { 155 | int align = 1; 156 | 157 | while (type) { 158 | if (type->ts & T_ARRAY) 159 | /* */ ; 160 | else if (type->ts & (T_CHAR | T_UCHAR)) 161 | align = 1; 162 | else if (type->ts & (T_SHORT | T_USHORT)) 163 | align = 2; 164 | else if (type->ts & (T_INT | T_UINT | T_FLOAT)) 165 | align = 4; 166 | else if (type->ts & (T_LONG | T_ULONG | T_LFLOAT | T_PTR)) 167 | align = 8; 168 | else if (type->ts & T_FUNC) 169 | error(ERROR_ILLFUNC); 170 | else if (type->ts & T_TAG) { 171 | if (type->tag->ss & S_DEFINED) 172 | align = type->tag->align; 173 | else 174 | error(ERROR_INCOMPLT); 175 | } 176 | 177 | if (type->ts & T_PTR) break; 178 | type = type->next; 179 | } 180 | 181 | return align; 182 | } 183 | 184 | /* perform some preliminary checks on a type: 185 | 1. functions must return scalars, 186 | 2. array elements can't be functions, 187 | 3. only the first index of an array can be unbounded. 
*/ 188 | 189 | validate_type(type) 190 | struct type * type; 191 | { 192 | while (type) { 193 | if (type->ts & T_ARRAY) { 194 | if (type->next->ts & T_ARRAY) { 195 | if (type->next->nr_elements == 0) 196 | error(ERROR_ILLARRAY); 197 | } else if (type->next->ts & T_FUNC) { 198 | error(ERROR_ILLFUNC); 199 | } 200 | } else if (type->ts & T_FUNC) { 201 | if (!(type->next->ts & T_IS_SCALAR)) 202 | error(ERROR_RETURN); 203 | } 204 | type = type->next; 205 | } 206 | } 207 | 208 | /* adjust and validate the type of a formal argument. 209 | 1. arrays become pointers, and 210 | 2. functions become pointers to function. 211 | 3. structs/unions are prohibited. */ 212 | 213 | struct type * 214 | argument_type(type) 215 | struct type * type; 216 | { 217 | if (type->ts & T_ARRAY) { 218 | type->ts &= ~T_ARRAY; 219 | type->ts = T_PTR; 220 | } 221 | 222 | if (type->ts & T_FUNC) type = splice_types(new_type(T_PTR), type); 223 | if (type->ts & T_TAG) error(ERROR_STRUCT); 224 | 225 | return type; 226 | } 227 | 228 | #ifndef NDEBUG 229 | 230 | static struct 231 | { 232 | int ts; 233 | char *text; 234 | } ts[] = { 235 | { T_CHAR, "char" }, 236 | { T_UCHAR, "unsigned char" }, 237 | { T_SHORT, "short" }, 238 | { T_USHORT, "unsigned short" }, 239 | { T_INT, "int" }, 240 | { T_UINT, "unsigned int" }, 241 | { T_LONG, "long" }, 242 | { T_ULONG, "unsigned long" }, 243 | { T_FLOAT, "float" }, 244 | { T_LFLOAT, "long float" }, 245 | { T_PTR, "pointer to " }, 246 | { T_FUNC, "function returning " }, 247 | { T_ARRAY, "array[" } 248 | }; 249 | 250 | #define NR_TFLAGS (sizeof(ts)/sizeof(*ts)) 251 | 252 | debug_type(type) 253 | struct type * type; 254 | { 255 | struct symbol * tag; 256 | int i; 257 | 258 | while (type) { 259 | for (i = 0; i < NR_TFLAGS; i++) 260 | if (ts[i].ts & type->ts) 261 | fprintf(stderr, "%s", ts[i].text); 262 | 263 | if (type->ts & T_ARRAY) { 264 | if (type->nr_elements) fprintf(stderr, "%d", type->nr_elements); 265 | fprintf(stderr,"] of "); 266 | } 267 | 268 | if (type->ts & 
T_TAG) { 269 | if (type->tag->ss & S_STRUCT) fprintf(stderr, "struct"); 270 | if (type->tag->ss & S_UNION) fprintf(stderr, "union"); 271 | if (type->tag->id) 272 | fprintf(stderr, " '%s'", type->tag->id->data); 273 | else 274 | fprintf(stderr, " @ %p", type->tag); 275 | } 276 | 277 | if (type->ts & T_FIELD) 278 | fprintf(stderr, "[FIELD size %d shift %d] ", T_GET_SIZE(type->ts), T_GET_SHIFT(type->ts)); 279 | 280 | type = type->next; 281 | } 282 | } 283 | 284 | #endif 285 | 286 | -------------------------------------------------------------------------------- /ncc1/type.h: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | /* types are represented by lists of type nodes. */ 26 | 27 | struct type 28 | { 29 | int ts; /* T_* */ 30 | struct type * next; 31 | int nr_elements; /* T_ARRAY */ 32 | struct symbol * tag; /* T_TAG */ 33 | }; 34 | 35 | /* type nodes will have exactly one of the following bits set. they're defined 36 | as a powerset so that testing for groups of types is a simple operation. 37 | these types are also ordered to make the usual arithmetic conversions easy - 38 | see usuals() in tree.c */ 39 | 40 | #define T_CHAR 0x00000001 41 | #define T_UCHAR 0x00000002 42 | #define T_SHORT 0x00000004 43 | #define T_USHORT 0x00000008 44 | #define T_INT 0x00000010 45 | #define T_UINT 0x00000020 46 | #define T_LONG 0x00000040 47 | #define T_ULONG 0x00000080 48 | #define T_FLOAT 0x00000100 49 | #define T_LFLOAT 0x00000200 50 | #define T_TAG 0x00000400 /* struct/union 'tag' */ 51 | #define T_ARRAY 0x00000800 /* array[nr_elements] of ... */ 52 | #define T_PTR 0x00001000 /* ptr to ... */ 53 | #define T_FUNC 0x00002000 /* function returning ... */ 54 | 55 | #define T_BASE 0x00003FFF /* bits that must be exclusive */ 56 | 57 | /* bit fields must have one of the integral T_* bits set, as well as T_FIELD. 58 | the size and shift are then encoded using the macros below.
these types 59 | will only appear in the symbol table as a struct/union member, and will only 60 | appear in expression trees under an E_FETCH or E_MEM referencing those members. */ 61 | 62 | #define T_FIELD 0x80000000 63 | 64 | #define T_GET_SIZE(ts) (((ts) & 0x3F000000) >> 24) 65 | #define T_SET_SIZE(ts, bits) ((ts) |= (((bits) & 0x3F) << 24)) 66 | #define T_GET_SHIFT(ts) (((ts) & 0x003F0000) >> 16) 67 | #define T_SET_SHIFT(ts, bits) ((ts) |= (((bits) & 0x3F) << 16)) 68 | 69 | /* useful sets of types */ 70 | 71 | #define T_IS_CHAR (T_CHAR | T_UCHAR) 72 | #define T_IS_SHORT (T_SHORT | T_USHORT) 73 | #define T_IS_INT (T_INT | T_UINT) 74 | #define T_IS_LONG (T_LONG | T_ULONG) 75 | #define T_IS_FLOAT (T_FLOAT | T_LFLOAT) 76 | 77 | #define T_IS_SIGNED (T_CHAR | T_SHORT | T_INT | T_LONG) 78 | #define T_IS_UNSIGNED (T_UCHAR | T_USHORT | T_UINT | T_ULONG) 79 | #define T_IS_INTEGRAL (T_IS_SIGNED | T_IS_UNSIGNED) 80 | #define T_IS_ARITH (T_IS_FLOAT | T_IS_INTEGRAL) 81 | 82 | #define T_IS_SCALAR (T_IS_ARITH | T_PTR) 83 | 84 | #define T_IS_BYTE T_IS_CHAR 85 | #define T_IS_WORD T_IS_SHORT 86 | #define T_IS_DWORD (T_IS_INT | T_FLOAT) 87 | #define T_IS_QWORD (T_IS_LONG | T_PTR | T_LFLOAT) 88 | 89 | -------------------------------------------------------------------------------- /ncpp/EXAMPLE3: -------------------------------------------------------------------------------- 1 | /* example #3 from ANSI C89 standard, 6.8.3.5 */ 2 | 3 | #define x 3 4 | #define f(a) f(x * (a)) 5 | #undef x 6 | #define x 2 7 | #define g f 8 | #define z z[0] 9 | #define h g(~ 10 | #define m(a) a(w) 11 | #define w 0,1 12 | #define t(a) a 13 | 14 | f(y+1) + f(f(z)) % t(t(g)(0) + t)(1); 15 | 16 | g(x+(3,4)-w) | h 5) & m 17 | (f)^m(m); 18 | 19 | -------------------------------------------------------------------------------- /ncpp/EXAMPLE4: -------------------------------------------------------------------------------- 1 | /* example #4 from ANSI C89 standard, 6.8.3.5 */ 2 | 3 | #define str(s) # s 4 | 
#define xstr(s) str(s) 5 | #define debug(s, t) printf("x" # s "= %d, x" # t "= %s", \ 6 | x ## s, x ## t) 7 | #define INCFILE(n) vers ## n 8 | #define glue(a, b) a ## b 9 | #define xglue(a, b) glue(a, b) 10 | #define HIGHLOW "hello" 11 | #define LOW LOW ", world" 12 | 13 | debug(1, 2); 14 | fputs(str(strncmp("abc\0d", "abc", '\4') /* this goes away */ 15 | == 0) str(: @\n), s); 16 | /* #include xstr(INCFILE(2).h) */ 17 | glue(HIGH, LOW); 18 | xglue(HIGH, LOW) 19 | -------------------------------------------------------------------------------- /ncpp/input.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include "ncpp.h" 30 | 31 | struct input * input_stack; 32 | 33 | /* open a new file and put it on top of the input stack. the next call to 34 | input_line() will return text from this file. ownership of 'path' is 35 | yielded by the caller. */ 36 | 37 | input_open(path) 38 | struct vstring * path; 39 | { 40 | struct input * input; 41 | 42 | input = (struct input *) safe_malloc(sizeof(struct input)); 43 | input->path = path; 44 | input->line_number = 0; 45 | input->stack_link = input_stack; 46 | 47 | input->file = fopen(path->data, "r"); 48 | if (!input->file) fail("can't open '%V' for reading", path); 49 | 50 | input_stack = input; 51 | } 52 | 53 | /* erase comments by over-writing with space */ 54 | 55 | static int in_comment; 56 | 57 | static 58 | erase_comments(vstring) 59 | struct vstring * vstring; 60 | { 61 | int i = 0; 62 | int delimiter = 0; 63 | 64 | while (i < vstring->length) { 65 | if (delimiter) { 66 | if (vstring->data[i] == delimiter) 67 | delimiter = 0; 68 | 69 | if ((vstring->data[i] == '\\') && vstring->data[i+1]) 70 | ++i; 71 | } else if (in_comment) { 72 | if ((vstring->data[i] == '*') && (vstring->data[i+1] == '/')) { 73 | vstring->data[i + 1] = ' '; 74 | in_comment = 0; 75 | } 76 | vstring->data[i] = ' '; 77 | } else { 78 | if ((vstring->data[i] == '/') && (vstring->data[i+1] == '*')) { 79 | vstring->data[i] = ' '; 80 | vstring->data[i + 1] =
' '; 81 | in_comment = 1; 82 | } else { 83 | if ((vstring->data[i] == '"') || (vstring->data[i] == '\'')) 84 | delimiter = vstring->data[i]; 85 | } 86 | } 87 | ++i; 88 | } 89 | } 90 | 91 | /* get the next line of input off the input stack. returns NULL if there is 92 | no more input. this routine is responsible for logical line concatenation. 93 | if 'mode' is INPUT_LINE_LIMITED, then refuse to cross a file boundary. */ 94 | 95 | struct vstring * 96 | input_line(mode) 97 | { 98 | struct vstring * vstring; 99 | int c; 100 | int last_was_backslash; 101 | struct input * tmp_input; 102 | 103 | while (input_stack) { 104 | c = getc(input_stack->file); 105 | 106 | if (c == -1) { 107 | if (in_comment) fail("file ends mid-comment"); 108 | if (mode == INPUT_LINE_LIMITED) return NULL; 109 | 110 | fclose(input_stack->file); 111 | tmp_input = input_stack->stack_link; 112 | free(input_stack); 113 | input_stack = tmp_input; 114 | } else { 115 | ungetc(c, input_stack->file); 116 | break; 117 | } 118 | } 119 | 120 | if (!input_stack) return NULL; 121 | 122 | vstring = vstring_new(NULL); 123 | input_stack->line_number++; 124 | 125 | for (;;) { 126 | last_was_backslash = (c == '\\'); 127 | c = getc(input_stack->file); 128 | 129 | switch (c) { 130 | case '\n': 131 | if (last_was_backslash) { 132 | vstring_rubout(vstring); 133 | input_stack->line_number++; 134 | continue; 135 | } 136 | /* else fall through */ 137 | case -1: 138 | erase_comments(vstring); 139 | return vstring; 140 | 141 | default: 142 | vstring_putc(vstring, c); 143 | } 144 | } 145 | } 146 | 147 | /* system include directories- that is, those searched when 148 | INPUT_INCLUDE_SYSTEM is given to input_include()- are kept 149 | in a list that is searched in the reverse order that they 150 | are specified using input_include_directory(). 
*/ 151 | 152 | struct include_directory 153 | { 154 | struct vstring * path; 155 | struct include_directory * previous; 156 | }; 157 | 158 | struct include_directory * include_directories; 159 | 160 | /* add a directory to search for system includes. */ 161 | 162 | input_include_directory(path) 163 | char * path; 164 | { 165 | struct include_directory * directory; 166 | 167 | if (strlen(path) == 0) fail("missing include directory argument"); 168 | 169 | directory = (struct include_directory *) safe_malloc(sizeof(struct include_directory)); 170 | directory->previous = include_directories; 171 | directory->path = vstring_new(path); 172 | include_directories = directory; 173 | } 174 | 175 | /* input_include() searches appropriate places for a file with the given 176 | path and puts it on top of the input stack with input_open(). the mode 177 | governs what constitutes "an appropriate place": 178 | 179 | INPUT_INCLUDE_SYSTEM: search for the given path relative to 180 | the 'include_directories'. the first file found wins. 181 | 182 | INPUT_INCLUDE_LOCAL: assume the given path is relative to the 183 | path of the current input file (not the current directory). 184 | 185 | ownership of 'path' is yielded by the caller. 
*/ 186 | 187 | input_include(path, mode) 188 | struct vstring * path; 189 | { 190 | struct include_directory * directory; 191 | struct vstring * new_path; 192 | 193 | if (mode == INPUT_INCLUDE_LOCAL) { 194 | new_path = vstring_copy(input_stack->path); 195 | 196 | while (new_path->length && (new_path->data[new_path->length - 1] != '/')) 197 | vstring_rubout(new_path); 198 | 199 | vstring_concat(new_path, path); 200 | } else { 201 | directory = include_directories; 202 | while (directory) { 203 | new_path = vstring_copy(directory->path); 204 | vstring_putc(new_path, '/'); 205 | vstring_concat(new_path, path); 206 | 207 | if (!access(new_path->data, 0)) 208 | break; 209 | else 210 | vstring_free(new_path); 211 | 212 | directory = directory->previous; 213 | } 214 | if (directory == NULL) fail("'%V' not found in system include paths", path); 215 | } 216 | 217 | input_open(new_path); 218 | vstring_free(path); 219 | } 220 | -------------------------------------------------------------------------------- /ncpp/makefile: -------------------------------------------------------------------------------- 1 | OBJS=ncpp.o input.o directive.o token.o macro.o vstring.o 2 | 3 | ncpp: $(OBJS) 4 | $(CC) $(CFLAGS) -o ncpp $(OBJS) 5 | 6 | clean:: 7 | rm -f *.o ncpp 8 | -------------------------------------------------------------------------------- /ncpp/ncpp.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 
9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include "ncpp.h" 30 | 31 | struct vstring * output_path; 32 | FILE * output_file; 33 | 34 | /* specialized printf()-like output, used by fail() and out(). 35 | 36 | the recognized format specifiers are: 37 | 38 | %d %s like printf() 39 | %V struct vstring * 40 | %T struct token * 41 | 42 | other specifiers are just ignored. 
*/

static
print(file, fmt, args)
    FILE * file;
    char * fmt;
    va_list args;
{
    struct vstring * vstring;

    while (*fmt) {
        if (*fmt != '%') {
            fputc(*fmt, file);
        } else {
            fmt++;
            if (*fmt == 0) break; /* lone '%' at the end: don't step past the terminator */
            switch (*fmt) {
            case 's':
                fprintf(file, "%s", va_arg(args, char *));
                break;

            case 'd':
                fprintf(file, "%d", va_arg(args, int));
                break;

            case 'V':
                vstring = va_arg(args, struct vstring *);
                if (vstring->length)
                    fprintf(file, "%s", vstring->data);
                else
                    fprintf(file, "[empty]");
                break;

            case 'T':
                token_print(va_arg(args, struct token *), file);
                break;
            }
        }
        fmt++;
    }
}

/* invoke print() on output_file. */

#ifdef __STDC__
void
out(char * fmt, ...)
#else
out(fmt)
    char * fmt;
#endif
{
    va_list args;

    va_start(args, fmt);
    print(output_file, fmt, args);
    va_end(args);
}

/* report an error to the user, clean up, and exit.
   format specifiers and arguments are processed by print(), above. */

#ifdef __STDC__
void
fail(char * fmt, ...)
106 | #else 107 | fail(fmt) 108 | char * fmt; 109 | #endif 110 | { 111 | va_list args; 112 | 113 | if (input_stack) { 114 | fprintf(stderr, "'%s'", input_stack->path->data); 115 | if (input_stack->line_number) fprintf(stderr, " (%d)", input_stack->line_number); 116 | fprintf(stderr, ": "); 117 | } 118 | 119 | fprintf(stderr, "ERROR: "); 120 | va_start(args, fmt); 121 | print(stderr, fmt, args); 122 | va_end(args); 123 | fputc('\n', stderr); 124 | 125 | if (output_file) { 126 | fclose(output_file); 127 | unlink(output_path->data); 128 | } 129 | 130 | exit(1); 131 | } 132 | 133 | /* a simple wrapper to use instead of malloc() */ 134 | 135 | char * 136 | safe_malloc(sz) 137 | { 138 | char * p = malloc(sz); 139 | 140 | if (p == NULL) fail("out of memory"); 141 | 142 | return p; 143 | } 144 | 145 | /* synchronize the output file's idea of its path name and line number 146 | with the input file. if the output is less than SYNC_WINDOW lines behind, 147 | just rectify with newlines, otherwise issue a #line directive. */ 148 | 149 | #define SYNC_WINDOW 10 150 | 151 | static 152 | sync() 153 | { 154 | static struct vstring * path; 155 | static int line_number; 156 | 157 | if ((path == NULL) 158 | || !vstring_equal(path, input_stack->path) 159 | || (line_number > input_stack->line_number) 160 | || (line_number < (input_stack->line_number - SYNC_WINDOW))) 161 | { 162 | if (path != NULL) { 163 | out("\n"); 164 | vstring_free(path); 165 | } 166 | 167 | path = vstring_copy(input_stack->path); 168 | line_number = input_stack->line_number; 169 | out("# %d \"%V\"\n", line_number, path); 170 | } 171 | 172 | while (line_number < input_stack->line_number) { 173 | out("\n"); 174 | line_number++; 175 | } 176 | } 177 | 178 | /* call input_line() with the given 'mode' and tokenize the line 179 | onto the end of 'list'. returns non-zero on success, or zero if 180 | there is no more input. 
*/ 181 | 182 | static 183 | fill(mode, list) 184 | int mode; 185 | struct list * list; 186 | { 187 | struct vstring * line; 188 | 189 | line = input_line(mode); 190 | if (line == NULL) return 0; 191 | tokenize(line, list); 192 | vstring_free(line); 193 | 194 | return 1; 195 | } 196 | 197 | /* subject the first 'count' tokens of 'list' to macro replacement. */ 198 | 199 | replace_tokens(list, count) 200 | struct list * list; 201 | { 202 | struct list * replace_list; 203 | 204 | replace_list = list_new(); 205 | list_move(replace_list, list, count, NULL); 206 | macro_replace(replace_list, MACRO_REPLACE_ONCE); 207 | list_move(list, replace_list, -1, list->first); 208 | list_free(replace_list); 209 | } 210 | 211 | /* return the number of tokens (including 'start') that comprise 212 | the parentheses-enclosed actual macro argument list. if no open 213 | parenthesis is found, the return value is either 0 (no action 214 | required) or -1 (additional lines were read, fixup is required). */ 215 | 216 | static 217 | match_parentheses(list, start) 218 | struct list * list; 219 | struct token * start; 220 | { 221 | int no_match = 0; 222 | struct token * cursor; 223 | int parentheses; 224 | int count; 225 | 226 | restart: 227 | 228 | for (;;) { 229 | cursor = start; 230 | parentheses = 0; 231 | count = 0; 232 | 233 | while (cursor && (cursor->class == TOKEN_SPACE)) { 234 | count++; 235 | cursor = cursor->next; 236 | } 237 | 238 | if (cursor && (cursor->class != TOKEN_LPAREN)) return no_match; 239 | 240 | if (!cursor) { 241 | no_match = -1; 242 | if (!fill(INPUT_LINE_LIMITED, list)) return no_match; 243 | goto restart; 244 | } 245 | 246 | do { 247 | if (cursor->class == TOKEN_LPAREN) parentheses++; 248 | if (cursor->class == TOKEN_RPAREN) parentheses--; 249 | cursor = cursor->next; 250 | count++; 251 | if (!cursor && parentheses) { 252 | if (!fill(INPUT_LINE_LIMITED, list)) return count; 253 | goto restart; 254 | } 255 | } while (parentheses); 256 | 257 | return count; 258 | } 
259 | } 260 | 261 | /* main() seeds the keyword strings, processes the command line arguments, 262 | and then loops copying input to output until there's no more. no surprises here. */ 263 | 264 | main(argc, argv) 265 | char ** argv; 266 | { 267 | struct vstring * input_path; 268 | struct list * list; 269 | int check_directives; 270 | 271 | macro_predefine(); 272 | 273 | ++argv; 274 | --argc; 275 | 276 | while (*argv && (**argv == '-')) { 277 | switch ((*argv)[1]) { 278 | case 'I': 279 | input_include_directory((*argv) + 2); 280 | break; 281 | 282 | case 'D': 283 | macro_option((*argv) + 2); 284 | break; 285 | 286 | default: 287 | fail("bad argument '%s'", *argv); 288 | } 289 | ++argv; 290 | } 291 | 292 | if (!*argv) fail("no input path specified"); 293 | input_path = vstring_new(*argv); 294 | ++argv; 295 | 296 | if (!*argv) fail("no output path specified"); 297 | output_path = vstring_new(*argv); 298 | output_file = fopen(output_path->data, "w"); 299 | if (!output_file) fail("could not open '%V' for writing", output_path); 300 | ++argv; 301 | 302 | if (*argv) fail("too many arguments"); 303 | 304 | input_open(input_path); 305 | list = list_new(); 306 | 307 | for (;;) { 308 | while (list->count == 0) { 309 | if (!fill(INPUT_LINE_NORMAL, list)) { 310 | out("\n"); 311 | fclose(output_file); 312 | exit(0); 313 | } 314 | check_directives = 1; 315 | } 316 | 317 | if (check_directives) { 318 | directive(list); 319 | check_directives = 0; 320 | } 321 | 322 | if (list->first && (list->first->class == TOKEN_NAME)) { 323 | struct macro * macro = macro_lookup(list->first->u.text, MACRO_LOOKUP_NORMAL); 324 | 325 | if (macro) { 326 | if (!macro->arguments) { 327 | replace_tokens(list, 1); 328 | continue; 329 | } else { 330 | int i = match_parentheses(list, list->first->next); 331 | 332 | if (i > 0) { 333 | replace_tokens(list, 1 + i); 334 | continue; 335 | } 336 | 337 | if (i == -1) check_directives = 1; 338 | } 339 | } 340 | } 341 | 342 | sync(); 343 | 344 | if (list->count) { 
345 | if (list->first->class != TOKEN_SPACE) 346 | out("%T ", list->first); 347 | 348 | list_delete(list, list->first); 349 | } 350 | } 351 | } 352 | -------------------------------------------------------------------------------- /ncpp/ncpp.h: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/

extern FILE * output_file;

extern char * safe_malloc();

#ifdef __STDC__
extern void fail(char *, ...);
extern void out(char *, ...);
#endif

/* struct vstring represents a variable-length string */

struct vstring
{
    char * data;
    int length;
    int capacity;
};

extern struct vstring * vstring_new();
extern struct vstring * vstring_copy();
extern struct vstring * vstring_from_literal();
extern int vstring_equal();
extern int vstring_equal_s();
extern unsigned vstring_hash();

struct macro
{
    struct vstring * name;
    struct list * arguments;
    struct list * replacement;
    struct macro * link;
    int predefined;
};

#define MACRO_LOOKUP_NORMAL 0
#define MACRO_LOOKUP_CREATE 1

extern struct macro * macro_lookup();

#define MACRO_REPLACE_ONCE 0
#define MACRO_REPLACE_REPEAT 1

struct input
{
    struct vstring * path;
    FILE * file;
    int line_number;
    struct input * stack_link;
};

extern struct input * input_stack;

#define INPUT_LINE_NORMAL 0
#define INPUT_LINE_LIMITED 1

extern struct vstring * input_line();

#define INPUT_INCLUDE_LOCAL 0
#define INPUT_INCLUDE_SYSTEM 1

struct token
{
    int class;

    struct token * previous;
    struct token * next;

    union
    {
        struct vstring * text;
        int argument_no;
        int ascii;
        long int_value;
        unsigned long unsigned_value;
    } u;
};

struct list
{
    int count;
    struct token * first;
    struct token * last;
};

/* token classes.
be careful when changing this list- the values 110 | must match the indices into the token_text[] array in token.c */ 111 | 112 | #define TOKEN_SPACE 0 /* u.ascii: whitespace (except newline) */ 113 | #define TOKEN_INT 1 /* u.int_value: integer (in expression) */ 114 | #define TOKEN_UNSIGNED 2 /* u.unsigned_value: unsigned (in expression) */ 115 | #define TOKEN_ARG 3 /* u.argument_no: placeholder for function-like macro argument */ 116 | #define TOKEN_UNKNOWN 4 /* u.ascii: any char in input not otherwise accounted for */ 117 | #define TOKEN_STRING 5 /* u.text: string literal */ 118 | #define TOKEN_CHAR 6 /* u.text: char constant */ 119 | #define TOKEN_NUMBER 7 /* u.text: preprocessing number */ 120 | #define TOKEN_NAME 8 /* u.text: an identifier subject to macro replacement */ 121 | #define TOKEN_EXEMPT_NAME 9 /* u.text: an identifier NOT subject to macro replacement */ 122 | 123 | #define TOKEN_GT 10 /* > */ 124 | #define TOKEN_LT 11 /* < */ 125 | #define TOKEN_GTEQ 12 /* >= */ 126 | #define TOKEN_LTEQ 13 /* <= */ 127 | #define TOKEN_SHL 14 /* << */ 128 | #define TOKEN_SHLEQ 15 /* <<= */ 129 | #define TOKEN_SHR 16 /* >> */ 130 | #define TOKEN_SHREQ 17 /* >>= */ 131 | #define TOKEN_EQ 18 /* = */ 132 | #define TOKEN_EQEQ 19 /* == */ 133 | 134 | #define TOKEN_NOTEQ 20 /* != */ 135 | #define TOKEN_PLUS 21 /* + */ 136 | #define TOKEN_PLUSEQ 22 /* += */ 137 | #define TOKEN_INC 23 /* ++ */ 138 | #define TOKEN_MINUS 24 /* - */ 139 | #define TOKEN_MINUSEQ 25 /* -= */ 140 | #define TOKEN_DEC 26 /* -- */ 141 | #define TOKEN_ARROW 27 /* -> */ 142 | #define TOKEN_LPAREN 28 /* ( */ 143 | #define TOKEN_RPAREN 29 /* ) */ 144 | 145 | #define TOKEN_LBRACK 30 /* [ */ 146 | #define TOKEN_RBRACK 31 /* ] */ 147 | #define TOKEN_LBRACE 32 /* { */ 148 | #define TOKEN_RBRACE 33 /* } */ 149 | #define TOKEN_COMMA 34 /* , */ 150 | #define TOKEN_DOT 35 /* . */ 151 | #define TOKEN_QUEST 36 /* ? 
*/ 152 | #define TOKEN_COLON 37 /* : */ 153 | #define TOKEN_SEMI 38 /* ; */ 154 | #define TOKEN_OR 39 /* | */ 155 | 156 | #define TOKEN_OROR 40 /* || */ 157 | #define TOKEN_OREQ 41 /* |= */ 158 | #define TOKEN_AND 42 /* & */ 159 | #define TOKEN_ANDAND 43 /* && */ 160 | #define TOKEN_ANDEQ 44 /* &= */ 161 | #define TOKEN_MUL 45 /* * */ 162 | #define TOKEN_MULEQ 46 /* *= */ 163 | #define TOKEN_MOD 47 /* % */ 164 | #define TOKEN_MODEQ 48 /* %= */ 165 | #define TOKEN_XOR 49 /* ^ */ 166 | 167 | #define TOKEN_XOREQ 50 /* ^= */ 168 | #define TOKEN_NOT 51 /* ! */ 169 | #define TOKEN_HASH 52 /* # */ 170 | #define TOKEN_HASHHASH 53 /* ## (impotent) */ 171 | #define TOKEN_DIV 54 /* / */ 172 | #define TOKEN_DIVEQ 55 /* /= */ 173 | #define TOKEN_TILDE 56 /* ~ */ 174 | #define TOKEN_ELLIPSIS 57 /* ... */ 175 | #define TOKEN_PASTE 58 /* ## (in macro replacement list) */ 176 | 177 | extern struct token * token_new(); 178 | extern struct token * token_copy(); 179 | extern struct token * token_paste(); 180 | extern int token_equal(); 181 | extern struct list * list_new(); 182 | extern struct token * list_unlink(); 183 | extern struct token * list_delete(); 184 | struct list * list_copy(); 185 | extern int list_equal(); 186 | 187 | #define LIST_TRIM_LEADING 0x00000001 188 | #define LIST_TRIM_TRAILING 0x00000002 189 | #define LIST_TRIM_EDGES (LIST_TRIM_LEADING | LIST_TRIM_TRAILING) 190 | #define LIST_TRIM_FOLD 0x00000004 191 | #define LIST_TRIM_STRIP 0x00000008 192 | 193 | #define LIST_GLUE_RAW 0 194 | #define LIST_GLUE_STRINGIZE 1 195 | 196 | extern struct vstring * list_glue(); 197 | 198 | #define SKIP_SPACES(t) while ((t) && ((t)->class == TOKEN_SPACE)) ((t) = (t)->next) 199 | 200 | -------------------------------------------------------------------------------- /ncpp/vstring.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 
3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include 26 | #include 27 | #include 28 | #include 29 | #include 30 | #include 31 | #include 32 | #include "ncpp.h" 33 | 34 | /* the more buckets the better, to a point. for optimal performance, 35 | NR_STRING BUCKETS should really be a power of two. */ 36 | 37 | #define NR_STRING_BUCKETS 16 38 | 39 | static struct string * buckets[NR_STRING_BUCKETS]; 40 | 41 | /* the initial capacity of a vstring. there is probably a 'best' value for 42 | any given standard library, but 16 seems pretty safe for now. */ 43 | 44 | #define VSTRING_INITIAL_ALLOC 16 45 | 46 | /* interpret the C escape sequence at 'cp' and determine its value. 
'c' is 47 | set to that value. 'cp' is advanced to the next character position. if the 48 | sequence is invalid, return false, otherwise, return true. */ 49 | 50 | static char * digits = "0123456789ABCDEF"; 51 | 52 | #define isodigit(x) (isdigit(x) && ((x) != '8') && ((x) != '9')) 53 | #define digit_value(x) (strchr(digits, toupper(x)) - digits) 54 | 55 | static 56 | escape(cp, c) 57 | char ** cp; 58 | int * c; 59 | { 60 | if (isodigit(**cp)) { 61 | *c = digit_value(**cp); 62 | (*cp)++; 63 | if (isodigit(**cp)) { 64 | *c <<= 3; 65 | *c += digit_value(**cp); 66 | (*cp)++; 67 | if (isodigit(**cp)) { 68 | *c <<= 3; 69 | *c += digit_value(**cp); 70 | (*cp)++; 71 | if (*c > UCHAR_MAX) return 0; 72 | } 73 | } 74 | } else if (**cp == 'x') { 75 | *c = 0; 76 | (*cp)++; 77 | if (!isxdigit(**cp)) return 0; 78 | *c = digit_value(**cp); 79 | (*cp)++; 80 | 81 | if (isxdigit(**cp)) { 82 | *c <<= 4; 83 | *c += digit_value(**cp); 84 | (*cp)++; 85 | if (isxdigit(**cp)) return 0; 86 | } 87 | } else { 88 | switch (**cp) { 89 | case 'a': 90 | *c = '\a'; 91 | break; 92 | 93 | case 'b': 94 | *c = '\b'; 95 | break; 96 | 97 | case 'f': 98 | *c = '\f'; 99 | break; 100 | 101 | case 'n': 102 | *c = '\n'; 103 | break; 104 | 105 | case 'r': 106 | *c = '\r'; 107 | break; 108 | 109 | case 't': 110 | *c = '\t'; 111 | break; 112 | 113 | case 'v': 114 | *c = '\v'; 115 | break; 116 | 117 | case '\?': 118 | case '\"': 119 | case '\\': 120 | case '\'': 121 | *c = **cp; 122 | break; 123 | 124 | default: 125 | return 0; 126 | } 127 | (*cp)++; 128 | } 129 | 130 | return 1; 131 | } 132 | 133 | /* return a new string from the token text of a C string literal, interpreting 134 | escape sequences. if the literal contains illegal escape sequences, return NULL. 
*/ 135 | 136 | struct vstring * 137 | vstring_from_literal(literal) 138 | struct vstring * literal; 139 | { 140 | struct vstring * raw; 141 | char * cp; 142 | int c; 143 | 144 | if (literal->length == 0) return NULL; 145 | cp = literal->data; 146 | if (*cp == 'L') cp++; 147 | cp++; /* opening quote */ 148 | raw = vstring_new(NULL); 149 | 150 | while (*cp != '\"') { 151 | if (*cp == '\\') { 152 | cp++; 153 | if (!escape(&cp, &c)) { 154 | vstring_free(raw); 155 | return NULL; 156 | } 157 | } else 158 | c = *cp++; 159 | 160 | vstring_putc(raw, c); 161 | } 162 | 163 | return raw; 164 | } 165 | 166 | /* allocate a new struct vstring. note that this only allocates and initializes 167 | the structure itself- a zero-capacity string has no storage. if 's' is not NULL, 168 | then the vstring is initialized with the contents of the C-style string 's'. */ 169 | 170 | struct vstring * 171 | vstring_new(s) 172 | char * s; 173 | { 174 | struct vstring * vstring = (struct vstring *) safe_malloc(sizeof(struct vstring)); 175 | 176 | vstring->data = NULL; 177 | vstring->length = 0; 178 | vstring->capacity = 0; 179 | 180 | if (s) vstring_puts(vstring, s); 181 | 182 | return vstring; 183 | } 184 | 185 | /* append the contents of the source string to the destination string. 186 | the source string is unmodified. */ 187 | 188 | vstring_concat(destination, source) 189 | struct vstring * destination; 190 | struct vstring * source; 191 | { 192 | int i = 0; 193 | 194 | while (i < source->length) vstring_putc(destination, source->data[i++]); 195 | } 196 | 197 | /* create a new vstring initialized with the contents of another vstring. */ 198 | 199 | struct vstring * 200 | vstring_copy(source) 201 | struct vstring * source; 202 | { 203 | struct vstring * vstring; 204 | 205 | vstring = vstring_new(NULL); 206 | vstring_concat(vstring, source); 207 | 208 | return vstring; 209 | } 210 | 211 | /* free a vstring, and its associated storage if present. 
*/

vstring_free(vstring)
    struct vstring * vstring;
{
    if (vstring->data) free(vstring->data);
    free(vstring);
}

/* return true if string contents are equal */

vstring_equal(vstring1, vstring2)
    struct vstring * vstring1;
    struct vstring * vstring2;
{
    if (vstring1->length != vstring2->length) return 0;
    return (memcmp(vstring1->data, vstring2->data, vstring1->length) == 0);
}

vstring_equal_s(vstring, s)
    struct vstring * vstring;
    char * s;
{
    int length = strlen(s);

    if (vstring->length != length) return 0;
    return (memcmp(vstring->data, s, length) == 0);
}

/* add a char to the end of a vstring, increasing the capacity and
   (re)allocating storage if necessary */

vstring_putc(vstring, c)
    struct vstring * vstring;
{
    if (vstring->length == vstring->capacity) {
        if (vstring->capacity == 0) {
            vstring->data = safe_malloc(VSTRING_INITIAL_ALLOC);
            vstring->capacity = VSTRING_INITIAL_ALLOC - 1;
        } else {
            char *new_data = safe_malloc((vstring->capacity + 1) * 2);
            memcpy(new_data, vstring->data, vstring->capacity);
            free(vstring->data);
            vstring->data = new_data;
            vstring->capacity = ((vstring->capacity + 1) * 2) - 1;
        }
    }

    vstring->data[vstring->length++] = c;
    vstring->data[vstring->length] = 0;
}

/* add a null-terminated string to the end of a vstring.
*/ 264 | 265 | vstring_puts(vstring, s) 266 | struct vstring * vstring; 267 | char * s; 268 | { 269 | while (*s) vstring_putc(vstring, *s++); 270 | } 271 | 272 | 273 | /* erase the last character from the string (if any) */ 274 | 275 | vstring_rubout(vstring) 276 | struct vstring * vstring; 277 | { 278 | if (vstring->length) { 279 | vstring->length--; 280 | vstring->data[vstring->length] = 0; 281 | } 282 | } 283 | 284 | /* return a hash value for the string */ 285 | 286 | unsigned 287 | vstring_hash(vstring) 288 | struct vstring * vstring; 289 | { 290 | unsigned hash = 0; 291 | int i; 292 | 293 | for (i = 0; i < vstring->length; i++) { 294 | hash <<= 3; 295 | hash ^= vstring->data[i]; 296 | } 297 | 298 | return hash; 299 | } 300 | 301 | -------------------------------------------------------------------------------- /nobj.c: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ 24 | 25 | #include 26 | #include 27 | #include 28 | 29 | #include "obj.h" 30 | #include "a.out.h" 31 | 32 | struct exec exec; 33 | struct obj_header hdr; 34 | FILE * fp; 35 | char * path; 36 | int s_flag = 0; 37 | int r_flag = 0; 38 | 39 | error(msg) 40 | char * msg; 41 | { 42 | fprintf(stderr, "obj: "); 43 | if (path) fprintf(stderr, "'%s': ", path); 44 | fprintf(stderr, "%s\n", msg); 45 | exit(1); 46 | } 47 | 48 | /* read 'nr_bytes' bytes into 'buffer' from 'position' in 'fp', 49 | and make sure no errors occur in the process. */ 50 | 51 | input(position, buffer, nr_bytes) 52 | char * buffer; 53 | { 54 | if (fseek(fp, (long) position, SEEK_SET)) 55 | error("seek error"); 56 | 57 | if (fread(buffer, sizeof(char), sizeof(char) * nr_bytes, fp) != (sizeof(char) * nr_bytes)) 58 | error("read error"); 59 | } 60 | 61 | /* print out a NUL-terminated string starting at 'position'. 62 | returns the size of the string printed (excluding terminator). */ 63 | 64 | string(position) 65 | { 66 | int i; 67 | char c; 68 | 69 | for (i = 0; ; ++i) { 70 | input(position + i, &c, 1); 71 | 72 | if (c) 73 | putchar(c); 74 | else 75 | return i; 76 | } 77 | } 78 | 79 | /* print out the name of a symbol, given the symbol's index. 
*/ 80 | 81 | obj_name(index) 82 | { 83 | struct obj_symbol symbol; 84 | 85 | input(OBJ_SYMBOL_OFFSET(hdr, index), &symbol, sizeof(symbol)); 86 | string(OBJ_NAME_OFFSET(hdr, symbol.index)); 87 | } 88 | 89 | /* print out a table of all the symbols in the object */ 90 | 91 | obj_symbols() 92 | { 93 | struct obj_symbol symbol; 94 | int i; 95 | 96 | putchar('\n'); 97 | 98 | if (hdr.nr_symbols) { 99 | for (i = 0; i < hdr.nr_symbols; i++) { 100 | input(OBJ_SYMBOL_OFFSET(hdr, i), &symbol, sizeof(symbol)); 101 | putchar((symbol.flags & OBJ_SYMBOL_GLOBAL) ? '+' : ' '); 102 | 103 | if (symbol.flags & OBJ_SYMBOL_DEFINED) { 104 | switch (OBJ_SYMBOL_GET_SEG(symbol)) 105 | { 106 | case OBJ_SYMBOL_SEG_TEXT: 107 | printf("text %08lx ", symbol.value); 108 | break; 109 | case OBJ_SYMBOL_SEG_DATA: 110 | printf("data %08lx ", symbol.value); 111 | break; 112 | case OBJ_SYMBOL_SEG_ABS: 113 | printf("abs %08lx ", symbol.value); 114 | break; 115 | case OBJ_SYMBOL_SEG_BSS: 116 | printf("bss %08lx,%-5d ", symbol.value, 1 << OBJ_SYMBOL_GET_ALIGN(symbol)); 117 | } 118 | } else 119 | printf("? "); 120 | 121 | obj_name(i); 122 | putchar('\n'); 123 | } 124 | } else 125 | puts("no symbols"); 126 | 127 | putchar('\n'); 128 | } 129 | 130 | /* print out all the relocation records in the file */ 131 | 132 | obj_relocs() 133 | { 134 | struct obj_reloc reloc; 135 | int i; 136 | 137 | putchar('\n'); 138 | 139 | if (hdr.nr_relocs) { 140 | for (i = 0; i < hdr.nr_relocs; i++) { 141 | input(OBJ_RELOC_OFFSET(hdr, i), &reloc, sizeof(reloc)); 142 | putchar((reloc.flags & OBJ_RELOC_REL) ? 'R' : ' '); 143 | switch (OBJ_RELOC_GET_SIZE(reloc)) 144 | { 145 | case OBJ_RELOC_SIZE_8: printf("8 "); break; 146 | case OBJ_RELOC_SIZE_16: printf("16 "); break; 147 | case OBJ_RELOC_SIZE_32: printf("32 "); break; 148 | case OBJ_RELOC_SIZE_64: printf("64 "); break; 149 | } 150 | printf(" @ %s %08lx (", (reloc.flags & OBJ_RELOC_TEXT) ? 
"text" : "data", reloc.target); 151 | obj_name(reloc.index); 152 | puts(")"); 153 | } 154 | } else 155 | puts("no relocation records"); 156 | 157 | putchar('\n'); 158 | } 159 | 160 | exec_symbols() 161 | { 162 | int symofs; 163 | int length; 164 | long value; 165 | int position; 166 | 167 | putchar('\n'); 168 | if (exec.a_syms) { 169 | symofs = exec.a_text + exec.a_data; 170 | for (position = 0; position < exec.a_syms; ) { 171 | length = string(symofs + position); 172 | position += length + 1; 173 | position += sizeof(value) - 1; 174 | position &= ~(sizeof(value) - 1); 175 | input(symofs + position, &value, sizeof(value)); 176 | while (length < 32) { 177 | length++; 178 | putchar(' '); 179 | } 180 | printf(" %016lx\n", value); 181 | position += sizeof(value); 182 | } 183 | } else 184 | puts("no symbols"); 185 | 186 | putchar('\n'); 187 | } 188 | 189 | exec_relocs() 190 | { 191 | } 192 | 193 | main(argc, argv) 194 | char * argv[]; 195 | { 196 | int opt; 197 | int magic; 198 | 199 | while ((opt = getopt(argc, argv, "rs")) != -1) { 200 | switch (opt) 201 | { 202 | case 'r': r_flag++; break; 203 | case 's': s_flag++; break; 204 | 205 | default: 206 | exit(1); 207 | } 208 | } 209 | 210 | argv = &argv[optind]; 211 | 212 | while (*argv) { 213 | path = *argv; 214 | if (!(fp = fopen(path, "r"))) error("can't open"); 215 | 216 | input(0, &magic, sizeof(magic)); 217 | 218 | if (magic == OBJ_MAGIC) { 219 | input(0, &hdr, sizeof(hdr)); 220 | printf("object file: %s\n", path); 221 | printf(" text size: %d\n", hdr.text_bytes); 222 | printf(" data size: %d\n", hdr.data_bytes); 223 | printf(" # symbols: %d\n", hdr.nr_symbols); 224 | printf(" # relocs: %d\n", hdr.nr_relocs); 225 | printf(" name size: %d\n", hdr.name_bytes); 226 | if (s_flag) obj_symbols(); 227 | if (r_flag) obj_relocs(); 228 | } else if (magic == A_MAGIC) { 229 | input(0, &exec, sizeof(exec)); 230 | printf(" a.out file: %s\n", path); 231 | printf(" text size: %d\n", exec.a_text); 232 | printf(" data size: %d\n", 
exec.a_data); 233 | printf(" bss size: %d\n", exec.a_bss); 234 | printf(" syms size: %d\n", exec.a_syms); 235 | if (s_flag) exec_symbols(); 236 | if (r_flag) exec_relocs(); 237 | } else error("unknown file format"); 238 | 239 | fclose(fp); 240 | argv++; 241 | } 242 | 243 | return 0; 244 | } 245 | -------------------------------------------------------------------------------- /obj.h: -------------------------------------------------------------------------------- 1 | /* Copyright (c) 2018 Charles E. Youse (charles@gnuless.org). 2 | All rights reserved. 3 | 4 | Redistribution and use in source and binary forms, with or without 5 | modification, are permitted provided that the following conditions are met: 6 | 7 | * Redistributions of source code must retain the above copyright notice, this 8 | list of conditions and the following disclaimer. 9 | 10 | * Redistributions in binary form must reproduce the above copyright notice, 11 | this list of conditions and the following disclaimer in the documentation 12 | and/or other materials provided with the distribution. 13 | 14 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" 15 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 16 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE 17 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE 18 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 19 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR 20 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER 21 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, 22 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE 23 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
*/

/* definitions for relocatable object modules (.o files) */

#define OBJ_MAGIC 0x252A2131

struct obj_header
{
    int         magic;
    unsigned    text_bytes;
    unsigned    data_bytes;
    unsigned    nr_symbols;
    unsigned    nr_relocs;
    unsigned    name_bytes;
    unsigned    reserved[2];
};

#define OBJ_SYMBOL_GLOBAL       0x80000000
#define OBJ_SYMBOL_DEFINED      0x40000000

#define OBJ_SYMBOL_SEG_ABS      0x00000000
#define OBJ_SYMBOL_SEG_TEXT     0x00000010
#define OBJ_SYMBOL_SEG_DATA     0x00000020
#define OBJ_SYMBOL_SEG_BSS      0x00000040

#define OBJ_SYMBOL_GET_SEG(sym)         ((sym).flags & 0xF0)
#define OBJ_SYMBOL_SET_SEG(sym,seg)     ((sym).flags = ((sym).flags & ~(0xF0)) | ((seg) & 0xF0))

/* alignment is stored as log2 in the low two bits of flags, so the
   mask must be 0x03 to hold every value OBJ_SYMBOL_VALID_ALIGN accepts */

#define OBJ_SYMBOL_GET_ALIGN(sym)       ((sym).flags & 0x03)
#define OBJ_SYMBOL_SET_ALIGN(sym,log2)  ((sym).flags = ((sym).flags & ~(0x03)) | ((log2) & 0x03))
#define OBJ_SYMBOL_VALID_ALIGN(log2)    ((log2) <= 3)

struct obj_symbol
{
    int         flags;
    unsigned    index;      /* into name section */
    long        value;      /* or size, if BSS */
};

#define OBJ_RELOC_REL       0x80000000  /* relocation should be relative */
#define OBJ_RELOC_TEXT      0x40000000  /* target is a text offset */
#define OBJ_RELOC_DATA      0x20000000  /* target is a data offset */
#define OBJ_RELOC_SIZE_8    0x00000000  /* fixup size is 8 bits.. */
#define OBJ_RELOC_SIZE_16   0x00000001  /* .. 16 .. */
#define OBJ_RELOC_SIZE_32   0x00000002  /* .. 32 .. */
#define OBJ_RELOC_SIZE_64   0x00000003  /* ..
64 */

#define OBJ_RELOC_GET_SIZE(rel)     ((rel).flags & 0x03)
#define OBJ_RELOC_SET_SIZE(rel,sz)  ((rel).flags = ((rel).flags & ~(0x03)) | ((sz) & 0x03))

struct obj_reloc
{
    int         flags;
    unsigned    index;      /* into symbol section of referenced symbol */
    unsigned    target;     /* offset to fixup in target segment */
    unsigned    reserved;
};

/* macros to determine the beginning positions of the various
   sections in the object file, based on header data. */

#define OBJ_ALIGN 8
#define OBJ_ROUNDUP(offset)         (((offset) % OBJ_ALIGN) ? ((offset) + (OBJ_ALIGN - ((offset) % OBJ_ALIGN))) : (offset))

#define OBJ_TEXT_OFFSET(hdr)        (sizeof(hdr))
#define OBJ_DATA_OFFSET(hdr)        (OBJ_TEXT_OFFSET(hdr) + OBJ_ROUNDUP((hdr).text_bytes))
#define OBJ_SYMBOLS_OFFSET(hdr)     (OBJ_DATA_OFFSET(hdr) + OBJ_ROUNDUP((hdr).data_bytes))
#define OBJ_SYMBOL_OFFSET(hdr,n)    (OBJ_SYMBOLS_OFFSET(hdr) + ((n) * sizeof(struct obj_symbol)))
#define OBJ_RELOCS_OFFSET(hdr)      (OBJ_SYMBOLS_OFFSET(hdr) + ((hdr).nr_symbols * sizeof(struct obj_symbol)))
#define OBJ_RELOC_OFFSET(hdr,n)     (OBJ_RELOCS_OFFSET(hdr) + ((n) * sizeof(struct obj_reloc)))
#define OBJ_NAMES_OFFSET(hdr)       (OBJ_RELOCS_OFFSET(hdr) + ((hdr).nr_relocs * sizeof(struct obj_reloc)))
#define OBJ_NAME_OFFSET(hdr,n)      (OBJ_NAMES_OFFSET(hdr) + (n))

--------------------------------------------------------------------------------
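The section-offset macros at the end of obj.h chain together: each section starts where the previous one ends, with the text and data sizes rounded up to OBJ_ALIGN. The sketch below may help make that layout concrete. It copies the structures and offset macros from obj.h so that it compiles on its own, and adds one helper, obj_file_size(), which is hypothetical and not part of ncc. It is written in ANSI C rather than the K&R dialect the compiler itself accepts, and it assumes the name section is the last thing in the file.

```c
#include <stddef.h>

/* Copied from obj.h so this sketch is self-contained. */
struct obj_header
{
    int         magic;
    unsigned    text_bytes;
    unsigned    data_bytes;
    unsigned    nr_symbols;
    unsigned    nr_relocs;
    unsigned    name_bytes;
    unsigned    reserved[2];
};

struct obj_symbol
{
    int         flags;
    unsigned    index;
    long        value;
};

struct obj_reloc
{
    int         flags;
    unsigned    index;
    unsigned    target;
    unsigned    reserved;
};

#define OBJ_ALIGN 8
#define OBJ_ROUNDUP(offset)     (((offset) % OBJ_ALIGN) ? ((offset) + (OBJ_ALIGN - ((offset) % OBJ_ALIGN))) : (offset))

#define OBJ_TEXT_OFFSET(hdr)    (sizeof(hdr))
#define OBJ_DATA_OFFSET(hdr)    (OBJ_TEXT_OFFSET(hdr) + OBJ_ROUNDUP((hdr).text_bytes))
#define OBJ_SYMBOLS_OFFSET(hdr) (OBJ_DATA_OFFSET(hdr) + OBJ_ROUNDUP((hdr).data_bytes))
#define OBJ_RELOCS_OFFSET(hdr)  (OBJ_SYMBOLS_OFFSET(hdr) + ((hdr).nr_symbols * sizeof(struct obj_symbol)))
#define OBJ_NAMES_OFFSET(hdr)   (OBJ_RELOCS_OFFSET(hdr) + ((hdr).nr_relocs * sizeof(struct obj_reloc)))
#define OBJ_NAME_OFFSET(hdr,n)  (OBJ_NAMES_OFFSET(hdr) + (n))

/* Hypothetical helper (not in ncc): the file size implied by a header,
   assuming the name section is the final section in the file. */
size_t obj_file_size(const struct obj_header *h)
{
    return OBJ_NAME_OFFSET(*h, h->name_bytes);
}
```

For a header with text_bytes = 13 and data_bytes = 8, the text is padded to 16 bytes, so the data section would begin 16 bytes after the header and the symbol table 24 bytes after it. The symbol and relocation tables are not padded; their sizes depend on sizeof(struct obj_symbol) and sizeof(struct obj_reloc) on the host.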