├── README.md
├── otcc.c
├── otccelf.c
├── otccelfn.c
├── otccex.c
└── otccn.c

/README.md:
--------------------------------------------------------------------------------
1 | otcc
2 | ====
3 |
4 | Obfuscated Tiny C Compiler
5 |
6 | http://www.bellard.org/otcc/
7 |
8 |
9 | Obfuscated Tiny C Compiler
10 | What is it?
11 | The Obfuscated Tiny C Compiler (OTCC) is a very small C compiler I wrote in order to win the International Obfuscated C Code Contest (IOCCC) in 2002.
12 |
13 | My goal was to write the smallest C compiler that is able to compile itself. I chose a subset of C that was general enough to write a small C compiler. Then I extended the C subset until I reached the maximum size allowed by the contest: 2048 bytes of C source, excluding the ';', '{', '}' and space characters.
14 |
15 | I chose to generate i386 code. The original OTCC code could only run on i386 Linux because it relied on little-endian byte order and unaligned memory access. It generated the program in memory and launched it directly. External symbols were resolved with dlsym().
16 |
17 | In order to have a portable version of OTCC, I made a variant called OTCCELF. It is only a little larger than OTCC, but it directly generates a dynamically linked i386 ELF executable from a C source without relying on any binutils tools! OTCCELF itself was tested successfully on i386 Linux and on Sparc Solaris.
18 |
19 | NOTE: My other project, TinyCC, a fully featured ISO C99 compiler, was written by starting from the source code of OTCC!
20 | Download
21 |
22 | Original OTCC version (runs only on i386 Linux): otcc.c (link it with -ldl).
23 | OTCC with i386 ELF output (should be portable): otccelf.c.
24 | Example of a C program that can be compiled: otccex.c.
25 | The non-obfuscated versions are finally available: otccn.c and otccelfn.c. These non-obfuscated versions do not compile themselves. They are provided for documentation purposes.
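The contest size rule mentioned above is simple enough to check mechanically. Below is a minimal sketch of a hypothetical helper, `contest_size()`, which is not part of OTCC: it counts every byte of a NUL-terminated source text except ';', '{', '}' and whitespace, following the simplified wording of the rule given here. Treating "space characters" as anything `isspace()` accepts is an assumption, and the official IOCCC rules may word the limit differently.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of OTCC): count the bytes of a C source
 * text that count toward the 2048-byte limit, using the simplified rule
 * quoted above. Every byte counts except ';', '{', '}' and whitespace.
 * "Whitespace" is assumed to mean anything isspace() accepts.
 */
size_t contest_size(const char *src)
{
    size_t count = 0;

    assert(src != NULL);
    for (; *src != '\0'; src++) {
        unsigned char c = (unsigned char)*src;

        /* skip the characters the rule does not count */
        if (c == ';' || c == '{' || c == '}' || isspace(c))
            continue;
        count++;
    }
    return count;
}
```

For example, `contest_size("main(){}")` returns 6, because the braces are not counted. To check a whole file such as otcc.c, read it into a NUL-terminated buffer (for instance with `fread()`) and pass that buffer in.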
26 |
27 | Compilation:
28 |
29 | gcc -O2 otcc.c -o otcc -ldl
30 | gcc -O2 otccelf.c -o otccelf
31 |
32 | Self-compilation:
33 |
34 | ./otccelf otccelf.c otccelf1
35 |
36 | As a test, here are the executables generated by OTCCELF: otcc1, otccelf1, otccex1.
37 | C Subset Definition
38 | See the accompanying example otccex.c for a sample C program.
39 |
40 | Expressions:
41 | Binary operators, in decreasing order of priority: '*' '/' '%', '+' '-', '>>' '<<', '<' '<=' '>' '>=', '==' '!=', '&', '^', '|', '=', '&&', '||'.
42 | '&&' and '||' have the same semantics as in C: left-to-right evaluation with early exit (short-circuit).
43 | Parentheses are supported.
44 | Unary operators: '&', '*' (pointer indirection), '-' (negation), '+', '!', '~', postfix '++' and '--'.
45 | Pointer indirection ('*') only works with an explicit cast to 'char *', 'int *' or 'int (*)()' (function pointer).
46 | '++', '--', and unary '&' can only be applied to a variable lvalue (left value).
47 | '=' can only be used with a variable or a '*' (pointer indirection) lvalue.
48 | Function calls are supported with the standard i386 calling convention. Function pointers are supported with an explicit cast. Functions can be used before being declared.
49 | Types: only signed integer ('int') variables and functions can be declared. Variables cannot be initialized in declarations. Only old-style K&R function declarations are parsed (implicit integer return value and no types on arguments).
50 | Any function or variable from the libc can be used because OTCC uses the libc dynamic linker to resolve undefined symbols.
51 | Instructions: blocks ('{' '}') are supported as in C. 'if' and 'else' can be used for tests. The 'while' and 'for' C constructs are supported for loops. 'break' can be used to exit loops. 'return' is used to return a value from a function.
52 | Identifiers are parsed the same way as in C. Local variables are handled, but there is no local namespace (not a problem if different names are used for local and global variables).
53 | Numbers can be entered in decimal, hexadecimal ('0x' or '0X' prefix), or octal ('0' prefix).
54 | '#define' is supported without function-like arguments. No macro recursion is allowed. Other preprocessor directives are ignored.
55 | C strings and C character constants are supported. Only the '\n', '\"', '\'' and '\\' escapes are recognized.
56 | C comments can be used (but not C++ '//' comments).
57 | No error is displayed if an incorrect program is given.
58 | Memory: the code, data, and symbol areas are each limited to about 100 KB (99999 bytes; this can be changed in the source code).
59 |
60 | OTCC Invocation
61 | You can use OTCC by typing:
62 |
63 | otcc prog.c [args]...
64 |
65 | or by feeding the C source to its standard input. 'args' are passed to the 'main' function of prog.c (argv[0] is prog.c).
66 |
67 | Examples:
68 |
69 | Sample compilation and execution:
70 |
71 | otcc otccex.c 10
72 |
73 | Self-compilation:
74 |
75 | otcc otcc.c otccex.c 10
76 |
77 | Self-compilation iterated...
78 |
79 | otcc otcc.c otcc.c otccex.c 10
80 |
81 | An alternate syntax is to use it as a script interpreter: you can put
82 |
83 | #!/usr/local/bin/otcc
84 |
85 | at the beginning of your C source if you installed otcc there.
86 | OTCCELF Invocation
87 | You can use OTCCELF by typing:
88 |
89 | otccelf prog.c prog
90 | chmod 755 prog
91 |
92 | 'prog' is the name of the ELF file you want to generate.
93 |
94 | Note that even though the generated i386 code is not as good as GCC's, the resulting ELF executables are much smaller for small sources. Try this program:
95 |
96 | #include <stdio.h>
97 |
98 | main()
99 | {
100 | printf("Hello World\n");
101 | return 0;
102 | }
103 |
104 | Results:
105 | Compiler          Executable size (in bytes)
106 | OTCCELF           424
107 | GCC (stripped)    2448
108 | Links
109 |
110 | TinyCC, a tiny but complete C compiler.
111 | Tiny ELF programs from Brian Raiter.
112 | Linux assembly projects.
113 |
114 | License
115 | The obfuscated OTCC and OTCCELF are in the public domain.
116 | The non-obfuscated versions are released under a BSD-like license (see the license at the start of the source code).
117 | This page is Copyright (c) 2002 Fabrice Bellard - http://bellard.org/ - http://www.tinycc.org/
118 |
--------------------------------------------------------------------------------
/otcc.c:
--------------------------------------------------------------------------------
1 | #include <stdio.h>
2 | #define k *(int*)
3 | #define a if(
4 | #define c ad()
5 | #define i else
6 | #define p while(
7 | #define x *(char*)
8 | #define b ==
9 | #define V =calloc(1,99999)
10 | #define f ()
11 | #define J return
12 | #define l ae(
13 | #define n e)
14 | #define u d!=
15 | #define F int
16 | #define y (j)
17 | #define r m=
18 | #define t +4
19 | F d,z,C,h,P,K,ac,q,G,v,Q,R,D,L,W,M;
20 | E(n{
21 | x D++=e;
22 | }
23 | o f{
24 | a L){
25 | h=x L++;
26 | a h b 2){
27 | L=0;
28 | h=W;
29 | }
30 | }
31 | i h=fgetc(Q);
32 | }
33 | X f{
34 | J isalnum(h)|h b 95;
35 | }
36 | Y f{
37 | a h b 92){
38 | o f;
39 | a h b 110)h=10;
40 | }
41 | }
42 | c{
43 | F e,j,m;
44 | p isspace(h)|h b 35){
45 | a h b 35){
46 | o f;
47 | c;
48 | a d b 536){
49 | c;
50 | E(32);
51 | k d=1;
52 | k(d t)=D;
53 | }
54 | p h!=10){
55 | E(h);
56 | o f;
57 | }
58 | E(h);
59 | E(2);
60 | }
61 | o f;
62 | }
63 | C=0;
64 | d=h;
65 | a X f){
66 | E(32);
67 | M=D;
68 | p X f){
69 | E(h);
70 | o f;
71 | }
72 | a isdigit(d)){
73 | z=strtol(M,0,0);
74 | d=2;
75 | }
76 | i{
77 | x D=32;
78 | d=strstr(R,M-1)-R;
79 | x D=0;
80 | d=d*8+256;
81 | a d>536){
82 | d=P+d;
83 | a k d b 1){
84 | L=k(d t);
85 | W=h;
86 | o f;
87 | c;
88 | }
89 | }
90 | }
91 | }
92 | i{
93 | o f;
94 | a d b 39){
95 | d=2;
96 | Y f;
97 | z=h;
98 | o f;
99 | o f;
100 | }
101 | i a d b 47&h b 42){
102 | o f;
103 | p h){
104 | p h!=42)o f;
105 | o f;
106 | a h b 47)h=0;
107 | }
108 | o f;
109 | c;
110 | }
111 | i{
112 | e="++#m--%am*@R<^1c/@%[_[H3c%@%[_[H3c+@.B#d-@%:_^BKd<<Z/03e>>`/03e<=0f>=/f<@.f>@1f==&g!='g&&k||#l&@.BCh^@.BSi|@.B+j~@/%Yd!@&d*@b";
113 | p j=x e++){
114 | r x e++;
115 | z=0;
116 | p(C=x e++-98)<0)z=z*64+C+64;
117 | a j b d&(m b h|m b 64)){
118 | a m b h){
119 | o f;
120 | d=1;
121 | }
122 | break;
123 | }
124 | }
125 | }
126 | }
127 | }
128 | l g){
129 | p g&&g!=-1){
130 | x q++=g;
131 | g=g>>8;
132 | }
133 | }
134 | A(n{
135 | F g;
136 | p n{
137 | g=k e;
138 | k e=q-e-4;
139 | e=g;
140 | }
141 | }
142 | s(g,n{
143 | l g);
144 | k q=e;
145 | e=q;
146 | q=q t;
147 | J e;
148 | }
149 | H(n{
150 | s(184,n;
151 | }
152 | B(n{
153 | J s(233,n;
154 | }
155 | S(j,n{
156 | l 1032325);
157 | J s(132+j,n;
158 | }
159 | Z(n{
160 | l 49465);
161 | H(0);
162 | l 15);
163 | l e+144);
164 | l 192);
165 | }
166 | N(j,n{
167 | l j+131);
168 | s((e<512)<<7|5,n;
169 | }
170 | T y{
171 | F g,e,m,aa;
172 | g=1;
173 | a d b 34){
174 | H(v);
175 | p h!=34){
176 | Y f;
177 | x v++=h;
178 | o f;
179 | }
180 | x v=0;
181 | v=v t&-4;
182 | o f;
183 | c;
184 | }
185 | i{
186 | aa=C;
187 | r z;
188 | e=d;
189 | c;
190 | a e b 2){
191 | H(m);
192 | }
193 | i a aa b 2){
194 | T(0);
195 | s(185,0);
196 | a e b 33)Z(m);
197 | i l m);
198 | }
199 | i a e b 40){
200 | w f;
201 | c;
202 | }
203 | i a e b 42){
204 | c;
205 | e=d;
206 | c;
207 | c;
208 | a d b 42){
209 | c;
210 | c;
211 | c;
212 | c;
213 | e=0;
214 | }
215 | c;
216 | T(0);
217 | a d b 61){
218 | c;
219 | l 80);
220 | w f;
221 | l 89);
222 | l 392+(e b 256));
223 | }
224 | i a n{
225 | a e b 256)l 139);
226 | i l 48655);
227 | q++;
228 | }
229 | }
230 | i a e b 38){
231 | N(10,k d);
232 | c;
233 | }
234 | i{
235 | g=k e;
236 | a!g)g=dlsym(0,M);
237 | a d b 61&j){
238 | c;
239 | w f;
240 | N(6,g);
241 | }
242 | i a u 40){
243 | N(8,g);
244 | a C b 11){
245 | N(0,g);
246 | l z);
247 | c;
248 | }
249 | }
250 | }
251 | }
252 | a d b 40){
253 | a g b 1)l 80);
254 | r s(60545,0);
255 | c;
256 | j=0;
257 | p u 41){
258 | w f;
259 | s(2393225,j);
260 | a d b 44)c;
261 | j=j t;
262 | }
263 | k r j;
264 | c;
265 | a!g){
266 | e=e t;
267 | k e=s(232,k n;
268 | }
269 | i a g b 1){
270 | s(2397439,j);
271 | j=j t;
272 | }
273 | i{
274 | s(232,g-q-5);
275 | }
276 | a j)s(50305,j);
277 | }
278 | }
279 | O y{
280 | F e,g,m;
281 | a j--b 1)T(1);
282 | i{
283 | O y;
284 | r 0;
285 | p j b C){
286 | g=d;
287 | e=z;
288 | c;
289 | a j>8){
290 | r S(e,m);
291 | O y;
292 | }
293 | i{
294 | l 80);
295 | O y;
296 | l 89);
297 | a j b 4|j b 5){
298 | Z(n;
299 | }
300 | i{
301 | l n;
302 | a g b 37)l 146);
303 | }
304 | }
305 | }
306 | a m&&j>8){
307 | r S(e,m);
308 | H(e^1);
309 | B(5);
310 | A(m);
311 | H(n;
312 | }
313 | }
314 | }
315 | w f{
316 | O(11);
317 | }
318 | U f{
319 | w f;
320 | J S(0,0);
321 | }
322 | I y{
323 | F m,g,e;
324 | a d b 288){
325 | c;
326 | c;
327 | r U f;
328 | c;
329 | I y;
330 | a d b 312){
331 | c;
332 | g=B(0);
333 | A(m);
334 | I y;
335 | A(g);
336 | }
337 | i{
338 | A(m);
339 | }
340 | }
341 | i a d b 352|d b 504){
342 | e=d;
343 | c;
344 | c;
345 | a e b 352){
346 | g=q;
347 | r U f;
348 | }
349 | i{
350 | a u 59)w f;
351 | c;
352 | g=q;
353 | r 0;
354 | a u 59)r U f;
355 | c;
356 | a u 41){
357 | e=B(0);
358 | w f;
359 | B(g-q-5);
360 | A(n;
361 | g=e t;
362 | }
363 | }
364 | c;
365 | I(&m);
366 | B(g-q-5);
367 | A(m);
368 | }
369 | i a d b 123){
370 | c;
371 | ab(1);
372 | p u 125)I y;
373 | c;
374 | }
375 | i{
376 | a d b 448){
377 | c;
378 | a u 59)w f;
379 | K=B(K);
380 | }
381 | i a d b 400){
382 | c;
383 | k j=B(k j);
384 | }
385 | i a u 59)w f;
386 | c;
387 | }
388 | }
389 | ab y{
390 | F m;
391 | p d b 256|u-1&!j){
392 | a d b 256){
393 | c;
394 | p u 59){
395 | a j){
396 | G=G t;
397 | k d=-G;
398 | }
399 | i{
400 | k d=v;
401 | v=v t;
402 | }
403 | c;
404 | a d b 44)c;
405 | }
406 | c;
407 | }
408 | i{
409 | A(k(d t));
410 | k d=q;
411 | c;
412 | c;
413 | r 8;
414 | p u 41){
415 | k d=m;
416 | r m t;
417 | c;
418 | a d b 44)c;
419 | }
420 | c;
421 | K=G=0;
422 | l 15042901);
423 | r s(60545,0);
424 | I(0);
425 | A(K);
426 | l 50121);
427 | k r G;
428 | }
429 | }
430 | }
431 | main(g,n{
432 | Q=stdin;
433 | a g-->1){
434 | e=e t;
435 | Q=fopen(k e,"r");
436 | }
437 | D=strcpy(R V," int if else while break return for define main ")+48;
438 | v V;
439 | q=ac V;
440 | P V;
441 | o f;
442 | c;
443 | ab(0);
444 | J(*(int(*)f)k(P+592))(g,n;
445 | }
446 |
447 |
--------------------------------------------------------------------------------
/otccelf.c:
--------------------------------------------------------------------------------
1 | #include <stdio.h>
2 | #define r *(char*)
3 | #define b if(
4 | #define t *(int*)
5 | #define k else
6 | #define u while(
7 | #define g av()
8 | #define c ax(
9 | #define f ==
10 | #define aj =calloc(1,99999)
11 | #define j ()
12 | #define O return
13 | #define l a)
14 | #define A int
15 | #define n 0)
16 | #define o aw(
17 | #define p 1)
18 | #define q s)
19 | A e,C,J,m,T,U,K,v,P,i,ak,Q,D,V,al,Z,G,R,y;
20 | L(l{
21 | r D++=a;
22 | }
23 | w j{
24 | b V){
25 | m=r V++;
26 | b m f 2){
27 | V=0;
28 | m=al;
29 | }
30 | }
31 | k m=fgetc(ak);
32 | }
33 | am j{
34 | O isalnum(m)|m f 95;
35 | }
36 | an j{
37 | b m f 92){
38 | w j;
39 | b m f 110)m=10;
40 | }
41 | }
42 | g{
43 | A a,s,h;
44 | u isspace(m)|m f 35){
45 | b m f 35){
46 | w j;
47 | g;
48 | b e f 536){
49 | g;
50 | L(32);
51 | t e=1;
52 | t(e+4)=D;
53 | }
54 | u m!=10){
55 | L(m);
56 | w j;
57 | }
58 | L(m);
59 | L(2);
60 | }
61 | w j;
62 | }
63 | J=0;
64 | e=m;
65 | b am j){
66 | L(32);
67 | Z=D;
68 | u am j){
69 | L(m);
70 | w j;
71 | }
72 | b isdigit(e)){
73 | C=strtol(Z,0,n;
74 | e=2;
75 | }
76 | k{
77 | r D=32;
78 | e=strstr(Q,Z-p-Q;
79 | r D=0;
80 | e=e*8+256;
81 | b e>536){
82 | e=T+e;
83 | b t e f p{
84 | V=t(e+4);
85 | al=m;
86 | w j;
87 | g;
88 | }
89 | }
90 | }
91 | }
92 | k{
93 | w j;
94 | b e f 39){
95 | e=2;
96 | an j;
97 | C=m;
98 | w j;
99 | w j;
100 | }
101 | k b e f 47&m f 42){
102 | w j;
103 | u m){
104 | u m!=42)w j;
105 | w j;
106 | b m f 47)m=0;
107 |
} 108 | w j; 109 | g; 110 | } 111 | k{ 112 | a="++#m--%am*@R<^1c/@%[_[H3c%@%[_[H3c+@.B#d-@%:_^BKd<>`/03e<=0f>=/f<@.f>@1f==&g!='g&&k||#l&@.BCh^@.BSi|@.B+j~@/%Yd!@&d*@b"; 113 | u s=r a++){ 114 | h=r a++; 115 | C=0; 116 | u(J=r a++-98)>8; 132 | } 133 | } 134 | E(a,d){ 135 | r a++=d; 136 | r a++=d>>8; 137 | r a++=d>>16; 138 | r a++=d>>24; 139 | } 140 | ao(l{ 141 | A d; 142 | return(r a&255)|(r(a+p&255)<<8|(r(a+2)&255)<<16|(r(a+3)&255)<<24; 143 | } 144 | ap(a,z){ 145 | A d; 146 | u l{ 147 | d=ao(l; 148 | b r(a-p f 5){ 149 | b z>=G&&z8){ 311 | h=aa(a,h); 312 | X(q; 313 | } 314 | k{ 315 | o 80); 316 | X(q; 317 | o 89); 318 | b s f 4|s f 5){ 319 | aq(l; 320 | } 321 | k{ 322 | o l; 323 | b d f 37)o 146); 324 | } 325 | } 326 | } 327 | b h&&s>8){ 328 | h=aa(a,h); 329 | M(a^p; 330 | I(5); 331 | H(h); 332 | M(l; 333 | } 334 | } 335 | } 336 | B j{ 337 | X(11); 338 | } 339 | ac j{ 340 | B j; 341 | O aa(0,n; 342 | } 343 | S(q{ 344 | A h,d,a; 345 | b e f 288){ 346 | g; 347 | g; 348 | h=ac j; 349 | g; 350 | S(q; 351 | b e f 312){ 352 | g; 353 | d=I(n; 354 | H(h); 355 | S(q; 356 | H(d); 357 | } 358 | k{ 359 | H(h); 360 | } 361 | } 362 | k b e f 352|e f 504){ 363 | a=e; 364 | g; 365 | g; 366 | b a f 352){ 367 | d=v; 368 | h=ac j; 369 | } 370 | k{ 371 | b e!=59)B j; 372 | g; 373 | d=v; 374 | h=0; 375 | b e!=59)h=ac j; 376 | g; 377 | b e!=41){ 378 | a=I(n; 379 | B j; 380 | I(d-v-5); 381 | H(l; 382 | d=a+4; 383 | } 384 | } 385 | g; 386 | S(&h); 387 | I(d-v-5); 388 | H(h); 389 | } 390 | k b e f 123){ 391 | g; 392 | ar(p; 393 | u e!=125)S(q; 394 | g; 395 | } 396 | k{ 397 | b e f 448){ 398 | g; 399 | b e!=59)B j; 400 | U=I(U); 401 | } 402 | k b e f 400){ 403 | g; 404 | t s=I(t q; 405 | } 406 | k b e!=59)B j; 407 | g; 408 | } 409 | } 410 | ar(q{ 411 | A h; 412 | u e f 256|e!=-1&!q{ 413 | b e f 256){ 414 | g; 415 | u e!=59){ 416 | b q{ 417 | P=P+4; 418 | t e=-P; 419 | } 420 | k{ 421 | t e=i; 422 | i=i+4; 423 | } 424 | g; 425 | b e f 44)g; 426 | } 427 | g; 428 | } 429 | k{ 430 | t e=v; 431 | 
g; 432 | g; 433 | h=8; 434 | u e!=41){ 435 | t e=h; 436 | h=h+4; 437 | g; 438 | b e f 44)g; 439 | } 440 | g; 441 | U=P=0; 442 | o 15042901); 443 | h=x(60545,n; 444 | S(n; 445 | H(U); 446 | o 50121); 447 | E(h,P); 448 | } 449 | } 450 | } 451 | c d){ 452 | E(i,d); 453 | i=i+4; 454 | } 455 | ad(d,l{ 456 | c d); 457 | d=d+134512640; 458 | c d); 459 | c d); 460 | c l; 461 | c l; 462 | } 463 | ae(q{ 464 | A a,h,d,N,z,F; 465 | N=0; 466 | a=Q; 467 | u p{ 468 | a++; 469 | h=a; 470 | u r a!=32&&a 24 | #endif 25 | #include 26 | 27 | /* vars: value of variables 28 | loc : local variable index 29 | glo : global variable ptr 30 | data: base of data segment 31 | ind : output code ptr 32 | prog: output code 33 | rsym: return symbol 34 | sym_stk: symbol stack 35 | dstk: symbol stack pointer 36 | dptr, dch: macro state 37 | 38 | * 'vars' format: 39 | For each character TAG_TOK at offset 'i' before a 40 | symbol in sym_stk, we have: 41 | v = (int *)(vars + 8 * i + TOK_IDENT)[0] 42 | p = (int *)(vars + 8 * i + TOK_IDENT)[0] 43 | 44 | v = 0 : undefined symbol, p = list of use points. 45 | v = 1 : define symbol, p = pointer to define text. 46 | v < LOCAL: offset on stack, p = 0. 47 | otherwise: symbol with value 'v', p = list of use points. 48 | 49 | * 'sym_stk' format: 50 | TAG_TOK sym1 TAG_TOK sym2 .... symN '\0' 51 | 'dstk' points to the last '\0'. 
52 | */
53 | int tok, tokc, tokl, ch, vars, rsym, prog, ind, loc, glo, file, sym_stk, dstk, dptr, dch, last_id, data, text, data_offset;
54 |
55 | #define ALLOC_SIZE 99999
56 |
57 | #define ELFOUT
58 |
59 | /* depends on the init string */
60 | #define TOK_STR_SIZE 48
61 | #define TOK_IDENT 0x100
62 | #define TOK_INT 0x100
63 | #define TOK_IF 0x120
64 | #define TOK_ELSE 0x138
65 | #define TOK_WHILE 0x160
66 | #define TOK_BREAK 0x190
67 | #define TOK_RETURN 0x1c0
68 | #define TOK_FOR 0x1f8
69 | #define TOK_DEFINE 0x218
70 | #define TOK_MAIN 0x250
71 |
72 | #define TOK_DUMMY 1
73 | #define TOK_NUM 2
74 |
75 | #define LOCAL 0x200
76 |
77 | #define SYM_FORWARD 0
78 | #define SYM_DEFINE 1
79 |
80 | /* tokens in string heap */
81 | #define TAG_TOK ' '
82 | #define TAG_MACRO 2
83 |
84 | /* additional ELF output defines */
85 | #ifdef ELFOUT
86 |
87 | #define ELF_BASE 0x08048000
88 | #define PHDR_OFFSET 0x30
89 |
90 | #define INTERP_OFFSET 0x90
91 | #define INTERP_SIZE 0x13
92 |
93 | #ifndef TINY
94 | #define DYNAMIC_OFFSET (INTERP_OFFSET + INTERP_SIZE + 1)
95 | #define DYNAMIC_SIZE (11*8)
96 |
97 | #define ELFSTART_SIZE (DYNAMIC_OFFSET + DYNAMIC_SIZE)
98 | #else
99 | #define DYNAMIC_OFFSET 0xa4
100 | #define DYNAMIC_SIZE 0x58
101 |
102 | #define ELFSTART_SIZE 0xfc
103 | #endif
104 |
105 | /* size of startup code */
106 | #define STARTUP_SIZE 17
107 |
108 | /* size of library names at the start of the .dynstr section */
109 | #define DYNSTR_BASE 22
110 |
111 | #endif
112 |
113 | pdef(t)
114 | {
115 |     *(char *)dstk++ = t;
116 | }
117 |
118 | inp()
119 | {
120 |     if (dptr) {
121 |         ch = *(char *)dptr++;
122 |         if (ch == TAG_MACRO) {
123 |             dptr = 0;
124 |             ch = dch;
125 |         }
126 |     } else
127 |         ch = fgetc(file);
128 |     /* printf("ch=%c 0x%x\n", ch, ch); */
129 | }
130 |
131 | isid()
132 | {
133 |     return isalnum(ch) | ch == '_';
134 | }
135 |
136 | /* read a character constant */
137 | getq()
138 | {
139 |     if (ch == '\\') {
140 |         inp();
141 |         if (ch == 'n')
142 |             ch = '\n';
143 |     }
144 | }
145 |
146 | next()
147 | {
148 |     int t, l, a;
149 |
150 |     while (isspace(ch) | ch == '#') {
151 |         if (ch == '#') {
152 |             inp();
153 |             next();
154 |             if (tok == TOK_DEFINE) {
155 |                 next();
156 |                 pdef(TAG_TOK); /* fill last ident tag */
157 |                 *(int *)tok = SYM_DEFINE;
158 |                 *(int *)(tok + 4) = dstk; /* define stack */
159 |             }
160 |             /* well, we always save the values! */
161 |             while (ch != '\n') {
162 |                 pdef(ch);
163 |                 inp();
164 |             }
165 |             pdef(ch);
166 |             pdef(TAG_MACRO);
167 |         }
168 |         inp();
169 |     }
170 |     tokl = 0;
171 |     tok = ch;
172 |     /* encode identifiers & numbers */
173 |     if (isid()) {
174 |         pdef(TAG_TOK);
175 |         last_id = dstk;
176 |         while (isid()) {
177 |             pdef(ch);
178 |             inp();
179 |         }
180 |         if (isdigit(tok)) {
181 |             tokc = strtol(last_id, 0, 0);
182 |             tok = TOK_NUM;
183 |         } else {
184 |             *(char *)dstk = TAG_TOK; /* no need to mark end of string (we
185 |                                         suppose data is initialized to zero) */
186 |             tok = strstr(sym_stk, last_id - 1) - sym_stk;
187 |             *(char *)dstk = 0; /* mark real end of ident for dlsym() */
188 |             tok = tok * 8 + TOK_IDENT;
189 |             if (tok > TOK_DEFINE) {
190 |                 tok = vars + tok;
191 |                 /* printf("tok=%s %x\n", last_id, tok); */
192 |                 /* define handling */
193 |                 if (*(int *)tok == SYM_DEFINE) {
194 |                     dptr = *(int *)(tok + 4);
195 |                     dch = ch;
196 |                     inp();
197 |                     next();
198 |                 }
199 |             }
200 |         }
201 |     } else {
202 |         inp();
203 |         if (tok == '\'') {
204 |             tok = TOK_NUM;
205 |             getq();
206 |             tokc = ch;
207 |             inp();
208 |             inp();
209 |         } else if (tok == '/' & ch == '*') {
210 |             inp();
211 |             while (ch) {
212 |                 while (ch != '*')
213 |                     inp();
214 |                 inp();
215 |                 if (ch == '/')
216 |                     ch = 0;
217 |             }
218 |             inp();
219 |             next();
220 |         } else
221 |         {
222 |             t = "++#m--%am*@R<^1c/@%[_[H3c%@%[_[H3c+@.B#d-@%:_^BKd<<Z/03e>>`/03e<=0f>=/f<@.f>@1f==&g!=\'g&&k||#l&@.BCh^@.BSi|@.B+j~@/%Yd!@&d*@b";
223 |             while (l = *(char *)t++) {
224 |                 a = *(char *)t++;
225 |                 tokc = 0;
226 |                 while ((tokl = *(char *)t++ - 'b') < 0)
227 |                     tokc = tokc * 64 + tokl + 64;
228 |                 if (l == tok & (a == ch |
a == '@')) { 229 | #if 0 230 | printf("%c%c -> tokl=%d tokc=0x%x\n", 231 | l, a, tokl, tokc); 232 | #endif 233 | if (a == ch) { 234 | inp(); 235 | tok = TOK_DUMMY; /* dummy token for double tokens */ 236 | } 237 | break; 238 | } 239 | } 240 | } 241 | } 242 | #if 0 243 | { 244 | int p; 245 | 246 | printf("tok=0x%x ", tok); 247 | if (tok >= TOK_IDENT) { 248 | printf("'"); 249 | if (tok > TOK_DEFINE) 250 | p = sym_stk + 1 + (tok - vars - TOK_IDENT) / 8; 251 | else 252 | p = sym_stk + 1 + (tok - TOK_IDENT) / 8; 253 | while (*(char *)p != TAG_TOK && *(char *)p) 254 | printf("%c", *(char *)p++); 255 | printf("'\n"); 256 | } else if (tok == TOK_NUM) { 257 | printf("%d\n", tokc); 258 | } else { 259 | printf("'%c'\n", tok); 260 | } 261 | } 262 | #endif 263 | } 264 | 265 | #ifdef TINY 266 | #define skip(c) next() 267 | #else 268 | 269 | void error(char *fmt,...) 270 | { 271 | va_list ap; 272 | 273 | va_start(ap, fmt); 274 | fprintf(stderr, "%d: ", ftell((FILE *)file)); 275 | vfprintf(stderr, fmt, ap); 276 | fprintf(stderr, "\n"); 277 | exit(1); 278 | va_end(ap); 279 | } 280 | 281 | void skip(c) 282 | { 283 | if (tok != c) { 284 | error("'%c' expected", c); 285 | } 286 | next(); 287 | } 288 | 289 | #endif 290 | 291 | /* from 0 to 4 bytes */ 292 | o(n) 293 | { 294 | /* cannot use unsigned, so we must do a hack */ 295 | while (n && n != -1) { 296 | *(char *)ind++ = n; 297 | n = n >> 8; 298 | } 299 | } 300 | 301 | #ifdef ELFOUT 302 | 303 | /* put a 32 bit little endian word 'n' at unaligned address 't' */ 304 | put32(t, n) 305 | { 306 | *(char *)t++ = n; 307 | *(char *)t++ = n >> 8; 308 | *(char *)t++ = n >> 16; 309 | *(char *)t++ = n >> 24; 310 | } 311 | 312 | /* get a 32 bit little endian word at unaligned address 't' */ 313 | get32(t) 314 | { 315 | int n; 316 | return (*(char *)t & 0xff) | 317 | (*(char *)(t + 1) & 0xff) << 8 | 318 | (*(char *)(t + 2) & 0xff) << 16 | 319 | (*(char *)(t + 3) & 0xff) << 24; 320 | } 321 | 322 | #else 323 | 324 | #define put32(t, n) *(int *)t = n 
325 | #define get32(t) *(int *)t 326 | 327 | #endif 328 | 329 | /* output a symbol and patch all references to it */ 330 | gsym1(t, b) 331 | { 332 | int n; 333 | while (t) { 334 | n = get32(t); /* next value */ 335 | /* patch absolute reference (always mov/lea before) */ 336 | if (*(char *)(t - 1) == 0x05) { 337 | /* XXX: incorrect if data < 0 */ 338 | if (b >= data && b < glo) 339 | put32(t, b + data_offset); 340 | else 341 | put32(t, b - prog + text + data_offset); 342 | } else { 343 | put32(t, b - t - 4); 344 | } 345 | t = n; 346 | } 347 | } 348 | 349 | gsym(t) 350 | { 351 | gsym1(t, ind); 352 | } 353 | 354 | /* psym is used to put an instruction with a data field which is a 355 | reference to a symbol. It is in fact the same as oad ! */ 356 | #define psym oad 357 | 358 | /* instruction + address */ 359 | oad(n, t) 360 | { 361 | o(n); 362 | put32(ind, t); 363 | t = ind; 364 | ind = ind + 4; 365 | return t; 366 | } 367 | 368 | /* load immediate value */ 369 | li(t) 370 | { 371 | oad(0xb8, t); /* mov $xx, %eax */ 372 | } 373 | 374 | gjmp(t) 375 | { 376 | return psym(0xe9, t); 377 | } 378 | 379 | /* l = 0: je, l == 1: jne */ 380 | gtst(l, t) 381 | { 382 | o(0x0fc085); /* test %eax, %eax, je/jne xxx */ 383 | return psym(0x84 + l, t); 384 | } 385 | 386 | gcmp(t) 387 | { 388 | o(0xc139); /* cmp %eax,%ecx */ 389 | li(0); 390 | o(0x0f); /* setxx %al */ 391 | o(t + 0x90); 392 | o(0xc0); 393 | } 394 | 395 | gmov(l, t) 396 | { 397 | int n; 398 | o(l + 0x83); 399 | n = *(int *)t; 400 | if (n && n < LOCAL) 401 | oad(0x85, n); 402 | else { 403 | t = t + 4; 404 | *(int *)t = psym(0x05, *(int *)t); 405 | } 406 | } 407 | 408 | /* l is one if '=' parsing wanted (quick hack) */ 409 | unary(l) 410 | { 411 | int n, t, a, c; 412 | 413 | n = 1; /* type of expression 0 = forward, 1 = value, other = 414 | lvalue */ 415 | if (tok == '\"') { 416 | li(glo + data_offset); 417 | while (ch != '\"') { 418 | getq(); 419 | *(char *)glo++ = ch; 420 | inp(); 421 | } 422 | *(char *)glo = 0; 423 | 
glo = glo + 4 & -4; /* align heap */ 424 | inp(); 425 | next(); 426 | } else { 427 | c = tokl; 428 | a = tokc; 429 | t = tok; 430 | next(); 431 | if (t == TOK_NUM) { 432 | li(a); 433 | } else if (c == 2) { 434 | /* -, +, !, ~ */ 435 | unary(0); 436 | oad(0xb9, 0); /* movl $0, %ecx */ 437 | if (t == '!') 438 | gcmp(a); 439 | else 440 | o(a); 441 | } else if (t == '(') { 442 | expr(); 443 | skip(')'); 444 | } else if (t == '*') { 445 | /* parse cast */ 446 | skip('('); 447 | t = tok; /* get type */ 448 | next(); /* skip int/char/void */ 449 | next(); /* skip '*' or '(' */ 450 | if (tok == '*') { 451 | /* function type */ 452 | skip('*'); 453 | skip(')'); 454 | skip('('); 455 | skip(')'); 456 | t = 0; 457 | } 458 | skip(')'); 459 | unary(0); 460 | if (tok == '=') { 461 | next(); 462 | o(0x50); /* push %eax */ 463 | expr(); 464 | o(0x59); /* pop %ecx */ 465 | o(0x0188 + (t == TOK_INT)); /* movl %eax/%al, (%ecx) */ 466 | } else if (t) { 467 | if (t == TOK_INT) 468 | o(0x8b); /* mov (%eax), %eax */ 469 | else 470 | o(0xbe0f); /* movsbl (%eax), %eax */ 471 | ind++; /* add zero in code */ 472 | } 473 | } else if (t == '&') { 474 | gmov(10, tok); /* leal EA, %eax */ 475 | next(); 476 | } else { 477 | n = 0; 478 | if (tok == '=' & l) { 479 | /* assignment */ 480 | next(); 481 | expr(); 482 | gmov(6, t); /* mov %eax, EA */ 483 | } else if (tok != '(') { 484 | /* variable */ 485 | gmov(8, t); /* mov EA, %eax */ 486 | if (tokl == 11) { 487 | gmov(0, t); 488 | o(tokc); 489 | next(); 490 | } 491 | } 492 | } 493 | } 494 | 495 | /* function call */ 496 | if (tok == '(') { 497 | if (n) 498 | o(0x50); /* push %eax */ 499 | 500 | /* push args and invert order */ 501 | a = oad(0xec81, 0); /* sub $xxx, %esp */ 502 | next(); 503 | l = 0; 504 | while(tok != ')') { 505 | expr(); 506 | oad(0x248489, l); /* movl %eax, xxx(%esp) */ 507 | if (tok == ',') 508 | next(); 509 | l = l + 4; 510 | } 511 | put32(a, l); 512 | next(); 513 | if (n) { 514 | oad(0x2494ff, l); /* call *xxx(%esp) */ 515 | l 
= l + 4; 516 | } else { 517 | /* forward reference */ 518 | t = t + 4; 519 | *(int *)t = psym(0xe8, *(int *)t); 520 | } 521 | if (l) 522 | oad(0xc481, l); /* add $xxx, %esp */ 523 | } 524 | } 525 | 526 | sum(l) 527 | { 528 | int t, n, a; 529 | 530 | if (l-- == 1) 531 | unary(1); 532 | else { 533 | sum(l); 534 | a = 0; 535 | while (l == tokl) { 536 | n = tok; 537 | t = tokc; 538 | next(); 539 | 540 | if (l > 8) { 541 | a = gtst(t, a); /* && and || output code generation */ 542 | sum(l); 543 | } else { 544 | o(0x50); /* push %eax */ 545 | sum(l); 546 | o(0x59); /* pop %ecx */ 547 | 548 | if (l == 4 | l == 5) { 549 | gcmp(t); 550 | } else { 551 | o(t); 552 | if (n == '%') 553 | o(0x92); /* xchg %edx, %eax */ 554 | } 555 | } 556 | } 557 | /* && and || output code generation */ 558 | if (a && l > 8) { 559 | a = gtst(t, a); 560 | li(t ^ 1); 561 | gjmp(5); /* jmp $ + 5 */ 562 | gsym(a); 563 | li(t); 564 | } 565 | } 566 | } 567 | 568 | expr() 569 | { 570 | sum(11); 571 | } 572 | 573 | 574 | test_expr() 575 | { 576 | expr(); 577 | return gtst(0, 0); 578 | } 579 | 580 | block(l) 581 | { 582 | int a, n, t; 583 | 584 | if (tok == TOK_IF) { 585 | next(); 586 | skip('('); 587 | a = test_expr(); 588 | skip(')'); 589 | block(l); 590 | if (tok == TOK_ELSE) { 591 | next(); 592 | n = gjmp(0); /* jmp */ 593 | gsym(a); 594 | block(l); 595 | gsym(n); /* patch else jmp */ 596 | } else { 597 | gsym(a); /* patch if test */ 598 | } 599 | } else if (tok == TOK_WHILE | tok == TOK_FOR) { 600 | t = tok; 601 | next(); 602 | skip('('); 603 | if (t == TOK_WHILE) { 604 | n = ind; 605 | a = test_expr(); 606 | } else { 607 | if (tok != ';') 608 | expr(); 609 | skip(';'); 610 | n = ind; 611 | a = 0; 612 | if (tok != ';') 613 | a = test_expr(); 614 | skip(';'); 615 | if (tok != ')') { 616 | t = gjmp(0); 617 | expr(); 618 | gjmp(n - ind - 5); 619 | gsym(t); 620 | n = t + 4; 621 | } 622 | } 623 | skip(')'); 624 | block(&a); 625 | gjmp(n - ind - 5); /* jmp */ 626 | gsym(a); 627 | } else if (tok == '{') { 
        next();
        /* declarations */
        decl(1);
        while(tok != '}')
            block(l);
        next();
    } else {
        if (tok == TOK_RETURN) {
            next();
            if (tok != ';')
                expr();
            rsym = gjmp(rsym); /* jmp */
        } else if (tok == TOK_BREAK) {
            next();
            *(int *)l = gjmp(*(int *)l);
        } else if (tok != ';')
            expr();
        skip(';');
    }
}

/* 'l' is true if local declarations */
decl(l)
{
    int a;

    while (tok == TOK_INT | tok != -1 & !l) {
        if (tok == TOK_INT) {
            next();
            while (tok != ';') {
                if (l) {
                    loc = loc + 4;
                    *(int *)tok = -loc;
                } else {
                    *(int *)tok = glo;
                    glo = glo + 4;
                }
                next();
                if (tok == ',')
                    next();
            }
            skip(';');
        } else {
            /* put function address */
            *(int *)tok = ind;
            next();
            skip('(');
            a = 8;
            while (tok != ')') {
                /* read param name and compute offset */
                *(int *)tok = a;
                a = a + 4;
                next();
                if (tok == ',')
                    next();
            }
            next(); /* skip ')' */
            rsym = loc = 0;
            o(0xe58955); /* push %ebp, mov %esp, %ebp */
            a = oad(0xec81, 0); /* sub $xxx, %esp */
            block(0);
            gsym(rsym);
            o(0xc3c9); /* leave, ret */
            put32(a, loc); /* save local variables */
        }
    }
}

#ifdef ELFOUT

gle32(n)
{
    put32(glo, n);
    glo = glo + 4;
}

/* used to generate a program header at offset 'n' of size 't' */
gphdr1(n, t)
{
    gle32(n);
    n = n + ELF_BASE;
    gle32(n);
    gle32(n);
    gle32(t);
    gle32(t);
}

elf_reloc(l)
{
    int t, a, n, p, b, c;

    p = 0;
    t = sym_stk;
    while (1) {
        /* extract symbol name */
        t++;
        a = t;
        while (*(char *)t != TAG_TOK && t < dstk)
            t++;
        if (t == dstk)
            break;
        /* now see if it is forward defined */
        tok = vars + (a - sym_stk) * 8 + TOK_IDENT - 8;
        b = *(int *)tok;
        n = *(int *)(tok + 4);
        if (n && b != 1) {
#if 0
            {
                char buf[100];
                memcpy(buf, a, t - a);
                buf[t - a] = '\0';
                printf("extern ref='%s' val=%x\n", buf, b);
            }
#endif
            if (!b) {
                if (!l) {
                    /* symbol string */
                    memcpy(glo, a, t - a);
                    glo = glo + t - a + 1; /* add a zero */
                } else if (l == 1) {
                    /* symbol table */
                    gle32(p + DYNSTR_BASE);
                    gle32(0);
                    gle32(0);
                    gle32(0x10); /* STB_GLOBAL, STT_NOTYPE */
                    p = p + t - a + 1; /* add a zero */
                } else {
                    p++;
                    /* generate relocation patches */
                    while (n) {
                        a = get32(n);
                        /* c = 0: R_386_32, c = 1: R_386_PC32 */
                        c = *(char *)(n - 1) != 0x05;
                        put32(n, -c * 4);
                        gle32(n - prog + text + data_offset);
                        gle32(p * 256 + c + 1);
                        n = a;
                    }
                }
            } else if (!l) {
                /* generate standard relocation */
                gsym1(n, b);
            }
        }
    }
}

elf_out(c)
{
    int glo_saved, dynstr, dynstr_size, dynsym, hash, rel, n, t, text_size;

    /*****************************/
    /* add text segment (but copy it later to handle relocations) */
    text = glo;
    text_size = ind - prog;

    /* add the startup code */
    ind = prog;
    o(0x505458); /* pop %eax, push %esp, push %eax */
    t = *(int *)(vars + TOK_MAIN);
    oad(0xe8, t - ind - 5);
    o(0xc389);  /* movl %eax, %ebx */
    li(1);      /* mov $1, %eax */
    o(0x80cd);  /* int $0x80 */
    glo = glo + text_size;

    /*****************************/
    /* add symbol strings */
    dynstr = glo;
    /* libc name for dynamic table */
    glo++;
    glo = strcpy(glo, "libc.so.6") + 10;
    glo = strcpy(glo, "libdl.so.2") + 11;

    /* export all forward referenced functions */
    elf_reloc(0);
    dynstr_size = glo - dynstr;

    /*****************************/
    /* add symbol table */
    glo = (glo + 3) & -4;
    dynsym = glo;
    gle32(0);
    gle32(0);
    gle32(0);
    gle32(0);
    elf_reloc(1);

    /*****************************/
    /* add symbol hash table */
    hash = glo;
    n = (glo - dynsym) / 16;
    gle32(1); /* one bucket (simpler!) */
    gle32(n);
    gle32(1);
    gle32(0); /* dummy first symbol */
    t = 2;
    while (t < n)
        gle32(t++);
    gle32(0);

    /*****************************/
    /* relocation table */
    rel = glo;
    elf_reloc(2);

    /* copy code AFTER relocation is done */
    memcpy(text, prog, text_size);

    glo_saved = glo;
    glo = data;

    /* elf header */
    gle32(0x464c457f);
    gle32(0x00010101);
    gle32(0);
    gle32(0);
    gle32(0x00030002);
    gle32(1);
    gle32(text + data_offset); /* address of _start */
    gle32(PHDR_OFFSET); /* offset of phdr */
    gle32(0);
    gle32(0);
    gle32(0x00200034);
    gle32(3); /* phdr entry count */

    /* program headers */
    gle32(3); /* PT_INTERP */
    gphdr1(INTERP_OFFSET, INTERP_SIZE);
    gle32(4); /* PF_R */
    gle32(1); /* align */

    gle32(1); /* PT_LOAD */
    gphdr1(0, glo_saved - data);
    gle32(7); /* PF_R | PF_X | PF_W */
    gle32(0x1000); /* align */

    gle32(2); /* PT_DYNAMIC */
    gphdr1(DYNAMIC_OFFSET, DYNAMIC_SIZE);
    gle32(6); /* PF_R | PF_W */
    gle32(0x4); /* align */

    /* now the interpreter name */
    glo = strcpy(glo, "/lib/ld-linux.so.2") + 0x14;

    /* now the dynamic section */
    gle32(1); /* DT_NEEDED */
    gle32(1); /* libc name */
    gle32(1); /* DT_NEEDED */
    gle32(11); /* libdl name */
    gle32(4); /* DT_HASH */
    gle32(hash + data_offset);
    gle32(6); /* DT_SYMTAB */
    gle32(dynsym + data_offset);
    gle32(5); /* DT_STRTAB */
    gle32(dynstr + data_offset);
    gle32(10); /* DT_STRSZ */
    gle32(dynstr_size);
    gle32(11); /* DT_SYMENT */
    gle32(16);
    gle32(17); /* DT_REL */
    gle32(rel + data_offset);
    gle32(18); /* DT_RELSZ */
    gle32(glo_saved - rel);
    gle32(19); /* DT_RELENT */
    gle32(8);
    gle32(0);  /* DT_NULL */
    gle32(0);

    t = fopen(c, "w");
    fwrite(data, 1, glo_saved - data, t);
    fclose(t);
}
#endif

main(n, t)
{
    if (n < 3) {
        printf("usage: otccelf file.c outfile\n");
        return 0;
    }
    dstk = strcpy(sym_stk = calloc(1, ALLOC_SIZE),
                  " int if else while break return for define main ") + TOK_STR_SIZE;
    glo = data = calloc(1, ALLOC_SIZE);
    ind = prog = calloc(1, ALLOC_SIZE);
    vars = calloc(1, ALLOC_SIZE);

    t = t + 4;
    file = fopen(*(int *)t, "r");

    data_offset = ELF_BASE - data;
    glo = glo + ELFSTART_SIZE;
    ind = ind + STARTUP_SIZE;

    inp();
    next();
    decl(0);
    t = t + 4;
    elf_out(*(int *)t);
    return 0;
}

--------------------------------------------------------------------------------
/otccex.c:
--------------------------------------------------------------------------------
/* #!/usr/local/bin/otcc */
/*
 * Sample OTCC C example. You can uncomment the first line and install
 * otcc in /usr/local/bin to make otcc scripts !
 */

/* Any preprocessor directive except #define is ignored. We put this
   include so that a standard C compiler can compile this code too. */
#include <stdio.h>

/* defines are handled, but macro arguments cannot be given. No
   recursive defines are tolerated */
#define DEFAULT_BASE 10

/* global variables can be used */
int base;

/*
 * Only old style K&R prototypes are parsed. Only int arguments are
 * allowed (implicit types).
 *
 * By benchmarking the execution time of this function (for example
 * for fib(35)), you'll notice that OTCC is quite fast because it
 * generates native i386 machine code.
 */
fib(n)
{
    if (n <= 2)
        return 1;
    else
        return fib(n-1) + fib(n-2);
}

/* Identifiers are parsed the same way as C: they begin with a letter
   or '_', followed by letters, '_' or digits */
fact(n)
{
    /* local variables can be declared. Only 'int' type is supported */
    int i, r;
    r = 1;
    /* 'while' and 'for' loops are supported */
    for(i=2;i<=n;i++)
        r = r * i;
    return r;
}

/* Well, we could use printf, but it would be too easy */
print_num(n, b)
{
    int tab, p, c;
    /* Numbers can be entered in decimal, hexadecimal ('0x' prefix) and
       octal ('0' prefix) */
    /* more complex programs use malloc */
    tab = malloc(0x100);
    p = tab;
    while (1) {
        c = n % b;
        /* Character constants can be used */
        if (c >= 10)
            c = c + 'a' - 10;
        else
            c = c + '0';
        *(char *)p = c;
        p++;
        n = n / b;
        /* 'break' is supported */
        if (n == 0)
            break;
    }
    while (p != tab) {
        p--;
        printf("%c", *(char *)p);
    }
    free(tab);
}

/* 'main' takes standard 'argc' and 'argv' parameters */
main(argc, argv)
{
    /* no local name space is supported, but local variables ARE
       supported. As long as you do not use a globally defined
       variable name as local variable (which is a bad habit), you
       won't have any problem */
    int s, n, f;

    /* && and || operators have the same semantics as C (left to right
       evaluation and early exit) */
    if (argc != 2 && argc != 3) {
        /* '*' operator is supported with explicit casting to 'int *',
           'char *' or 'int (*)()' (function pointer). Of course, 'int'
           values are also used as pointers. */
        s = *(int *)argv;
        help(s);
        return 1;
    }
    /* Any libc function can be used because OTCC uses dynamic linking */
    n = atoi(*(int *)(argv + 4));
    base = DEFAULT_BASE;
    if (argc >= 3) {
        base = atoi(*(int *)(argv + 8));
        if (base < 2 || base > 36) {
            /* external variables can be used too (here: 'stderr') */
            fprintf(stderr, "Invalid base\n");
            return 1;
        }
    }
    printf("fib(%d) = ", n);
    print_num(fib(n), base);
    printf("\n");

    printf("fact(%d) = ", n);
    if (n > 12) {
        printf("Overflow");
    } else {
        /* why not use a function pointer? */
        f = &fact;
        print_num((*(int (*)())f)(n), base);
    }
    printf("\n");
    return 0;
}

/* functions can be used before being defined */
help(name)
{
    printf("usage: %s n [base]\n", name);
    printf("Compute fib(n) and fact(n) and output the result in base 'base'\n");
}

--------------------------------------------------------------------------------
/otccn.c:
--------------------------------------------------------------------------------
/*
  Obfuscated Tiny C Compiler

  Copyright (C) 2001-2003 Fabrice Bellard

  This software is provided 'as-is', without any express or implied
  warranty. In no event will the authors be held liable for any damages
  arising from the use of this software.

  Permission is granted to anyone to use this software for any purpose,
  including commercial applications, and to alter it and redistribute it
  freely, subject to the following restrictions:

  1. The origin of this software must not be misrepresented; you must not
     claim that you wrote the original software. If you use this software
     in a product, an acknowledgment in the product and its documentation
     *is* required.
  2. Altered source versions must be plainly marked as such, and must not be
     misrepresented as being the original software.
  3. This notice may not be removed or altered from any source distribution.
*/
#ifndef TINY
#include <stdarg.h>
#endif
#include <stdio.h>

/* vars: value of variables
   loc : local variable index
   glo : global variable index
   ind : output code ptr
   rsym: return symbol
   prog: output code
   dstk: define stack
   dptr, dch: macro state
*/
int tok, tokc, tokl, ch, vars, rsym, prog, ind, loc, glo, file, sym_stk, dstk, dptr, dch, last_id;

#define ALLOC_SIZE 99999

/* depends on the init string */
#define TOK_STR_SIZE 48
#define TOK_IDENT    0x100
#define TOK_INT      0x100
#define TOK_IF       0x120
#define TOK_ELSE     0x138
#define TOK_WHILE    0x160
#define TOK_BREAK    0x190
#define TOK_RETURN   0x1c0
#define TOK_FOR      0x1f8
#define TOK_DEFINE   0x218
#define TOK_MAIN     0x250

#define TOK_DUMMY   1
#define TOK_NUM     2

#define LOCAL   0x200

#define SYM_FORWARD 0
#define SYM_DEFINE  1

/* tokens in string heap */
#define TAG_TOK    ' '
#define TAG_MACRO  2

pdef(t)
{
    *(char *)dstk++ = t;
}

inp()
{
    if (dptr) {
        ch = *(char *)dptr++;
        if (ch == TAG_MACRO) {
            dptr = 0;
            ch = dch;
        }
    } else
        ch = fgetc(file);
    /* printf("ch=%c 0x%x\n", ch, ch); */
}

isid()
{
    return isalnum(ch) | ch == '_';
}

/* read a character constant */
getq()
{
    if (ch == '\\') {
        inp();
        if (ch == 'n')
            ch = '\n';
    }
}

next()
{
    int t, l, a;

    while (isspace(ch) | ch == '#') {
        if (ch == '#') {
            inp();
            next();
            if (tok == TOK_DEFINE) {
                next();
                pdef(TAG_TOK); /* fill last ident tag */
                *(int *)tok = SYM_DEFINE;
                *(int *)(tok + 4) = dstk; /* define stack */
            }
            /* well we always save the values ! */
            while (ch != '\n') {
                pdef(ch);
                inp();
            }
            pdef(ch);
            pdef(TAG_MACRO);
        }
        inp();
    }
    tokl = 0;
    tok = ch;
    /* encode identifiers & numbers */
    if (isid()) {
        pdef(TAG_TOK);
        last_id = dstk;
        while (isid()) {
            pdef(ch);
            inp();
        }
        if (isdigit(tok)) {
            tokc = strtol(last_id, 0, 0);
            tok = TOK_NUM;
        } else {
            *(char *)dstk = TAG_TOK; /* no need to mark end of string (we
                                        assume data is initialized to zero) */
            tok = strstr(sym_stk, last_id - 1) - sym_stk;
            *(char *)dstk = 0;   /* mark real end of ident for dlsym() */
            tok = tok * 8 + TOK_IDENT;
            if (tok > TOK_DEFINE) {
                tok = vars + tok;
                /* printf("tok=%s %x\n", last_id, tok); */
                /* define handling */
                if (*(int *)tok == SYM_DEFINE) {
                    dptr = *(int *)(tok + 4);
                    dch = ch;
                    inp();
                    next();
                }
            }
        }
    } else {
        inp();
        if (tok == '\'') {
            tok = TOK_NUM;
            getq();
            tokc = ch;
            inp();
            inp();
        } else if (tok == '/' & ch == '*') {
            inp();
            while (ch) {
                while (ch != '*')
                    inp();
                inp();
                if (ch == '/')
                    ch = 0;
            }
            inp();
            next();
        } else
        {
            t = "++#m--%am*@R<^1c/@%[_[H3c%@%[_[H3c+@.B#d-@%:_^BKd<<Z/03e>>`/03e<=0f>=/f<@.f>@1f==&g!=\'g&&k||#l&@.BCh^@.BSi|@.B+j~@/%Yd!@&d*@b";
            while (l = *(char *)t++) {
                a = *(char *)t++;
                tokc = 0;
                while ((tokl = *(char *)t++ - 'b') < 0)
                    tokc = tokc * 64 + tokl + 64;
                if (l == tok & (a == ch | a == '@')) {
#if 0
                    printf("%c%c -> tokl=%d tokc=0x%x\n",
                           l, a, tokl, tokc);
#endif
                    if (a == ch) {
                        inp();
                        tok = TOK_DUMMY; /* dummy token for double tokens */
                    }
                    break;
                }
            }
        }
    }
#if 0
    {
        int p;

        printf("tok=0x%x ", tok);
        if (tok >= TOK_IDENT) {
            printf("'");
            if (tok > TOK_DEFINE)
                p = sym_stk + 1 + (tok - vars - TOK_IDENT) / 8;
            else
                p = sym_stk + 1 + (tok - TOK_IDENT) / 8;
            while (*(char *)p != TAG_TOK && *(char *)p)
                printf("%c", *(char *)p++);
            printf("'\n");
        } else if (tok == TOK_NUM) {
            printf("%d\n", tokc);
        } else {
            printf("'%c'\n", tok);
        }
    }
#endif
}

#ifdef TINY
#define skip(c) next()
#else

void error(char *fmt,...)
{
    va_list ap;

    va_start(ap, fmt);
    fprintf(stderr, "%ld: ", ftell((FILE *)file));
    vfprintf(stderr, fmt, ap);
    fprintf(stderr, "\n");
    exit(1);
    va_end(ap);
}

void skip(c)
{
    if (tok != c) {
        error("'%c' expected", c);
    }
    next();
}

#endif

o(n)
{
    /* cannot use unsigned, so we must do a hack */
    while (n && n != -1) {
        *(char *)ind++ = n;
        n = n >> 8;
    }
}

/* output a symbol and patch all calls to it */
gsym(t)
{
    int n;
    while (t) {
        n = *(int *)t; /* next value */
        *(int *)t = ind - t - 4;
        t = n;
    }
}

/* psym is used to put an instruction with a data field which is a
   reference to a symbol. It is in fact the same as oad ! */
#define psym oad

/* instruction + address */
oad(n, t)
{
    o(n);
    *(int *)ind = t;
    t = ind;
    ind = ind + 4;
    return t;
}

/* load immediate value */
li(t)
{
    oad(0xb8, t); /* mov $xx, %eax */
}

gjmp(t)
{
    return psym(0xe9, t);
}

/* l = 0: je, l == 1: jne */
gtst(l, t)
{
    o(0x0fc085); /* test %eax, %eax, je/jne xxx */
    return psym(0x84 + l, t);
}

gcmp(t)
{
    o(0xc139); /* cmp %eax,%ecx */
    li(0);
    o(0x0f); /* setxx %al */
    o(t + 0x90);
    o(0xc0);
}

gmov(l, t)
{
    o(l + 0x83);
    oad((t < LOCAL) << 7 | 5, t);
}

/* l is one if '=' parsing wanted (quick hack) */
unary(l)
{
    int n, t, a, c;

    n = 1; /* type of expression 0 = forward, 1 = value, other =
              lvalue */
    if (tok == '\"') {
        li(glo);
        while (ch != '\"') {
            getq();
            *(char *)glo++ = ch;
            inp();
        }
        *(char *)glo = 0;
        glo = glo + 4 & -4; /* align heap */
        inp();
        next();
    } else {
        c = tokl;
        a = tokc;
        t = tok;
        next();
        if (t == TOK_NUM) {
            li(a);
        } else if (c == 2) {
            /* -, +, !, ~ */
            unary(0);
            oad(0xb9, 0); /* movl $0, %ecx */
            if (t == '!')
                gcmp(a);
            else
                o(a);
        } else if (t == '(') {
            expr();
            skip(')');
        } else if (t == '*') {
            /* parse cast */
            skip('(');
            t = tok; /* get type */
            next(); /* skip int/char/void */
            next(); /* skip '*' or '(' */
            if (tok == '*') {
                /* function type */
                skip('*');
                skip(')');
                skip('(');
                skip(')');
                t = 0;
            }
            skip(')');
            unary(0);
            if (tok == '=') {
                next();
                o(0x50); /* push %eax */
                expr();
                o(0x59); /* pop %ecx */
                o(0x0188 + (t == TOK_INT)); /* movl %eax/%al, (%ecx) */
            } else if (t) {
                if (t == TOK_INT)
                    o(0x8b); /* mov (%eax), %eax */
                else
                    o(0xbe0f); /* movsbl (%eax), %eax */
                ind++; /* add zero in code */
            }
        } else if (t == '&') {
            gmov(10, *(int *)tok); /* leal EA, %eax */
            next();
        } else {
            n = *(int *)t;
            /* forward reference: try dlsym */
            if (!n)
                n = dlsym(0, last_id);
            if (tok == '=' & l) {
                /* assignment */
                next();
                expr();
                gmov(6, n); /* mov %eax, EA */
            } else if (tok != '(') {
                /* variable */
                gmov(8, n); /* mov EA, %eax */
                if (tokl == 11) {
                    gmov(0, n);
                    o(tokc);
                    next();
                }
            }
        }
    }

    /* function call */
    if (tok == '(') {
        if (n == 1)
            o(0x50); /* push %eax */

        /* push args and invert order */
        a = oad(0xec81, 0); /* sub $xxx, %esp */
        next();
        l = 0;
        while(tok != ')') {
            expr();
            oad(0x248489, l); /* movl %eax, xxx(%esp) */
            if (tok == ',')
                next();
            l = l + 4;
        }
        *(int *)a = l;
        next();
        if (!n) {
            /* forward reference */
            t = t + 4;
            *(int *)t = psym(0xe8, *(int *)t);
        } else if (n == 1) {
            oad(0x2494ff, l); /* call *xxx(%esp) */
            l = l + 4;
        } else {
            oad(0xe8, n - ind - 5); /* call xxx */
        }
        if (l)
            oad(0xc481, l); /* add $xxx, %esp */
    }
}

sum(l)
{
    int t, n, a;

    if (l-- == 1)
        unary(1);
    else {
        sum(l);
        a = 0;
        while (l == tokl) {
            n = tok;
            t = tokc;
            next();

            if (l > 8) {
                a = gtst(t, a); /* && and || output code generation */
                sum(l);
            } else {
                o(0x50); /* push %eax */
                sum(l);
                o(0x59); /* pop %ecx */

                if (l == 4 | l == 5) {
                    gcmp(t);
                } else {
                    o(t);
                    if (n == '%')
                        o(0x92); /* xchg %edx, %eax */
                }
            }
        }
        /* && and || output code generation */
        if (a && l > 8) {
            a = gtst(t, a);
            li(t ^ 1);
            gjmp(5); /* jmp $ + 5 */
            gsym(a);
            li(t);
        }
    }
}

expr()
{
    sum(11);
}


test_expr()
{
    expr();
    return gtst(0, 0);
}

block(l)
{
    int a, n, t;

    if (tok == TOK_IF) {
        next();
        skip('(');
        a = test_expr();
        skip(')');
        block(l);
        if (tok == TOK_ELSE) {
            next();
            n = gjmp(0); /* jmp */
            gsym(a);
            block(l);
            gsym(n); /* patch else jmp */
        } else {
            gsym(a); /* patch if test */
        }
    } else if (tok == TOK_WHILE | tok == TOK_FOR) {
        t = tok;
        next();
        skip('(');
        if (t == TOK_WHILE) {
            n = ind;
            a = test_expr();
        } else {
            if (tok != ';')
                expr();
            skip(';');
            n = ind;
            a = 0;
            if (tok != ';')
                a = test_expr();
            skip(';');
            if (tok != ')') {
                t = gjmp(0);
                expr();
                gjmp(n - ind - 5);
                gsym(t);
                n = t + 4;
            }
        }
        skip(')');
        block(&a);
        gjmp(n - ind - 5); /* jmp */
        gsym(a);
    } else if (tok == '{') {
        next();
        /* declarations */
        decl(1);
        while(tok != '}')
            block(l);
        next();
    } else {
        if (tok == TOK_RETURN) {
            next();
            if (tok != ';')
                expr();
            rsym = gjmp(rsym); /* jmp */
        } else if (tok == TOK_BREAK) {
            next();
            *(int *)l = gjmp(*(int *)l);
        } else if (tok != ';')
            expr();
        skip(';');
    }
}

/* 'l' is true if local declarations */
decl(l)
{
    int a;

    while (tok == TOK_INT | tok != -1 & !l) {
        if (tok == TOK_INT) {
            next();
            while (tok != ';') {
                if (l) {
                    loc = loc + 4;
                    *(int *)tok = -loc;
                } else {
                    *(int *)tok = glo;
                    glo = glo + 4;
                }
                next();
                if (tok == ',')
                    next();
            }
            skip(';');
        } else {
            /* patch forward references (XXX: does not work for function
               pointers) */
            gsym(*(int *)(tok + 4));
            /* put function address */
            *(int *)tok = ind;
            next();
            skip('(');
            a = 8;
            while (tok != ')') {
                /* read param name and compute offset */
                *(int *)tok = a;
                a = a + 4;
                next();
                if (tok == ',')
                    next();
            }
            next(); /* skip ')' */
            rsym = loc = 0;
            o(0xe58955); /* push %ebp, mov %esp, %ebp */
            a = oad(0xec81, 0); /* sub $xxx, %esp */
            block(0);
            gsym(rsym);
            o(0xc3c9); /* leave, ret */
            *(int *)a = loc; /* save local variables */
        }
    }
}

main(n, t)
{
    file = stdin;
    if (n-- > 1) {
        t = t + 4;
        file = fopen(*(int *)t, "r");
    }
    dstk = strcpy(sym_stk = calloc(1, ALLOC_SIZE),
                  " int if else while break return for define main ") + TOK_STR_SIZE;
    glo = calloc(1, ALLOC_SIZE);
    ind = prog = calloc(1, ALLOC_SIZE);
    vars = calloc(1, ALLOC_SIZE);
    inp();
    next();
    decl(0);
#ifdef TEST
    {
        FILE *f;
        f = fopen(*(char **)(t + 4), "w");
        fwrite((void *)prog, 1, ind - prog, f);
        fclose(f);
        return 0;
    }
#else
    return (*(int (*)())*(int *)(vars + TOK_MAIN)) (n, t);
#endif
}

--------------------------------------------------------------------------------