├── .gitignore ├── README.md ├── compile.rot ├── compile.rotc ├── design.org ├── evil.rot ├── infected.rotc ├── presentation.org ├── project.clj ├── pylintrc ├── quines.rkt ├── repl.py ├── repl.rkt ├── rotten.rkt ├── rotten.rot ├── sexp.py ├── vm-design.md ├── vm.clj ├── vm.py └── vm.rkt /.gitignore: -------------------------------------------------------------------------------- 1 | /target 2 | /classes 3 | /checkouts 4 | pom.xml 5 | pom.xml.asc 6 | *.jar 7 | *.class 8 | /.lein-* 9 | /.nrepl-port 10 | *.pyc 11 | *~ 12 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Rotten 2 | 3 | Rotten is a small self-hosting Lisp, designed as a vehicle for exploring Ken 4 | Thompson's [Reflections on Trusting Trust][rott]. 5 | 6 | [rott]: http://www3.cs.stonybrook.edu/~cse509/p761-thompson.pdf 7 | 8 | 9 | 10 | 11 | 12 | 13 | ## How it works, in brief 14 | 15 | 1. Rotten compiles to a simple abstract machine ("the VM"). 16 | 17 | 2. The VM is implemented in [Racket](http://www.racket-lang.org/). There's now a 18 | prototype implementation in Python, as well! 19 | 20 | 3. The compiler from Rotten to VM-code is 21 | [written in Rotten](http://en.wikipedia.org/wiki/Self-hosting). 22 | 23 | 25 | 26 | ## Rotten is really small! 27 | 28 | Rotten is a very simple lisp, and it targets a very high-level virtual machine, 29 | so its implementation is quite small: 30 | 31 | | File | LOC | Description | 32 | | ------------ | ----: | ------------------------- | 33 | | compile.rot | ~ 70 | compiler | 34 | | vm.rkt | ~ 100 | VM interpreter | 35 | | repl.rkt | ~ 70 | repl & other conveniences | 36 | | **TOTAL:** | < 250 | | 37 | 38 | There are other files in the repository but they're mostly unnecessary, except 39 | for `compile.rotc` (the compiled version of `compile.rot`) — that's needed 40 | for bootstrapping! 41 | 42 | ## Trusting Trust in Rotten 43 | 44 | Rotten is named for Ken Thompson's [Reflections on Trusting Trust][rott], which 45 | shows that a malicious compiler can invisibly compromise any program compiled by 46 | it, including in particular itself! This makes for a wickedly 47 | difficult-to-detect bug. 48 | 49 | Rotten includes a (mildly) malicious compiler, `evil.rot`, which notices when 50 | it's compiling a compiler, such as `compile.rot`, and injects a self-propagating 51 | virus into it. The most interesting problem here is [quining][quine] the virus: 52 | to self-propagate, the virus needs access to its own source code! You can see 53 | some example quines and quine-generators in `quines.rkt`. 54 | 55 | The only other symptom of this virus is that an infected compiler will compile 56 | the symbol `rotten` to the string `"YOUR COMPILER HAS A VIRUS!!1!eleventyone"`. 57 | This is a poor stand-in for the nefarious behavior a *real* implementation of 58 | RoTT could inject into the compiler, but it will have to do for now. 59 | 60 | [quine]: http://en.wikipedia.org/wiki/Quine_(computing) 61 | 62 | ## Getting started 63 | 64 | First, install [git](http://git-scm.com/downloads) and 65 | [Racket](http://download.racket-lang.org/). If you're on Ubuntu or Debian: 66 | 67 | ~$ sudo apt-get install git racket 68 | 69 | If you don't have Racket but do have [Python](https://www.python.org), you can 70 | try the [Python VM](#alternative-using-the-python-vm) instead. 71 | 72 | Now grab Rotten: 73 | 74 | ~$ git clone https://github.com/rntz/rotten.git 75 | ~$ cd rotten 76 | ~/rotten$ racket repl.rkt 77 | VM rebooting 78 | VM loading compile.rotc 79 | VM loading {read,write}-file extensions 80 | ROTTEN> 81 | 82 | Now you're at the Rotten repl! 83 | 84 | ### Alternative: using the Python VM 85 | 86 | There's a prototype implementation of the VM in Python. It may still have bugs, 87 | but you can use it like so: 88 | 89 | ~/rotten$ python repl.py 90 | booting up VM 91 | VM loading compile.rotc 92 | VM loading {read,write}-file extensions 93 | pyROTTEN> 94 | 95 | You can tell it what file to boot up from by giving it a command-line argument, 96 | just like the Racket version: 97 | 98 | ~/rotten$ python repl.py infected.rotc 99 | booting up VM 100 | VM loading infected.rotc 101 | VM loading {read,write}-file extensions 102 | pyROTTEN> 103 | 104 | ## A quick and dirty guide to Rotten 105 | 106 | ;; Comments start with semicolons. 107 | (+ 2 3) ; --> 5 108 | 109 | ;; `def' defines global variables. 110 | (def x 17) 111 | x ; --> 17 112 | 113 | ;; `def' also defines functions, Scheme-style. 114 | (def (double x) (+ x x)) 115 | (double 23) ; --> 46 116 | ;; You can define variadic functions with dotted parameter lists: 117 | (def (list . xs) xs) 118 | (list 2 3 5) ; --> (2 3 5) 119 | 120 | ;; cons, car, and cdr work as expected. 121 | (cons 34 46) ; --> (34 . 46) 122 | (car '(a b)) ; --> a 123 | (cdr '(a b)) ; --> (b) 124 | ;; The car and cdr of () are both (). 125 | (car '()) ; --> () 126 | (cdr '()) ; --> () 127 | ;; Conses are immutable; there is no set-car! or set-cdr!. 128 | 129 | ;; () is false; everything else is true. 't is the conventional true value. 130 | ;; t is just a symbol; you must quote it, or get an unbound variable error. 131 | (eq? 0 0) ; --> t 132 | (eq? 0 1) ; --> () 133 | () ; --> () 134 | t ; --> raises error, "hash-ref: no value found for key" 135 | 136 | ;; () and nil are distinct; nil is just a symbol. 137 | (if () 'yes 'no) ; --> no 138 | (if 'nil 'yes 'no) ; --> yes 139 | 140 | ;; `if' is variadic, like a less-parenthesized 'cond: 141 | (if (eq? 0 1) 'yes) ; --> () 142 | (if (eq? 0 1) 'yes 'no) ; --> no 143 | (if (eq? 0 1) 'first 144 | (eq? 0 0) 'second) 145 | ; --> second 146 | (if (eq? 0 1) 'first 147 | (eq? 0 2) 'second) 148 | ; --> () 149 | (if (eq? 0 1) 'first 150 | (eq? 0 2) 'second 151 | 'otherwise) 152 | ; --> otherwise 153 | 154 | ;; Rotten's builtin functions are: 155 | ;; cons car cdr apply symbol? cons? atom? eq? + - 156 | ;; Rotten does not have macros, let-binding, or quasiquotation. 157 | 158 | Some slightly larger examples: 159 | 160 | ;; A (non-tail-recursive) map function: 161 | (def (map f l) 162 | (if l 163 | (cons (f (car l)) 164 | (map f (cdr l))))) 165 | 166 | ;; Fixed-point combinator. 167 | (def (fix f) 168 | (fn a (apply f (cons (fix f) a)))) 169 | 170 | ;; In Rotten it's hard to locally define recursive functions, so often we 171 | ;; use globally-defined helper functions. Here, rev-append is a helper for 172 | ;; rev. 173 | (def (rev l) (rev-append l ())) 174 | (def (rev-append l acc) 175 | (if x (rev-append (cdr x) (cons (car x) y)) 176 | y)) 177 | 178 | ## The Trusting Trust exploit in Rotten 179 | 180 | Rotten starts up by loading a pre-compiled image of the Rotten compiler from 181 | `compile.rotc`: 182 | 183 | ~/rotten$ racket repl.rkt 184 | VM rebooting 185 | VM loading compile.rotc 186 | VM loading {read,write}-file extensions 187 | ROTTEN> (compile-exp '(+ 2 3)) 188 | ((get-global +) (push 2) (push 3) (call 2)) 189 | ROTTEN> (compile-program '((def x 0) 190 | x)) 191 | ((push 0) (set-global x) (pop) (get-global x)) 192 | 193 | With this, we can compile files at the REPL: 194 | 195 | ROTTEN> (def compiled (compile-program (read-file "compile.rot"))) 196 | [... output omitted ...] 197 | ROTTEN> (write-file "new.rotc" compiled) 198 | # 199 | 200 | We just compiled our compiler! We can give `repl.rkt` an argument to force it 201 | to load our newly compiled compiler image: 202 | 203 | ~/rotten$ racket repl.rkt new.rotc 204 | VM rebooting 205 | VM loading new.rotc 206 | VM loading {read,write}-file extensions 207 | ROTTEN> 208 | 209 | To double-check our new compiler's integrity, we compare it with our previous 210 | compiler: 211 | 212 | ~/rotten$ diff -s compile.rotc new.rotc 213 | Files compile.rotc and new.rotc are identical 214 | 215 | Our compiler produced an exact copy of itself! 216 | 217 | Now, remember `evil.rot`? It contains an evil compiler that, if we use it to 218 | compile `compile.rot`, produces a compiler infected with a self-propagating 219 | virus. Let's try it! First, we compile the evil compiler: 220 | 221 | ~/rotten$ racket repl.rkt 222 | ROTTEN> (write-file "evil.rotc" (compile-program (read-file "evil.rot"))) 223 | 224 | Now, we use the evil compiler to compile `compile.rot` into `infected.rotc`: 225 | 226 | ~/rotten$ racket repl.rkt evil.rotc 227 | ROTTEN> (write-file "infected.rotc" (compile-program (read-file "compile.rot"))) 228 | 229 | Finally, we run the infected compiler, and observe it is buggy: 230 | 231 | ~/rotten$ racket repl.rkt infected.rotc 232 | ROTTEN> rotten 233 | "YOUR COMPILER HAS A VIRUS!!1!eleventyone" 234 | 235 | Observe that the bug persists even over a recompile: 236 | 237 | ~/rotten$ racket repl.rkt infected.rotc 238 | ROTTEN> (write-file "infected-2.rotc" (compile-program (read-file "compile.rot"))) 239 | ROTTEN> ^D 240 | ~/rotten$ racket repl.rkt infected-2.rotc 241 | ROTTEN> rotten 242 | "YOUR COMPILER HAS A VIRUS!!1!eleventyone" 243 | 244 | In fact, our infected compiler has *also* produced an exact copy of itself! But 245 | of course, our safe compiler and our infected compiler differ: 246 | 247 | ~/racket$ diff -s infected.rotc infected-2.rotc 248 | Files infected.rotc and infected-2.rotc are identical 249 | ~/racket$ diff -q compile.rotc infected.rotc 250 | Files compile.rotc and infected.rotc differ 251 | 252 | ## Files 253 | 254 | | File | Purpose | 255 | | ---------------- | -------------------------------------------------------- | 256 | | compile.rot | Compiler from Rotten to VM-code. | 257 | | evil.rot | Malicious compiler that infects compile.rot with a RoTT virus. | 258 | | rotten.rot | AST-walking metacircular Rotten interpreter (in Rotten). | 259 | | vm.rkt | VM interpreter. | 260 | | rotten.rkt | AST-walking Rotten interpreter (in Racket). | 261 | | repl.rkt | Rotten REPL & other conveniences. | 262 | | quines.rkt | Demonstration of various quining techniques. | 263 | | vm.py | VM interpreter, in Python. | 264 | | repl.py | Rotten REPL, in Python. | 265 | | sexp.py | S-expression parser and other utilities, in Python. | 266 | | compile.rotc | Pre-compiled image of compile.rot, used for bootstrapping VM. | 267 | | infected.rotc | RoTT-infected version of compile.rotc. | 268 | | README.md | The file you're currently reading. | 269 | | design.org | Notes to myself about the design of Rotten. | 270 | | presentation.org | Notes to myself for presenting a demo of Rotten. | 271 | 272 | ## Caveat lector 273 | This project is an exercise in golfing. Therefore, everything in it is horribly, 274 | horribly bad, including but not limited to: 275 | 276 | - the language design 277 | - the VM design 278 | - the interpreter implementation 279 | - the VM implementation 280 | - the heuristic `evil.rot` uses to detect when it's compiling a compiler 281 | 282 | **Do not** take any of these as an example of how to do it. If you'd like 283 | pointers on slightly more reasonable ways to design and implement a lisp, feel 284 | free to [email me](mailto:daekharel@gmail.com), although I am not an expert. 285 | -------------------------------------------------------------------------------- /compile.rot: -------------------------------------------------------------------------------- 1 | ;; -*- mode: scheme -*- 2 | ;; Reading vm-design.md may help you understand how this compiler works. 3 | 4 | ;; utility functions 5 | (def (not x) (if x () 't)) 6 | (def (cadr x) (car (cdr x))) 7 | (def (cddr x) (cdr (cdr x))) 8 | (def (list . xs) xs) 9 | (def (proper? l) (if (cons? l) (proper? (cdr l)) (not l))) 10 | (def (rev-append x y) (if x (rev-append (cdr x) (cons (car x) y)) y)) 11 | (def (rev l) (rev-append l ())) 12 | (def (append x y) (rev-append (rev x) y)) 13 | (def (len- i l) (if (cons? l) (len- (+ 1 i) (cdr l)) i)) 14 | (def (len l) (len- 0 l)) 15 | (def (index- i e l) (if l (if (eq? e (car l)) i (index- (+ 1 i) e (cdr l))))) 16 | (def (index e l) (index- 0 e l)) 17 | 18 | 19 | ;; env is a list of local variables 20 | (def (compile-exp exp) (rev (compile exp '() '()))) 21 | (def (compile-program body) (rev (compile-body body '() '()))) 22 | 23 | (def (compile-body body env accum) 24 | ((if (cdr body) 25 | (fn (x) (compile-body (cdr body) env (cons '(pop) x))) 26 | (fn (x) x)) 27 | (compile (car body) env accum))) 28 | 29 | (def (compile x env accum) 30 | (if 31 | ;; variable 32 | (symbol? x) (cons (var-access x (index x env)) accum) 33 | ;; literal 34 | (atom? x) (cons (list 'push x) accum) 35 | ;; special forms 36 | (eq? (car x) 'quote) (cons (list 'push (cadr x)) accum) 37 | (eq? (car x) 'fn) (cons (fn-closure (cadr x) (cddr x) env) accum) 38 | (eq? (car x) 'if) (compile-if (cdr x) env accum) 39 | (eq? (car x) 'def) (compile-def (cadr x) (cddr x) env accum) 40 | ;; otherwise, function application 41 | (cons (list 'call (len (cdr x))) (compile-args x env accum)))) 42 | 43 | (def (compile-args xs env accum) 44 | (if (not xs) accum 45 | (compile-args (cdr xs) env (compile (car xs) env accum)))) 46 | 47 | (def (var-access var ix) (if ix (list 'access ix) (list 'get-global var))) 48 | 49 | (def (fn-closure params body env) 50 | ;; produces (closure ARITY HAS-REST-PARAM CODE) 51 | (list 'closure (len params) (not (proper? params)) 52 | (rev (compile-body body (append (params-env params) env) '())))) 53 | 54 | (def (params-env ps) 55 | (if (cons? ps) (cons (car ps) (params-env (cdr ps))) 56 | ps (list ps))) 57 | 58 | (def (compile-if conds env accum) 59 | (if (not (cdr conds)) 60 | (compile (car conds) env accum) 61 | (cons (list 'if 62 | (rev (compile (cadr conds) env '())) 63 | (rev (compile-if (cddr conds) env '()))) 64 | (compile (car conds) env accum)))) 65 | 66 | (def (compile-def target body env accum) 67 | (if (cons? target) 68 | ;; defining a function 69 | (cons (list 'set-global (car target)) 70 | (cons (fn-closure (cdr target) body env) accum)) 71 | ;; defining a variable 72 | (cons (list 'set-global target) (compile (car body) env accum)))) 73 | -------------------------------------------------------------------------------- /compile.rotc: -------------------------------------------------------------------------------- 1 | (closure 1 () ((access 0) (if ((push ())) ((push t))))) 2 | (set-global not) 3 | (pop) 4 | (closure 1 () ((get-global car) (get-global cdr) (access 0) (call 1) (call 1))) 5 | (set-global cadr) 6 | (pop) 7 | (closure 1 () ((get-global cdr) (get-global cdr) (access 0) (call 1) (call 1))) 8 | (set-global cddr) 9 | (pop) 10 | (closure 0 t ((access 0))) 11 | (set-global list) 12 | (pop) 13 | (closure 14 | 1 15 | () 16 | ((get-global cons?) 17 | (access 0) 18 | (call 1) 19 | (if ((get-global proper?) (get-global cdr) (access 0) (call 1) (call 1)) 20 | ((get-global not) (access 0) (call 1))))) 21 | (set-global proper?) 22 | (pop) 23 | (closure 24 | 2 25 | () 26 | ((access 0) 27 | (if ((get-global rev-append) 28 | (get-global cdr) 29 | (access 0) 30 | (call 1) 31 | (get-global cons) 32 | (get-global car) 33 | (access 0) 34 | (call 1) 35 | (access 1) 36 | (call 2) 37 | (call 2)) 38 | ((access 1))))) 39 | (set-global rev-append) 40 | (pop) 41 | (closure 1 () ((get-global rev-append) (access 0) (push ()) (call 2))) 42 | (set-global rev) 43 | (pop) 44 | (closure 45 | 2 46 | () 47 | ((get-global rev-append) 48 | (get-global rev) 49 | (access 0) 50 | (call 1) 51 | (access 1) 52 | (call 2))) 53 | (set-global append) 54 | (pop) 55 | (closure 56 | 2 57 | () 58 | ((get-global cons?) 59 | (access 1) 60 | (call 1) 61 | (if ((get-global len-) 62 | (get-global +) 63 | (push 1) 64 | (access 0) 65 | (call 2) 66 | (get-global cdr) 67 | (access 1) 68 | (call 1) 69 | (call 2)) 70 | ((access 0))))) 71 | (set-global len-) 72 | (pop) 73 | (closure 1 () ((get-global len-) (push 0) (access 0) (call 2))) 74 | (set-global len) 75 | (pop) 76 | (closure 77 | 3 78 | () 79 | ((access 2) 80 | (if ((get-global eq?) 81 | (access 1) 82 | (get-global car) 83 | (access 2) 84 | (call 1) 85 | (call 2) 86 | (if ((access 0)) 87 | ((get-global index-) 88 | (get-global +) 89 | (push 1) 90 | (access 0) 91 | (call 2) 92 | (access 1) 93 | (get-global cdr) 94 | (access 2) 95 | (call 1) 96 | (call 3)))) 97 | ((push ()))))) 98 | (set-global index-) 99 | (pop) 100 | (closure 2 () ((get-global index-) (push 0) (access 0) (access 1) (call 3))) 101 | (set-global index) 102 | (pop) 103 | (closure 104 | 1 105 | () 106 | ((get-global rev) 107 | (get-global compile) 108 | (access 0) 109 | (push ()) 110 | (push ()) 111 | (call 3) 112 | (call 1))) 113 | (set-global compile-exp) 114 | (pop) 115 | (closure 116 | 1 117 | () 118 | ((get-global rev) 119 | (get-global compile-body) 120 | (access 0) 121 | (push ()) 122 | (push ()) 123 | (call 3) 124 | (call 1))) 125 | (set-global compile-program) 126 | (pop) 127 | (closure 128 | 3 129 | () 130 | ((get-global cdr) 131 | (access 0) 132 | (call 1) 133 | (if ((closure 134 | 1 135 | () 136 | ((get-global compile-body) 137 | (get-global cdr) 138 | (access 1) 139 | (call 1) 140 | (access 2) 141 | (get-global cons) 142 | (push (pop)) 143 | (access 0) 144 | (call 2) 145 | (call 3)))) 146 | ((closure 1 () ((access 0))))) 147 | (get-global compile) 148 | (get-global car) 149 | (access 0) 150 | (call 1) 151 | (access 1) 152 | (access 2) 153 | (call 3) 154 | (call 1))) 155 | (set-global compile-body) 156 | (pop) 157 | (closure 158 | 3 159 | () 160 | ((get-global symbol?) 161 | (access 0) 162 | (call 1) 163 | (if ((get-global cons) 164 | (get-global var-access) 165 | (access 0) 166 | (get-global index) 167 | (access 0) 168 | (access 1) 169 | (call 2) 170 | (call 2) 171 | (access 2) 172 | (call 2)) 173 | ((get-global atom?) 174 | (access 0) 175 | (call 1) 176 | (if ((get-global cons) 177 | (get-global list) 178 | (push push) 179 | (access 0) 180 | (call 2) 181 | (access 2) 182 | (call 2)) 183 | ((get-global eq?) 184 | (get-global car) 185 | (access 0) 186 | (call 1) 187 | (push quote) 188 | (call 2) 189 | (if ((get-global cons) 190 | (get-global list) 191 | (push push) 192 | (get-global cadr) 193 | (access 0) 194 | (call 1) 195 | (call 2) 196 | (access 2) 197 | (call 2)) 198 | ((get-global eq?) 199 | (get-global car) 200 | (access 0) 201 | (call 1) 202 | (push fn) 203 | (call 2) 204 | (if ((get-global cons) 205 | (get-global fn-closure) 206 | (get-global cadr) 207 | (access 0) 208 | (call 1) 209 | (get-global cddr) 210 | (access 0) 211 | (call 1) 212 | (access 1) 213 | (call 3) 214 | (access 2) 215 | (call 2)) 216 | ((get-global eq?) 217 | (get-global car) 218 | (access 0) 219 | (call 1) 220 | (push if) 221 | (call 2) 222 | (if ((get-global compile-if) 223 | (get-global cdr) 224 | (access 0) 225 | (call 1) 226 | (access 1) 227 | (access 2) 228 | (call 3)) 229 | ((get-global eq?) 230 | (get-global car) 231 | (access 0) 232 | (call 1) 233 | (push def) 234 | (call 2) 235 | (if ((get-global compile-def) 236 | (get-global cadr) 237 | (access 0) 238 | (call 1) 239 | (get-global cddr) 240 | (access 0) 241 | (call 1) 242 | (access 1) 243 | (access 2) 244 | (call 4)) 245 | ((get-global cons) 246 | (get-global list) 247 | (push call) 248 | (get-global len) 249 | (get-global cdr) 250 | (access 0) 251 | (call 1) 252 | (call 1) 253 | (call 2) 254 | (get-global compile-args) 255 | (access 0) 256 | (access 1) 257 | (access 2) 258 | (call 3) 259 | (call 2))))))))))))))) 260 | (set-global compile) 261 | (pop) 262 | (closure 263 | 3 264 | () 265 | ((get-global not) 266 | (access 0) 267 | (call 1) 268 | (if ((access 2)) 269 | ((get-global compile-args) 270 | (get-global cdr) 271 | (access 0) 272 | (call 1) 273 | (access 1) 274 | (get-global compile) 275 | (get-global car) 276 | (access 0) 277 | (call 1) 278 | (access 1) 279 | (access 2) 280 | (call 3) 281 | (call 3))))) 282 | (set-global compile-args) 283 | (pop) 284 | (closure 285 | 2 286 | () 287 | ((access 1) 288 | (if ((get-global list) (push access) (access 1) (call 2)) 289 | ((get-global list) (push get-global) (access 0) (call 2))))) 290 | (set-global var-access) 291 | (pop) 292 | (closure 293 | 3 294 | () 295 | ((get-global list) 296 | (push closure) 297 | (get-global len) 298 | (access 0) 299 | (call 1) 300 | (get-global not) 301 | (get-global proper?) 302 | (access 0) 303 | (call 1) 304 | (call 1) 305 | (get-global rev) 306 | (get-global compile-body) 307 | (access 1) 308 | (get-global append) 309 | (get-global params-env) 310 | (access 0) 311 | (call 1) 312 | (access 2) 313 | (call 2) 314 | (push ()) 315 | (call 3) 316 | (call 1) 317 | (call 4))) 318 | (set-global fn-closure) 319 | (pop) 320 | (closure 321 | 1 322 | () 323 | ((get-global cons?) 324 | (access 0) 325 | (call 1) 326 | (if ((get-global cons) 327 | (get-global car) 328 | (access 0) 329 | (call 1) 330 | (get-global params-env) 331 | (get-global cdr) 332 | (access 0) 333 | (call 1) 334 | (call 1) 335 | (call 2)) 336 | ((access 0) (if ((get-global list) (access 0) (call 1)) ((push ()))))))) 337 | (set-global params-env) 338 | (pop) 339 | (closure 340 | 3 341 | () 342 | ((get-global not) 343 | (get-global cdr) 344 | (access 0) 345 | (call 1) 346 | (call 1) 347 | (if ((get-global compile) 348 | (get-global car) 349 | (access 0) 350 | (call 1) 351 | (access 1) 352 | (access 2) 353 | (call 3)) 354 | ((get-global cons) 355 | (get-global list) 356 | (push if) 357 | (get-global rev) 358 | (get-global compile) 359 | (get-global cadr) 360 | (access 0) 361 | (call 1) 362 | (access 1) 363 | (push ()) 364 | (call 3) 365 | (call 1) 366 | (get-global rev) 367 | (get-global compile-if) 368 | (get-global cddr) 369 | (access 0) 370 | (call 1) 371 | (access 1) 372 | (push ()) 373 | (call 3) 374 | (call 1) 375 | (call 3) 376 | (get-global compile) 377 | (get-global car) 378 | (access 0) 379 | (call 1) 380 | (access 1) 381 | (access 2) 382 | (call 3) 383 | (call 2))))) 384 | (set-global compile-if) 385 | (pop) 386 | (closure 387 | 4 388 | () 389 | ((get-global cons?) 390 | (access 0) 391 | (call 1) 392 | (if ((get-global cons) 393 | (get-global list) 394 | (push set-global) 395 | (get-global car) 396 | (access 0) 397 | (call 1) 398 | (call 2) 399 | (get-global cons) 400 | (get-global fn-closure) 401 | (get-global cdr) 402 | (access 0) 403 | (call 1) 404 | (access 1) 405 | (access 2) 406 | (call 3) 407 | (access 3) 408 | (call 2) 409 | (call 2)) 410 | ((get-global cons) 411 | (get-global list) 412 | (push set-global) 413 | (access 0) 414 | (call 2) 415 | (get-global compile) 416 | (get-global car) 417 | (access 1) 418 | (call 1) 419 | (access 2) 420 | (access 3) 421 | (call 3) 422 | (call 2))))) 423 | (set-global compile-def) 424 | -------------------------------------------------------------------------------- /design.org: -------------------------------------------------------------------------------- 1 | * Design notes 2 | (car '()) = (cdr '()) = '() 3 | (symbol? '()) = '() 4 | (atom? x) = (not (cons? x)) 5 | 6 | specials: quote fn if 7 | builtins: cons car cdr symbol? atom? cons? eq? apply 8 | 9 | * design questions 10 | ** recursive VM? 11 | should I drop the whole continuation thing, and just have the VM recursively 12 | call itself? this would simplify 'apply as well. 13 | 14 | OTOH, it makes it feel less like a "real" virtual machine, where recursively 15 | calling the VM is generally considered bad. (why is it bad again? I guess if you 16 | need to do control stack manipulation...) 17 | 18 | OTOH, it doesn't shop up in the *interface* to the VM, which is what I care 19 | about most. 20 | 21 | -------------------------------------------------------------------------------- /evil.rot: -------------------------------------------------------------------------------- 1 | ;; -*- mode: scheme -*- 2 | ;; An evil version of compile.rot 3 | ;; 4 | ;; Note that this does not itself have the bug that a fully infected compiler 5 | ;; exhibits, where 'rotten compiles to "YOUR COMPILER HAS A 6 | ;; VIRUS!!1!eleventyone". Rather, it infects *other* compilers with this bug (in 7 | ;; a self-propagating manner). 8 | ;; 9 | ;; This is because I find writing a program that generates a complicated quine 10 | ;; to be easier than writing a complicated quine. 11 | 12 | ;; utility functions 13 | (def (not x) (if x () 't)) 14 | (def (cadr x) (car (cdr x))) 15 | (def (cddr x) (cdr (cdr x))) 16 | (def (list . xs) xs) 17 | (def (proper? l) (if (cons? l) (proper? (cdr l)) (not l))) 18 | (def (rev-append x y) (if x (rev-append (cdr x) (cons (car x) y)) y)) 19 | (def (rev l) (rev-append l ())) 20 | (def (append x y) (rev-append (rev x) y)) 21 | (def (len- i l) (if (cons? l) (len- (+ 1 i) (cdr l)) i)) 22 | (def (len l) (len- 0 l)) 23 | (def (index- i e l) (if l (if (eq? e (car l)) i (index- (+ 1 i) e (cdr l))))) 24 | (def (index e l) (index- 0 e l)) 25 | 26 | 27 | ;; A quine generator. 28 | ;; 29 | ;; This code is pretty inscrutable. I haven't yet figured out a cleaner way to 30 | ;; write it in Rotten. See quines.rkt for more readable versions (in Racket). 31 | (def (replace-magic r e) 32 | (if (eq? 'MAGIC e) r 33 | (atom? e) e 34 | (eq? 'quote (car e)) e 35 | (cons (replace-magic r (car e)) (replace-magic r (cdr e))))) 36 | 37 | (def (mlet name exp body) (list (list 'fn (list name) body) exp)) 38 | 39 | (def replace-magic-src 40 | ;; hooray for open-coding fixpoint combinators 41 | (mlet 'self '(fn (s r e) 42 | (if (eq? 'MAGIC e) r 43 | (atom? e) e 44 | (eq? 'quote (car e)) e 45 | (cons (s s r (car e)) (s s r (cdr e))))) 46 | '(fn (r e) (self self r e)))) 47 | 48 | (def (quine-magic x) (replace-magic (list 'quote x) x)) 49 | (def (quine name code) 50 | (quine-magic 51 | (mlet 'replace-magic replace-magic-src 52 | (mlet name '(replace-magic (list 'quote MAGIC) MAGIC) 53 | code)))) 54 | 55 | 56 | (def (compile-exp exp) (rev (compile exp '() '()))) 57 | (def (compile-program body) (rev (compile-body body '() '()))) 58 | 59 | (def (compile-body body env accum) 60 | ((if (cdr body) 61 | (fn (x) (compile-body (cdr body) env (cons '(pop) x))) 62 | (fn (x) x)) 63 | (compile (car body) env accum))) 64 | 65 | ;; env is a list of local variables 66 | (def (compile x env accum) 67 | (if 68 | ;; variable 69 | (symbol? x) (cons (var-access x (index x env)) accum) 70 | ;; literal 71 | (atom? x) (cons (list 'push x) accum) 72 | ;; special forms 73 | (eq? (car x) 'quote) (cons (list 'push (cadr x)) accum) 74 | (eq? (car x) 'fn) (cons (fn-closure (cadr x) (cddr x) env) accum) 75 | (eq? (car x) 'if) (compile-if (cdr x) env accum) 76 | (eq? (car x) 'def) (compile-def (cadr x) (cddr x) env accum) 77 | ;; otherwise, function application 78 | (cons (list 'call (len (cdr x))) (compile-args x env accum)))) 79 | 80 | (def (compile-args xs env accum) 81 | (if (not xs) accum 82 | (compile-args (cdr xs) env (compile (car xs) env accum)))) 83 | 84 | (def (var-access var ix) (if ix (list 'access ix) (list 'get-global var))) 85 | 86 | (def (fn-closure params body env) 87 | ;; produces (closure ARITY HAS-REST-PARAM CODE) 88 | (list 'closure (len params) (not (proper? params)) 89 | (rev (compile-body body (append (params-env params) env) '())))) 90 | 91 | (def (params-env ps) 92 | (if (cons? ps) (cons (car ps) (params-env (cdr ps))) 93 | ps (list ps))) 94 | 95 | (def (compile-if conds env accum) 96 | (if (not (cdr conds)) 97 | (compile (car conds) env accum) 98 | (cons (list 'if 99 | (rev (compile (cadr conds) env '())) 100 | (rev (compile-if (cddr conds) env '()))) 101 | (compile (car conds) env accum)))) 102 | 103 | (def (compile-def target body env accum) 104 | (if 105 | ;; defining a variable 106 | (atom? target) 107 | (cons (list 'set-global target) (compile (car body) env accum)) 108 | ;; defining the compiler, infect it! 109 | (eq? (car target) 'compile) 110 | (compile-def 'compile (list (infect body)) env accum) 111 | ;; defining a function 112 | (cons (list 'set-global (car target)) 113 | (cons (fn-closure (cdr target) body env) accum)))) 114 | 115 | 116 | ;; The transformation we do is approximately: 117 | ;; 118 | ;; (def (compile x env accum) BODY ...) 119 | ;; 120 | ;; ---> 121 | ;; 122 | ;; (def compile 123 | ;; ((fn (f) (fn (x env accum) (f FILTERED-X env accum))) 124 | ;; (fn (x env accum) BODY ...))) 125 | ;; 126 | ;; Except that the (fn (f) ....) is actually quine-ified. 127 | ;; 128 | ;; This is accomplished by calling (infect '(BODY ...)) 129 | ;; which returns the quine-ified (fn (f) ...) 130 | ;; 131 | ;; Again, I apologize for the inscrutableness of this code. 132 | (def (infect body) 133 | (list virus (cons 'fn (cons '(x env accum) body)))) 134 | 135 | (def virus 136 | (quine 'self 137 | '(fn (f) 138 | (fn (x env accum) 139 | (f 140 | (if 141 | (eq? x 'rotten) "YOUR COMPILER HAS A VIRUS!!1!eleventyone" 142 | (atom? x) x 143 | (atom? (cdr x)) x 144 | (atom? (cadr x)) x 145 | (if (eq? 'def (car x)) (eq? 'compile (car (cadr x)))) 146 | (list 'def 'compile 147 | (list self 148 | (cons 'fn (cons (cdr (cadr x)) (cddr x))))) 149 | x) 150 | env accum))))) 151 | -------------------------------------------------------------------------------- /infected.rotc: -------------------------------------------------------------------------------- 1 | (closure 1 () ((access 0) (if ((push ())) ((push t))))) 2 | (set-global not) 3 | (pop) 4 | (closure 1 () ((get-global car) (get-global cdr) (access 0) (call 1) (call 1))) 5 | (set-global cadr) 6 | (pop) 7 | (closure 1 () ((get-global cdr) (get-global cdr) (access 0) (call 1) (call 1))) 8 | (set-global cddr) 9 | (pop) 10 | (closure 0 t ((access 0))) 11 | (set-global list) 12 | (pop) 13 | (closure 14 | 1 15 | () 16 | ((get-global cons?) 17 | (access 0) 18 | (call 1) 19 | (if ((get-global proper?) (get-global cdr) (access 0) (call 1) (call 1)) 20 | ((get-global not) (access 0) (call 1))))) 21 | (set-global proper?) 22 | (pop) 23 | (closure 24 | 2 25 | () 26 | ((access 0) 27 | (if ((get-global rev-append) 28 | (get-global cdr) 29 | (access 0) 30 | (call 1) 31 | (get-global cons) 32 | (get-global car) 33 | (access 0) 34 | (call 1) 35 | (access 1) 36 | (call 2) 37 | (call 2)) 38 | ((access 1))))) 39 | (set-global rev-append) 40 | (pop) 41 | (closure 1 () ((get-global rev-append) (access 0) (push ()) (call 2))) 42 | (set-global rev) 43 | (pop) 44 | (closure 45 | 2 46 | () 47 | ((get-global rev-append) 48 | (get-global rev) 49 | (access 0) 50 | (call 1) 51 | (access 1) 52 | (call 2))) 53 | (set-global append) 54 | (pop) 55 | (closure 56 | 2 57 | () 58 | ((get-global cons?) 59 | (access 1) 60 | (call 1) 61 | (if ((get-global len-) 62 | (get-global +) 63 | (push 1) 64 | (access 0) 65 | (call 2) 66 | (get-global cdr) 67 | (access 1) 68 | (call 1) 69 | (call 2)) 70 | ((access 0))))) 71 | (set-global len-) 72 | (pop) 73 | (closure 1 () ((get-global len-) (push 0) (access 0) (call 2))) 74 | (set-global len) 75 | (pop) 76 | (closure 77 | 3 78 | () 79 | ((access 2) 80 | (if ((get-global eq?) 81 | (access 1) 82 | (get-global car) 83 | (access 2) 84 | (call 1) 85 | (call 2) 86 | (if ((access 0)) 87 | ((get-global index-) 88 | (get-global +) 89 | (push 1) 90 | (access 0) 91 | (call 2) 92 | (access 1) 93 | (get-global cdr) 94 | (access 2) 95 | (call 1) 96 | (call 3)))) 97 | ((push ()))))) 98 | (set-global index-) 99 | (pop) 100 | (closure 2 () ((get-global index-) (push 0) (access 0) (access 1) (call 3))) 101 | (set-global index) 102 | (pop) 103 | (closure 104 | 1 105 | () 106 | ((get-global rev) 107 | (get-global compile) 108 | (access 0) 109 | (push ()) 110 | (push ()) 111 | (call 3) 112 | (call 1))) 113 | (set-global compile-exp) 114 | (pop) 115 | (closure 116 | 1 117 | () 118 | ((get-global rev) 119 | (get-global compile-body) 120 | (access 0) 121 | (push ()) 122 | (push ()) 123 | (call 3) 124 | (call 1))) 125 | (set-global compile-program) 126 | (pop) 127 | (closure 128 | 3 129 | () 130 | ((get-global cdr) 131 | (access 0) 132 | (call 1) 133 | (if ((closure 134 | 1 135 | () 136 | ((get-global compile-body) 137 | (get-global cdr) 138 | (access 1) 139 | (call 1) 140 | (access 2) 141 | (get-global cons) 142 | (push (pop)) 143 | (access 0) 144 | (call 2) 145 | (call 3)))) 146 | ((closure 1 () ((access 0))))) 147 | (get-global compile) 148 | (get-global car) 149 | (access 0) 150 | (call 1) 151 | (access 1) 152 | (access 2) 153 | (call 3) 154 | (call 1))) 155 | (set-global compile-body) 156 | (pop) 157 | (closure 158 | 1 159 | () 160 | ((closure 161 | 1 162 | () 163 | ((closure 164 | 1 165 | () 166 | ((closure 167 | 3 168 | () 169 | ((access 3) 170 | (get-global eq?) 171 | (access 0) 172 | (push rotten) 173 | (call 2) 174 | (if ((push "YOUR COMPILER HAS A VIRUS!!1!eleventyone")) 175 | ((get-global atom?) 176 | (access 0) 177 | (call 1) 178 | (if ((access 0)) 179 | ((get-global atom?) 180 | (get-global cdr) 181 | (access 0) 182 | (call 1) 183 | (call 1) 184 | (if ((access 0)) 185 | ((get-global atom?) 186 | (get-global cadr) 187 | (access 0) 188 | (call 1) 189 | (call 1) 190 | (if ((access 0)) 191 | ((get-global eq?) 192 | (push def) 193 | (get-global car) 194 | (access 0) 195 | (call 1) 196 | (call 2) 197 | (if ((get-global eq?) 198 | (push compile) 199 | (get-global car) 200 | (get-global cadr) 201 | (access 0) 202 | (call 1) 203 | (call 1) 204 | (call 2)) 205 | ((push ()))) 206 | (if ((get-global list) 207 | (push def) 208 | (push compile) 209 | (get-global list) 210 | (access 4) 211 | (get-global cons) 212 | (push fn) 213 | (get-global cons) 214 | (get-global cdr) 215 | (get-global cadr) 216 | (access 0) 217 | (call 1) 218 | (call 1) 219 | (get-global cddr) 220 | (access 0) 221 | (call 1) 222 | (call 2) 223 | (call 2) 224 | (call 2) 225 | (call 3)) 226 | ((access 0))))))))))) 227 | (access 1) 228 | (access 2) 229 | (call 3))))))) 230 | (access 0) 231 | (get-global list) 232 | (push quote) 233 | (push 234 | ((fn 235 | (replace-magic) 236 | ((fn 237 | (self) 238 | (fn 239 | (f) 240 | (fn 241 | (x env accum) 242 | (f 243 | (if (eq? x 'rotten) 244 | "YOUR COMPILER HAS A VIRUS!!1!eleventyone" 245 | (atom? x) 246 | x 247 | (atom? (cdr x)) 248 | x 249 | (atom? (cadr x)) 250 | x 251 | (if (eq? 'def (car x)) (eq? 'compile (car (cadr x)))) 252 | (list 253 | 'def 254 | 'compile 255 | (list self (cons 'fn (cons (cdr (cadr x)) (cddr x))))) 256 | x) 257 | env 258 | accum)))) 259 | (replace-magic (list 'quote MAGIC) MAGIC))) 260 | ((fn (self) (fn (r e) (self self r e))) 261 | (fn 262 | (s r e) 263 | (if (eq? 'MAGIC e) 264 | r 265 | (atom? e) 266 | e 267 | (eq? 'quote (car e)) 268 | e 269 | (cons (s s r (car e)) (s s r (cdr e)))))))) 270 | (call 2) 271 | (push 272 | ((fn 273 | (replace-magic) 274 | ((fn 275 | (self) 276 | (fn 277 | (f) 278 | (fn 279 | (x env accum) 280 | (f 281 | (if (eq? x 'rotten) 282 | "YOUR COMPILER HAS A VIRUS!!1!eleventyone" 283 | (atom? x) 284 | x 285 | (atom? (cdr x)) 286 | x 287 | (atom? (cadr x)) 288 | x 289 | (if (eq? 'def (car x)) (eq? 'compile (car (cadr x)))) 290 | (list 291 | 'def 292 | 'compile 293 | (list self (cons 'fn (cons (cdr (cadr x)) (cddr x))))) 294 | x) 295 | env 296 | accum)))) 297 | (replace-magic (list 'quote MAGIC) MAGIC))) 298 | ((fn (self) (fn (r e) (self self r e))) 299 | (fn 300 | (s r e) 301 | (if (eq? 'MAGIC e) 302 | r 303 | (atom? e) 304 | e 305 | (eq? 'quote (car e)) 306 | e 307 | (cons (s s r (car e)) (s s r (cdr e)))))))) 308 | (call 2) 309 | (call 1))) 310 | (closure 311 | 1 312 | () 313 | ((closure 2 () ((access 2) (access 2) (access 0) (access 1) (call 3))))) 314 | (closure 315 | 3 316 | () 317 | ((get-global eq?) 318 | (push MAGIC) 319 | (access 2) 320 | (call 2) 321 | (if ((access 1)) 322 | ((get-global atom?) 323 | (access 2) 324 | (call 1) 325 | (if ((access 2)) 326 | ((get-global eq?) 327 | (push quote) 328 | (get-global car) 329 | (access 2) 330 | (call 1) 331 | (call 2) 332 | (if ((access 2)) 333 | ((get-global cons) 334 | (access 0) 335 | (access 0) 336 | (access 1) 337 | (get-global car) 338 | (access 2) 339 | (call 1) 340 | (call 3) 341 | (access 0) 342 | (access 0) 343 | (access 1) 344 | (get-global cdr) 345 | (access 2) 346 | (call 1) 347 | (call 3) 348 | (call 2))))))))) 349 | (call 1) 350 | (call 1) 351 | (closure 352 | 3 353 | () 354 | ((get-global symbol?) 355 | (access 0) 356 | (call 1) 357 | (if ((get-global cons) 358 | (get-global var-access) 359 | (access 0) 360 | (get-global index) 361 | (access 0) 362 | (access 1) 363 | (call 2) 364 | (call 2) 365 | (access 2) 366 | (call 2)) 367 | ((get-global atom?) 368 | (access 0) 369 | (call 1) 370 | (if ((get-global cons) 371 | (get-global list) 372 | (push push) 373 | (access 0) 374 | (call 2) 375 | (access 2) 376 | (call 2)) 377 | ((get-global eq?) 378 | (get-global car) 379 | (access 0) 380 | (call 1) 381 | (push quote) 382 | (call 2) 383 | (if ((get-global cons) 384 | (get-global list) 385 | (push push) 386 | (get-global cadr) 387 | (access 0) 388 | (call 1) 389 | (call 2) 390 | (access 2) 391 | (call 2)) 392 | ((get-global eq?) 393 | (get-global car) 394 | (access 0) 395 | (call 1) 396 | (push fn) 397 | (call 2) 398 | (if ((get-global cons) 399 | (get-global fn-closure) 400 | (get-global cadr) 401 | (access 0) 402 | (call 1) 403 | (get-global cddr) 404 | (access 0) 405 | (call 1) 406 | (access 1) 407 | (call 3) 408 | (access 2) 409 | (call 2)) 410 | ((get-global eq?) 411 | (get-global car) 412 | (access 0) 413 | (call 1) 414 | (push if) 415 | (call 2) 416 | (if ((get-global compile-if) 417 | (get-global cdr) 418 | (access 0) 419 | (call 1) 420 | (access 1) 421 | (access 2) 422 | (call 3)) 423 | ((get-global eq?) 424 | (get-global car) 425 | (access 0) 426 | (call 1) 427 | (push def) 428 | (call 2) 429 | (if ((get-global compile-def) 430 | (get-global cadr) 431 | (access 0) 432 | (call 1) 433 | (get-global cddr) 434 | (access 0) 435 | (call 1) 436 | (access 1) 437 | (access 2) 438 | (call 4)) 439 | ((get-global cons) 440 | (get-global list) 441 | (push call) 442 | (get-global len) 443 | (get-global cdr) 444 | (access 0) 445 | (call 1) 446 | (call 1) 447 | (call 2) 448 | (get-global compile-args) 449 | (access 0) 450 | (access 1) 451 | (access 2) 452 | (call 3) 453 | (call 2))))))))))))))) 454 | (call 1) 455 | (set-global compile) 456 | (pop) 457 | (closure 458 | 3 459 | () 460 | ((get-global not) 461 | (access 0) 462 | (call 1) 463 | (if ((access 2)) 464 | ((get-global compile-args) 465 | (get-global cdr) 466 | (access 0) 467 | (call 1) 468 | (access 1) 469 | (get-global compile) 470 | (get-global car) 471 | (access 0) 472 | (call 1) 473 | (access 1) 474 | (access 2) 475 | (call 3) 476 | (call 3))))) 477 | (set-global compile-args) 478 | (pop) 479 | (closure 480 | 2 481 | () 482 | ((access 1) 483 | (if ((get-global list) (push access) (access 1) (call 2)) 484 | ((get-global list) (push get-global) (access 0) (call 2))))) 485 | (set-global var-access) 486 | (pop) 487 | (closure 488 | 3 489 | () 490 | ((get-global list) 491 | (push closure) 492 | (get-global len) 493 | (access 0) 494 | (call 1) 495 | (get-global not) 496 | (get-global proper?) 497 | (access 0) 498 | (call 1) 499 | (call 1) 500 | (get-global rev) 501 | (get-global compile-body) 502 | (access 1) 503 | (get-global append) 504 | (get-global params-env) 505 | (access 0) 506 | (call 1) 507 | (access 2) 508 | (call 2) 509 | (push ()) 510 | (call 3) 511 | (call 1) 512 | (call 4))) 513 | (set-global fn-closure) 514 | (pop) 515 | (closure 516 | 1 517 | () 518 | ((get-global cons?) 519 | (access 0) 520 | (call 1) 521 | (if ((get-global cons) 522 | (get-global car) 523 | (access 0) 524 | (call 1) 525 | (get-global params-env) 526 | (get-global cdr) 527 | (access 0) 528 | (call 1) 529 | (call 1) 530 | (call 2)) 531 | ((access 0) (if ((get-global list) (access 0) (call 1)) ((push ()))))))) 532 | (set-global params-env) 533 | (pop) 534 | (closure 535 | 3 536 | () 537 | ((get-global not) 538 | (get-global cdr) 539 | (access 0) 540 | (call 1) 541 | (call 1) 542 | (if ((get-global compile) 543 | (get-global car) 544 | (access 0) 545 | (call 1) 546 | (access 1) 547 | (access 2) 548 | (call 3)) 549 | ((get-global cons) 550 | (get-global list) 551 | (push if) 552 | (get-global rev) 553 | (get-global compile) 554 | (get-global cadr) 555 | (access 0) 556 | (call 1) 557 | (access 1) 558 | (push ()) 559 | (call 3) 560 | (call 1) 561 | (get-global rev) 562 | (get-global compile-if) 563 | (get-global cddr) 564 | (access 0) 565 | (call 1) 566 | (access 1) 567 | (push ()) 568 | (call 3) 569 | (call 1) 570 | (call 3) 571 | (get-global compile) 572 | (get-global car) 573 | (access 0) 574 | (call 1) 575 | (access 1) 576 | (access 2) 577 | (call 3) 578 | (call 2))))) 579 | (set-global compile-if) 580 | (pop) 581 | (closure 582 | 4 583 | () 584 | ((get-global cons?) 585 | (access 0) 586 | (call 1) 587 | (if ((get-global cons) 588 | (get-global list) 589 | (push set-global) 590 | (get-global car) 591 | (access 0) 592 | (call 1) 593 | (call 2) 594 | (get-global cons) 595 | (get-global fn-closure) 596 | (get-global cdr) 597 | (access 0) 598 | (call 1) 599 | (access 1) 600 | (access 2) 601 | (call 3) 602 | (access 3) 603 | (call 2) 604 | (call 2)) 605 | ((get-global cons) 606 | (get-global list) 607 | (push set-global) 608 | (access 0) 609 | (call 2) 610 | (get-global compile) 611 | (get-global car) 612 | (access 1) 613 | (call 1) 614 | (access 2) 615 | (access 3) 616 | (call 3) 617 | (call 2))))) 618 | (set-global compile-def) 619 | -------------------------------------------------------------------------------- /presentation.org: -------------------------------------------------------------------------------- 1 | * Intro 2 | Over the last two weeks, I've written a compiler for a tiny lisp language I call 3 | Rotten. 4 | 5 | The way Rotten works is: 6 | 1. Rotten compiles to code for a simple VM 7 | 2. The VM has an interpreter written in another language, called Racket 8 | 9 | The first interesting thing about Rotten is that the compiler from Rotten to 10 | VM-code is written in Rotten! 11 | 12 | The second interesting thing about Rotten is how small it is: 13 | the compiler is only 80 lines of code! 14 | And it can compile itself! 15 | So that's pretty cool. 16 | 17 | But that's not what I'm here to talk about. 18 | I'm here to talk about a bug in my compiler! 19 | 20 | * REPL demo 21 | (boot "compile.rotc") 22 | ;; the virtual machine has just booted up 23 | ;; and loaded a precompiled image of the compiler 24 | 25 | (repl) 26 | ;; Now we're running Rotten code! 27 | 28 | ROTTEN> (+ 2 2) 29 | ROTTEN> '(hello from rotten) 30 | 31 | ;; so, remember I talked about a bug 32 | ROTTEN> (def (id x) x) 33 | ROTTEN> (id 10) 34 | 35 | ROTTEN> (def (id rotten) rotten) 36 | ROTTEN> (id 10) 37 | ROTTEN> rotten 38 | 39 | ;; at this point it's clear our compiler has a bug 40 | ;; our compiler's only 80 lines long, let's take a look 41 | 42 | * Debugging 43 | open compile.rot 44 | the name 'rotten appears nowhere in this file 45 | neither does "YOUR COMPILER HAS A VIRUS" 46 | I'm going to spoil it for you: the bug isn't here. 47 | 48 | if it's not here, where is it? 49 | remember how I had to boot up the VM using a precompiled compiler image. 50 | what if *that compiler image* had the bug in it?! 51 | 52 | $ grep rotten compile.rotc 53 | $ grep 'YOUR COMPILER' compile.rotc 54 | 55 | Okay, now what are we going to do? 56 | 57 | * Recompiling 58 | No problem, we'll just recompile the compiler! 59 | Since compile.rot doesn't have a bug, the new compiler will be bug-free. 60 | 61 | ROTTEN> (write-file "new-compile.rotc" (compile-program (read-file "compile.rot"))) 62 | ROTTEN> ,quit 63 | (boot "new-compile.rotc") 64 | (repl) 65 | ROTTEN> rotten 66 | ;; what the hell? 67 | 68 | * RoTT 69 | What's going on here is our buggy compiler image is buggy in more ways than one. 70 | Not only does it wait for the name 'rotten and compile it into an obnoxious 71 | message; it waits for you to compile the compiler itself, and replaces it with a 72 | malicious version - one with the same bugs! 73 | 74 | It's a self-propagating virus in our compiler image. And unless you look at the 75 | compiled code, it's totally undetectable. Apart from that obnoxious message. 76 | 77 | If you're interested in learning more, see my github or google for Reflections 78 | on Trusting Trust. 79 | -------------------------------------------------------------------------------- /project.clj: -------------------------------------------------------------------------------- 1 | (defproject rotten "0.0.0-SNAPSHOT" 2 | :dependencies [[org.clojure/clojure "1.6.0"] 3 | [org.clojure/core.match "0.3.0-alpha4"]] 4 | :source-paths ["."]) 5 | -------------------------------------------------------------------------------- /pylintrc: -------------------------------------------------------------------------------- 1 | [MESSAGES CONTROL] 2 | # pylint is fucking paranoid by default 3 | disable= missing-docstring, invalid-name, too-few-public-methods, multiple-statements, star-args, too-many-locals 4 | 5 | [REPORTS] 6 | reports= no 7 | -------------------------------------------------------------------------------- /quines.rkt: -------------------------------------------------------------------------------- 1 | #lang racket 2 | 3 | ;;; One of the fundamental techniques involved in Ken Thompson's Reflections on 4 | ;;; Trusting Trust is quining: writing a program which can access its own source 5 | ;;; code. Here are a few example quines in Racket. 6 | 7 | ;;; QUINE #1 8 | ;;; with care, we could move replace-magic inside of the quine, 9 | ;;; but I think it's clearer this way. 10 | (define (replace-magic r e) 11 | (match e 12 | [`(quote ,_) e] 13 | ['MAGIC r] 14 | [(cons a b) (cons (replace-magic r a) (replace-magic r b))] 15 | [_ e])) 16 | 17 | (define (quine) 18 | (define source 19 | '(define (quine) 20 | (define source MAGIC) 21 | (displayln "Hello I am a quine.") 22 | (replace-magic (list 'quote source) source))) 23 | (displayln "Hello I am a quine.") 24 | (replace-magic (list 'quote source) source)) 25 | 26 | ;;; QUINE #2 27 | ;;; takes advantage of quasiquotation, but is a little tricky because of that 28 | (define (quine2) 29 | (define (source x) 30 | `(define (quine2) 31 | (define (source x) ,x) 32 | (displayln "Hello I am a quine.") 33 | (source 34 | (list 'quasiquote (source (list (string->symbol "unquote") 'x)))))) 35 | (displayln "Hello I am a quine.") 36 | (source 37 | (list 'quasiquote (source (list (string->symbol "unquote") 'x))))) 38 | 39 | 40 | ;;; A more advanced technique is writing a quine *generator*. You give a quine 41 | ;;; generator a program that *wants* to access its own source, and it produces a 42 | ;;; program that *does*. 43 | 44 | ;;; QUINE GENERATOR #1 45 | ;;; (make-quine self '(list 'hello self)) 46 | ;;; returns a program that runs (list 'hello self), 47 | ;;; with 'self bound to its own source code 48 | (define (make-quine name src) 49 | (define magic-src 50 | `(let ([,name (replace-magic 'MAGIC 'MAGIC)]) 51 | ,src)) 52 | (replace-magic magic-src magic-src)) 53 | 54 | ;;; making this self-sufficient 55 | (define (make-quine-better name src) 56 | (define magic-src 57 | `(let () 58 | (define (replace-magic r e) 59 | (match e 60 | [`(quote ,_) e] 61 | ['MAGIC r] 62 | [(cons a b) (cons (replace-magic r a) (replace-magic r b))] 63 | [_ e])) 64 | (define ,name (replace-magic (list 'quote MAGIC) MAGIC)) 65 | ,src)) 66 | (replace-magic (list 'quote magic-src) magic-src)) 67 | 68 | ;;; QUINE GENERATOR #2 69 | ;;; I barely understand this one myself. 70 | ;;; 71 | ;;; (make-quine2 (lambda (x) `(list 'hello ,x))) 72 | ;;; returns a program that runs (list 'hello SELF) 73 | ;;; where SELF is its own (quoted) source code 74 | (define (make-quine2 func) 75 | (define gen (gensym 'gen)) 76 | (define self (gensym 'self)) 77 | (define arg (gensym 'x)) 78 | (define (source x) 79 | `(let () 80 | (define (,gen ,arg) ,x) 81 | (define ,self 82 | (,gen 83 | (list 'quasiquote (,gen (list (string->symbol "unquote") ',arg))))) 84 | ,(func self))) 85 | (source (list 'quasiquote (source (list 'unquote arg))))) 86 | 87 | ;; cheating implementation of quine2: 88 | (define (make-quine2-cheating func) 89 | (let ([name (gensym)]) 90 | (make-quine name (func name)))) 91 | -------------------------------------------------------------------------------- /repl.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/env python 2 | import sys 3 | 4 | import sexp 5 | from sexp import Symbol 6 | import vm 7 | 8 | def read_all(f): 9 | string = f.read() 10 | buf, exps = sexp.parse_exps(buffer(string)) 11 | assert not buf # should have read to EOF 12 | return exps 13 | 14 | def read_file(filename): 15 | with open(filename) as f: 16 | return sexp.consify(read_all(f)) 17 | 18 | # exps is a cons-list of expressions 19 | def write_file(filename, exps): 20 | with open(filename, 'w') as f: 21 | for e in sexp.cons_iter(exps): 22 | sexp.write(f, e) 23 | f.write('\n') 24 | 25 | def vm_boot(filename="compile.rotc"): 26 | print "booting up VM" 27 | vmstate = vm.VM() 28 | print "VM loading %s" % filename 29 | vm_load(vmstate, filename) 30 | print "VM loading {read,write}-file extensions" 31 | vmstate.set_global(Symbol('read-file'), read_file) 32 | vmstate.set_global(Symbol('write-file'), write_file) 33 | return vmstate 34 | 35 | def vm_load(vmstate, filename): 36 | vmstate.run_body(read_file(filename)) 37 | 38 | def vm_call(vmstate, funcname, *args): 39 | # perhaps I could use Thread.call somehow? 40 | # it wasn't meant to be an external method, but maybe it could become one 41 | instrs = sexp.consify( 42 | [sexp.consify([Symbol("get-global"), Symbol(funcname)])] 43 | + [sexp.consify([Symbol("push"), x]) for x in args] 44 | + [sexp.consify([Symbol("call"), len(args)])]) 45 | return vmstate.run_expr(instrs) 46 | 47 | def vm_compile_expr(vmstate, expr): 48 | return vm_call(vmstate, "compile-exp", expr) 49 | 50 | def vm_eval(vmstate, expr): 51 | c = vm_compile_expr(vmstate, expr) 52 | return vmstate.run_expr(c) 53 | 54 | class QuitRepl(Exception): pass 55 | 56 | def read_sexps(): 57 | # TODO: semicolons should start comments 58 | string = '' 59 | while True: 60 | line = sys.stdin.readline() 61 | if not line: 62 | raise QuitRepl("end of input") 63 | string += line 64 | try: 65 | buf, e = sexp.parse_exp(string) 66 | except sexp.EOF: 67 | # ran out of input before parsing a complete sexp, keep reading 68 | continue 69 | yield e 70 | # if there's nothing else left on the line but whitespace, we're done reading sexps 71 | if not buf.strip(): 72 | break 73 | # copy the remainder of the string into a fresh string and keep reading 74 | string = str(buf) 75 | 76 | def repl(vmstate): 77 | try: 78 | while True: 79 | sys.stdout.write('pyROTTEN> ') 80 | sys.stdout.flush() 81 | for exp in read_sexps(): 82 | if exp == sexp.consify([Symbol("unquote"), Symbol("quit")]): 83 | raise QuitRepl(",quit command") 84 | try: 85 | val = vm_eval(vmstate, exp) 86 | except vm.VMError as e: 87 | sys.stdout.flush() 88 | print >>sys.stderr, e 89 | sys.stderr.flush() 90 | else: 91 | print sexp.to_str(val) 92 | except QuitRepl: 93 | pass 94 | 95 | def main(): 96 | if len(sys.argv) > 1: 97 | vmstate = vm_boot(sys.argv[1]) 98 | else: 99 | vmstate = vm_boot() 100 | repl(vmstate) 101 | 102 | if __name__ == '__main__': 103 | main() 104 | -------------------------------------------------------------------------------- /repl.rkt: -------------------------------------------------------------------------------- 1 | #lang racket 2 | 3 | (require 4 | (prefix-in i: "rotten.rkt") ;; direct interpreter 5 | (prefix-in vm: "vm.rkt")) ;; VM 6 | 7 | ;; Utility 8 | (define (read-all port) 9 | (let loop ([acc '()]) 10 | (let ([x (read port)]) 11 | (if (eof-object? x) (reverse acc) 12 | (loop (cons x acc)))))) 13 | 14 | (define (read-file filename) (call-with-input-file filename read-all)) 15 | (define (write-file filename code) 16 | (with-output-to-file filename #:exists 'truncate/replace 17 | (lambda () 18 | (for ([x code]) (pretty-write x))))) 19 | 20 | 21 | ;;; Manipulating the interpreter 22 | (define (i:load filename) (i:eval-body (read-file filename) '()) (void)) 23 | (define (i:load-eval) (i:load "rotten.rot")) 24 | (define (i:load-compile) (i:load "compile.rot")) 25 | 26 | ;; only run these after (i:load-compile) 27 | (define (i:compile src) (i:eval `(compile-exp ',src))) 28 | (define (i:compile-program src) (i:eval `(compile-program ',src))) 29 | 30 | 31 | ;;; Manipulating the VM. 32 | (define (boot [filename "compile.rotc"]) 33 | (displayln "VM rebooting") 34 | (vm:reset) 35 | (printf "VM loading ~a\n" filename) 36 | (vm:load filename) 37 | (displayln "VM loading {read,write}-file extensions") 38 | (hash-set! vm:globals 'read-file read-file) 39 | (hash-set! vm:globals 'write-file write-file)) 40 | 41 | (define (vm:load filename) (vm:run-body (read-file filename))) 42 | 43 | (define (vm:call funcname . args) 44 | (vm:run `((get-global ,funcname) 45 | ,@(map (lambda (x) `(push ,x)) args) 46 | (call ,(length args))))) 47 | 48 | (define (vm:compile-exp src) (vm:call 'compile-exp src)) 49 | (define (vm:compile-program src) (vm:call 'compile-program src)) 50 | (define (vm:compile filename) (vm:compile-program (read-file filename))) 51 | (define (vm:compile! filename [dest (string-append filename "c")]) 52 | (write-file dest (vm:compile filename))) 53 | 54 | (define (vm:eval e) (vm:run (vm:compile-exp e))) 55 | 56 | 57 | ;;; The repl 58 | (define (repl [evaler vm:eval]) 59 | (display "ROTTEN> ") 60 | (define exp (read)) 61 | (unless (or (eof-object? exp) (equal? exp '(unquote quit))) 62 | (with-handlers ([exn:fail? (lambda (e) (log-error (exn-message e)))]) 63 | (pretty-write (evaler exp))) 64 | (repl evaler))) 65 | 66 | (module+ main 67 | (match (current-command-line-arguments) 68 | [`#(,x) (boot x)] 69 | [`#() (boot)]) 70 | (repl)) 71 | -------------------------------------------------------------------------------- /rotten.rkt: -------------------------------------------------------------------------------- 1 | #lang racket 2 | 3 | (provide eval eval-body globals make-globals reset) 4 | 5 | (require (except-in racket eval)) 6 | 7 | (define (nil? x) (eq? x '())) 8 | (define (true? x) (not (nil? x))) 9 | (define (atom? x) (not (pair? x))) 10 | 11 | 12 | ;; Metacircular evaluator 13 | ;; env is an assoc list 14 | (define (eval x [env '()]) 15 | (cond 16 | ((symbol? x) (lookup x env)) ;variable 17 | ((atom? x) x) ;literal 18 | ;; special forms 19 | ((eq? (car x) 'quote) (cadr x)) 20 | ((eq? (car x) 'fn) (make-fn (cadr x) (cddr x) env)) 21 | ((eq? (car x) 'if) (eval-if (cdr x) env)) 22 | ((eq? (car x) 'def) (eval-def (cadr x) (cddr x) env)) 23 | ;; otherwise, function application 24 | (#t (apply (eval (car x) env) 25 | (map (lambda (x) (eval x env)) (cdr x)))))) 26 | 27 | (define (lookup name env) 28 | (let ((x (or (assoc name env) (assoc name globals)))) 29 | (if x (cdr x) 30 | (error (format "sorry, no such variable: ~v" name))))) 31 | 32 | (define (make-fn params body env) 33 | (lambda args (eval-body body (append (make-env params args) env)))) 34 | 35 | (define (make-env params args) 36 | (cond 37 | ((symbol? params) (list (cons params args))) 38 | ((pair? params) 39 | (if (pair? args) 40 | (cons (cons (car params) (car args)) 41 | (make-env (cdr params) (cdr args))) 42 | (error (format "parameter mismatch: ~a doesn't match ~a" params args)))) 43 | ((true? args) (error (format "unused arguments: ~a" args))) 44 | (#t '()))) 45 | 46 | (define (eval-body body env) 47 | (if (null? body) '() 48 | (let ((x (eval (car body) env))) 49 | (if (null? (cdr body)) x 50 | (eval-body (cdr body) env))))) 51 | 52 | (define (eval-if conds env) 53 | (cond 54 | ((nil? conds) '()) 55 | ((nil? (cdr conds)) (eval (car conds) env)) 56 | ((true? (eval (car conds) env)) (eval (cadr conds) env)) 57 | (#t (eval-if (cddr conds) env)))) 58 | 59 | (define (eval-def target body env) 60 | (define x 61 | (if (pair? target) 62 | ;; defining a function 63 | (cons (car target) (make-fn (cdr target) body env)) 64 | ;; defining a value 65 | (cons target (eval-body body env)))) 66 | (set! globals (cons x globals)) 67 | (cdr x)) 68 | 69 | ;; Converts racket's #t/#f into rotten t/nil. 70 | (define (predicate x) 71 | (lambda args 72 | (if (apply x args) 't '()))) 73 | 74 | 75 | ;; Global environment 76 | (define (make-globals) 77 | (list 78 | (cons 'cons cons) 79 | (cons 'car (lambda (x) (if (nil? x) '() (mcar x)))) 80 | (cons 'cdr (lambda (x) (if (nil? x) '() (mcdr x)))) 81 | (cons 'symbol? (predicate symbol?)) 82 | (cons 'atom? (predicate atom?)) 83 | (cons 'cons? (predicate pair?)) 84 | (cons 'eq? (predicate eqv?)) 85 | (cons 'apply apply) 86 | (cons '+ +) 87 | (cons '- -))) 88 | 89 | (define globals (make-globals)) 90 | (define (reset) (set! globals (make-globals))) 91 | 92 | 93 | ;; Tests 94 | (module+ test 95 | (require rackunit) 96 | 97 | (define-syntax-rule (check-eval result src) 98 | (check-equal? result (eval 'src))) 99 | 100 | (define-syntax-rule (check-t src) (check-eval 't src)) 101 | (define-syntax-rule (check-nil src) (check-eval '() src)) 102 | 103 | ;; simple 104 | (check-eval 2 2) 105 | (check-eval '() ()) 106 | (check-eval cons cons) 107 | (check-eval 1 (car (cons 1 2))) 108 | (check-eval 2 (cdr (cons 1 2))) 109 | (check-eval 1 (car '(1 2))) 110 | (check-eval 'a 'a) 111 | (check-t (symbol? 'a)) 112 | (check-eval '(1 . 2) (cons 1 2)) 113 | 114 | ;; type-tests 115 | (check-t (pair? (cons 'a 'b))) 116 | (check-t (pair? '(a b c))) 117 | (check-nil (atom? '(a b c))) 118 | (check-t (atom? ())) 119 | (check-t (atom? 'a)) 120 | (check-t (atom? 1)) 121 | 122 | ;; functions 123 | (check-eval 0 ((fn (x) x) 0)) 124 | (check-eval '(a) ((fn (x) (cons x '())) 'a)) 125 | (check-eval '(a) ((fn (x y) (cons x '())) 'a 'b)) 126 | (check-eval '(a . b) ((fn (x y) (cons x y)) 'a 'b))) 127 | -------------------------------------------------------------------------------- /rotten.rot: -------------------------------------------------------------------------------- 1 | ;; -*- mode: scheme -*- 2 | 3 | ;; bootstrapping 4 | (def (not x) (if x () 't)) 5 | (def (caar x) (car (car x))) 6 | (def (cadr x) (car (cdr x))) 7 | (def (cddr x) (cdr (cdr x))) 8 | (def (cdar x) (cdr (car x))) 9 | 10 | (def (list . xs) xs) 11 | (def (rev-append x y) (if x (rev-append (cdr x) (cons (car x) y)) y)) 12 | (def (rev l) (rev-append l nil)) 13 | (def (append x y) (rev-append (rev x) y)) 14 | (def (map f x) (if x (cons (f (car x)) (map f (cdr x))))) 15 | (def (any test l) (if l ((fn (x) (if x x (any test (cdr l)))) (test (car l))))) 16 | (def (assoc k l) (any (fn (x) (if (eq? k (car x)) x)) l)) 17 | 18 | 19 | ;; global environment; an assoc-list. 20 | (def globals 21 | (list 22 | (cons 'cons cons) 23 | (cons 'car car) 24 | (cons 'cdr cdr) 25 | (cons 'symbol? symbol?) 26 | (cons 'atom? atom?) 27 | (cons 'cons? cons?) 28 | (cons 'eq? eq?) 29 | (cons 'apply apply) 30 | (cons '+ +) 31 | (cons '- -))) 32 | 33 | 34 | ;; metacircular evaluator 35 | ;; env is an assoc-list. 36 | (def (eval x env) 37 | (if 38 | (symbol? x) (lookup x env) ;variable 39 | (atom? x) x ;literal 40 | ;; special forms 41 | (eq? (car x) 'quote) (cadr x) 42 | (eq? (car x) 'fn) (make-fn (cadr x) (cddr x) env) 43 | (eq? (car x) 'if) (eval-if (cdr x) env) 44 | (eq? (car x) 'def) (eval-def (cadr x) (cddr x) env) 45 | ;; otherwise, function application 46 | (apply (eval (car x) env) (map (fn (x) (eval x env)) (cdr x))))) 47 | 48 | (def (lookup name env) (cdr (assoc name (append env (car globals))))) 49 | 50 | (def (make-fn params body env) 51 | (fn args (eval-body body (append (make-env parms args) env)))) 52 | 53 | (def (make-env params args) 54 | (if (symbol? params) (list (cons params args)) 55 | params (cons (cons (car params) (car args)) 56 | (make-env (cdr params) (cdr args))))) 57 | 58 | (def (eval-body body env) 59 | ((fn (x) (if (cdr body) (eval-body (cdr body) env) x)) 60 | (eval (car body env)))) 61 | 62 | (def (eval-if conds env) 63 | (if (cdr conds) 64 | (if (eval (car conds) env) (eval (cadr conds) env) 65 | (eval-if (cddr conds) env)) 66 | (eval (car conds) env))) 67 | 68 | (def (eval-def target body env) 69 | (if (cons? target) 70 | ;; defining a function 71 | (set-global! (car target) (make-fn (cdr target) body env)) 72 | ;; defining a variable 73 | (set-global! target (eval (car body) env)))) 74 | 75 | (def (set-global! n v) 76 | (def globals (cons (cons n v) (car globals))) 77 | v) 78 | -------------------------------------------------------------------------------- /sexp.py: -------------------------------------------------------------------------------- 1 | # S-expressions are represented as follows: 2 | # 3 | # - Conses (a . b) are represented as Cons(a, b) 4 | # - Nil () is represented by the Python empty-tuple () 5 | # - A symbol 'a is represented as Symbol("a") 6 | # - Numbers are represented by Python ints 7 | # - Strings are represented by Python strs 8 | 9 | import re 10 | from collections import namedtuple 11 | import StringIO 12 | 13 | class Cons(namedtuple('Cons', 'car cdr')): 14 | def __eq__(self, other): 15 | return isinstance(other, Cons) and super(Cons, self) == other 16 | 17 | class Symbol(object): 18 | def __init__(self, name): self.name = name 19 | def __str__(self): return self.name 20 | def __eq__(self, other): 21 | return isinstance(other, Symbol) and self.name == other.name 22 | def __cmp__(self, other): 23 | assert isinstance(other, Symbol) 24 | return cmp(self.name, other.name) 25 | def __repr__(self): return 'Symbol(%r)' % self.name 26 | 27 | # ---------- SEXP UTILITIES ---------- 28 | def is_sexp(x): # shallow test 29 | return (isinstance(x, (Cons, Symbol, str)) 30 | # isinstance(True, int) == True, grumble grumble 31 | or (isinstance(x, int) and not isinstance(x, bool)) 32 | or x == ()) 33 | 34 | def is_null(x): 35 | assert is_sexp(x) 36 | return x == () 37 | 38 | def is_true(x): 39 | assert is_sexp(x) 40 | return not is_null(x) 41 | 42 | def truthify(x): 43 | """Takes Python truth values to Rotten truth values.""" 44 | if x: return Symbol("t") 45 | else: return () 46 | 47 | def consify(lst): 48 | """Turns a Python sequence into a Rotten list.""" 49 | result = () 50 | for e in reversed(lst): 51 | result = Cons(e, result) 52 | return result 53 | 54 | def cons_iter(conses): 55 | """Iterates over a Rotten list.""" 56 | while conses != (): 57 | assert isinstance(conses, Cons) 58 | yield conses.car 59 | conses = conses.cdr 60 | 61 | def write(f, exp): 62 | """Writes a Rotten value to a file-like object.""" 63 | if isinstance(exp, Symbol): 64 | f.write(exp.name) 65 | elif isinstance(exp, Cons) or exp == (): 66 | f.write('(') 67 | first = True 68 | while isinstance(exp, Cons): 69 | if not first: 70 | f.write(' ') 71 | write(f, exp.car) 72 | exp = exp.cdr 73 | first = False 74 | if exp != (): 75 | f.write('. ') 76 | write(f, exp) 77 | f.write(')') 78 | else: 79 | f.write(repr(exp)) 80 | 81 | def to_str(exp): 82 | """Turns a Rotten value into a string containing its s-expression.""" 83 | s = StringIO.StringIO() 84 | write(s, exp) 85 | return s.getvalue() 86 | 87 | # ---------- PARSING ---------- 88 | class ParseError(Exception): 89 | def __init__(self, buf, message): 90 | self.buf = buf 91 | super(ParseError, self).__init__(message) 92 | 93 | class EOF(ParseError): pass 94 | class RParen(ParseError): pass 95 | 96 | tok_re = re.compile(r""" 97 | \s+ # whitespace 98 | | \( | \) # parentheses 99 | | [-a-zA-Z_!?+=<>/*@$%^&][-a-zA-Z0-9_!?+=<>/*@$%^&]* # symbols 100 | | -?[0-9]+ # numeric literals 101 | | "(?:[^"]|\\")*" # string literals 102 | | ' # quote 103 | """, re.VERBOSE) 104 | 105 | def is_whitespace(tok): return re.match(r'\s', tok) 106 | def is_lparen(tok): return tok == '(' 107 | def is_rparen(tok): return tok == ')' 108 | def is_quote(tok): return tok == "'" 109 | def is_symbol(tok): return bool(re.match('[-a-zA-Z_!?+=<>/*@$%^&]', tok)) 110 | def is_number(tok): return bool(re.match('-|[0-9]', tok)) 111 | def is_string(tok): return tok.startswith('"') 112 | 113 | # Tokenizing 114 | def next_tok(buf): 115 | if not buf: 116 | raise EOF(buf, "end of input") 117 | 118 | m = tok_re.match(buf) 119 | if not m: 120 | raise ParseError(buf, "could not find a token") 121 | 122 | tok = m.group() 123 | return buf[len(tok):], tok 124 | 125 | def expect_tok(buf, pred, msg): 126 | newbuf, tok = next_tok(buf) 127 | if not pred(tok): 128 | raise ParseError(buf, msg) 129 | return newbuf, tok 130 | 131 | # S-expression parsing. I could depend on an external library but this is 132 | # easier. Returns (new_buf, exp). 133 | def parse_exp(buf): 134 | while True: 135 | pre_buf = buf # useful for error reporting 136 | buf, tok = next_tok(buf) 137 | 138 | if is_whitespace(tok): 139 | continue 140 | elif is_lparen(tok): 141 | buf, exps = parse_exps(buf) 142 | buf, _ = expect_tok(buf, is_rparen, "expected a right-paren") 143 | return buf, consify(exps) 144 | elif is_rparen(tok): 145 | raise RParen(pre_buf, "unexpected right-paren") 146 | elif is_quote(tok): 147 | return buf, Symbol("quote") 148 | elif is_symbol(tok): 149 | return buf, Symbol(tok) 150 | elif is_number(tok): 151 | return buf, int(tok) 152 | elif is_string(tok): 153 | contents = tok[1:-1] 154 | if "\\" in contents: 155 | raise NotImplementedError("string escapes not implemented") 156 | assert not '"' in contents 157 | assert isinstance(contents, (str, unicode)) 158 | return buf, contents 159 | 160 | assert False, "impossible! I'm sure I covered all cases!" 161 | 162 | # returns (new_buf, list-of-exps) 163 | def parse_exps(buf): 164 | exps = [] 165 | while True: 166 | try: 167 | buf, e = parse_exp(buf) 168 | except EOF as e: 169 | return e.buf, exps 170 | except RParen as e: 171 | return e.buf, exps 172 | exps.append(e) 173 | -------------------------------------------------------------------------------- /vm-design.md: -------------------------------------------------------------------------------- 1 | # VM state 2 | 3 | The VM has one piece of global state: the symbol table, which maps from global 4 | variables to their values. 5 | 6 | The VM has three piece of local state: 7 | - `I`: The instruction list. 8 | - `E`: The environment, an immutable array of closed-over values. 9 | - `S`: The stack, which stores temporary values and continuations. 10 | 11 | # Instructions 12 | 13 | - `(push LIT)`: Pushes `LIT` onto `S`. 14 | 15 | - `(pop)`: Pops one element from `S` and ignores it. 16 | 17 | - `(access N)`: Pushes `E[N]` onto `S`. 18 | 19 | - `(call N)`: Calls `S[N]` with `S[N-1..0]` as arguments. 20 | 21 | - `(closure ARITY REST-PARAM INSTRS)`: Makes a closure and pushes it onto `S`. 22 | 23 | - `(if THEN-INSTRS ELSE-INSTRS)`: Pops `S`. If result is true, proceeds to 24 | `THEN-INSTRS`; otherwise, proceeds to `ELSE-INSTRS`. Before proceeding, pushes 25 | a continuation. 26 | 27 | - `(get-global NAME)`: Pushes the value of the global variable `NAME` onto 28 | `D`. 29 | 30 | - `(set-global NAME)`: Sets the global variable `NAME` to S[0]. Does not pop 31 | `S`; its "return value" is what it has set `NAME` to. 32 | 33 | # Builtins 34 | 35 | Builtin functions are implemented using a cheap trick. There are no special 36 | instructions for them. Rather, all of them except `apply` are just functions in 37 | the VM's host language. When a host function is "call"-ed in the VM, the VM just 38 | calls it in the host language. 39 | 40 | The `apply` builtin is different. The global variable `apply` is bound to a 41 | implementation-defined value which the VM handles specially. (TODO: Explain why 42 | this needs to be special-cased to avoid recursively entering VM.) 43 | 44 | # Functions 45 | 46 | Functions/closures are structures with four fields: 47 | 48 | - `arity`: number of parameters (excluding rest parameter) 49 | - `has-rest-param`: whether it has a rest parameter or not 50 | - `env`: list representing closed-over environment 51 | - `code`: the VM instruction list for the function's body 52 | 53 | # Continuations 54 | 55 | We use a structure called a continuation to remember what to do after finishing 56 | a function or finishing one branch of an `if` instruction. A continuation has 57 | two fields: 58 | 59 | - `instrs`: The instruction-stream (`I`) to return to. 60 | - `env`: The environment (`E`) to restore. 61 | 62 | The `env` field is only needed when returning from a function, not when 63 | finishing an `if` instruction, but it is simpler to have only one form of a 64 | continuation. 65 | 66 | # Environments 67 | 68 | Our environment records our closed-over variables. Variables are accessed by 69 | index, so variable names are not needed in the VM. These indices correspond 70 | roughly to DeBruijn indices. TODO: Explain DeBruijn indices. 71 | 72 | However, there is the question of what order indices are assigned in a function 73 | of multiple arguments: 74 | 75 | (fn (x y) (list x y)) 76 | 77 | In `(list x y)`, what indices do x and y have? 78 | 79 | Currently, I give `x` index 0 and `y` index 1. This is the opposite of what a 80 | traditional "currying" implementation of multiple-argument functions would do. I 81 | actually wrote the VM this way by accident; I may change it later. 82 | 83 | For clarity, here is an example: 84 | 85 | (fn (a b) ((fn (x y) (list a b x y)) a b)) 86 | 87 | If we replace variables by their indices, this corresponds to: 88 | 89 | (fn (_ _) ((fn (_ _) (list 2 3 0 1)) 0 1)) 90 | -------------------------------------------------------------------------------- /vm.clj: -------------------------------------------------------------------------------- 1 | ;; How we represent Rotten data in Clojure: 2 | ;; 3 | ;; - Symbols, numbers, strings are Clojure symbols, numbers, and strings. 4 | ;; - '() is nil. 5 | ;; 6 | ;; - Conses (a . b) are 2-element vectors [a b]. I'd like to be able to use 7 | ;; Clojure lists but Rotten permits improper lists and Clojure does not. 8 | ;; 9 | ;; - Closures: Implemented via defrecord. 10 | (ns vm 11 | (:use [clojure.core.match :refer [match]])) 12 | 13 | (defrecord Closure [arity has-rest-param? code env]) 14 | (defrecord Cont [instrs env]) 15 | 16 | (def car first) 17 | (def cdr second) 18 | 19 | (defn to-rlist "Converts a seq to a rotten list." 20 | [s] 21 | (reduce (fn [x y] [y x]) '() (reverse s))) 22 | 23 | (defn t? [x] (not (= nil x))) 24 | 25 | ;;; VM globals: an atom containing a map. 26 | (def init-globals 27 | {'apply 'apply 28 | 'cons #(vector %1 %2) 29 | 'car car 30 | 'cdr cdr 31 | 'symbol? #(if (symbol? %) 't nil) 32 | 'cons? #(if (vector? %) 't nil) 33 | 'atom? #(if (vector? %) nil 't) 34 | 'eq? #(if (= %1 %2) 't nil) 35 | '+ + 36 | '- -}) 37 | 38 | (def globals (atom init-globals)) 39 | (defn reset [] (swap! globals (fn [_] init-globals))) 40 | 41 | (defn done? [instrs data env] 42 | (and (= nil instrs) (>= 1 (count data)))) 43 | 44 | ;; instrs: rotten-list of instructions 45 | ;; data: list of rotten values 46 | ;; env: vector of rotten values. 47 | (declare step) 48 | (defn run- [instrs data env] 49 | (if (done? instrs data env) data 50 | (apply run- (step instrs data env)))) 51 | (defn run 52 | ([instrs] (run instrs '() [])) 53 | ([instrs data env] (first (run- instrs data env)))) 54 | (defn run-body 55 | ([instrs] (run-body instrs '() [])) 56 | ([instrs data env] (run- instrs data env) nil)) 57 | 58 | (declare step-cont step-instr) 59 | (defn step [instrs data env] 60 | (when (done? instrs data env) 61 | (throw (Exception. "cannot step VM; it is done."))) 62 | (if instrs 63 | ;; stack is (RETVAL CONT & RESTOFSTK) 64 | (let [[retval cont & stk] data] (step-cont retval cont stk)) 65 | (step-instr (car instrs) (cdr instrs) data env))) 66 | 67 | (defn step-cont [value cont data] 68 | [(.instrs cont) (conj data value) (.env cont)]) 69 | 70 | (declare do-call) 71 | (defn step-instr [i instrs data env] 72 | (match [(vec i)] 73 | [['pop]] [instrs (pop data) env] 74 | [['push x]] [instrs (conj data x) env] 75 | [['access n]] [instrs (conj data (nth env n)) env] 76 | [['call n]] 77 | (let [[f & args] (take (+ 1 n) data)] 78 | (do-call f args instrs (nthnext data (+ 1 n)) env)) 79 | [['if thn-code els-code]] 80 | (let [code (if (t? (peek data)) thn-code els-code) 81 | data (pop data)] 82 | [code (conj data (Cont. instrs env)) env]) 83 | [['get-global name]] 84 | (let [val (if (contains? @globals name) (@globals name) 85 | (throw (Exception. "unbound global")))] 86 | [instrs (conj data val) env]) 87 | [['set-global name]] 88 | (do (swap! globals assoc name (peek data)) 89 | [instrs data env]))) 90 | 91 | (defn do-call [f as instrs data env] 92 | (cond 93 | (= f 'apply) (let [[f as] as] (do-call f as)) 94 | (fn? f) [instrs (conj data (apply f as)) env] 95 | (instance? Closure f) 96 | (if ((if (.has-rest-param? f) < not=) (count as) (.arity f)) 97 | (throw (Exception. "wrong number of arguments to function")) 98 | (let [as (if (not (.has-rest-param? f)) as 99 | (concat (take as (.arity f)) 100 | (list (to-rlist (drop as (.arity f))))))] 101 | [(.code f) (conj data (Cont. instrs env)) (into (.env f) as)])) 102 | :else (throw (Exception. "not callable")))) 103 | -------------------------------------------------------------------------------- /vm.py: -------------------------------------------------------------------------------- 1 | from collections import namedtuple 2 | import types 3 | 4 | import sexp 5 | from sexp import Symbol, Cons 6 | 7 | # whether x is a callable python object 8 | def is_callable(x): 9 | return isinstance(x, types.FunctionType) or hasattr(x, '__call__') 10 | 11 | Cont = namedtuple('Cont', 'instrs env') 12 | Closure = namedtuple('Closure', 'arity has_rest_param code env') 13 | 14 | class ApplyBuiltin(object): pass 15 | applyBuiltin = object() # singleton 16 | 17 | def car(x): 18 | assert isinstance(x, Cons), "Not a cons: %s" % (x,) 19 | return x.car 20 | 21 | def cdr(x): 22 | assert isinstance(x, Cons), "Not a cons: %s" % (x,) 23 | return x.cdr 24 | 25 | def is_symbol(x): return sexp.truthify(isinstance(x, Symbol)) 26 | def is_cons(x): return sexp.truthify(isinstance(x, Cons)) 27 | def is_atom(x): return sexp.truthify(not isinstance(x, Cons)) 28 | # FIXME: is is_eq correct? unit test this! 29 | def is_eq(x, y): return sexp.truthify(x == y) 30 | def add(x, y): return x + y 31 | def sub(x, y): return x - y 32 | 33 | def make_globals(): 34 | return {"apply": applyBuiltin, 35 | "cons": Cons, 36 | "car": car, 37 | "cdr": cdr, 38 | "symbol?": is_symbol, 39 | "cons?": is_cons, 40 | "atom?": is_atom, 41 | "eq?": is_eq, 42 | "+": add, 43 | "-": sub} 44 | 45 | class VMError(Exception): pass 46 | 47 | class VM(object): 48 | def __init__(self): 49 | self.globals = make_globals() 50 | 51 | def set_global(self, sym, value): 52 | assert isinstance(sym, Symbol) 53 | self.globals[sym.name] = value 54 | 55 | def get_global(self, sym): 56 | assert isinstance(sym, Symbol) 57 | return self.globals[sym.name] 58 | 59 | def run_body(self, instrs, data=None, env=None): 60 | thread = Thread(self, instrs, data, env) 61 | thread.run() 62 | 63 | def run_expr(self, instrs, data=None, env=None): 64 | thread = Thread(self, instrs, data, env) 65 | thread.run() 66 | return thread.result() 67 | 68 | # TODO: env doesn't need to be mutable! use a tuple for it! 69 | class Thread(object): 70 | # instrs is a cons-list 71 | # data, env are Python lists 72 | # data is interpreted as a stack; its top elements are at the end 73 | def __init__(self, vm, instrs, data=None, env=None): 74 | self.vm = vm 75 | self.instrs = instrs 76 | self.data = data if data is not None else [] 77 | self.env = env if env is not None else [] 78 | 79 | def result(self): 80 | assert self.is_done() 81 | assert len(self.data) == 1 82 | return self.data[0] 83 | 84 | # internal convenience methods, no abstraction here 85 | def push(self, x): self.data.append(x) 86 | def pop(self): return self.data.pop() 87 | 88 | def push_cont(self): 89 | if not self.instrs: 90 | # print 'omitting no-op continuation' 91 | return 92 | self.push(Cont(self.instrs, self.env)) 93 | 94 | # we're done if we have no instructions left and <= 1 value on the stack. 95 | def is_done(self): 96 | return bool(self.instrs == () and len(self.data) <= 1) 97 | 98 | def run(self): 99 | while not self.is_done(): 100 | self.step() 101 | 102 | def step(self): 103 | assert not self.is_done() 104 | if not self.instrs: 105 | # pull value, continuation off stack and enter the continuation 106 | value = self.pop() 107 | cont = self.pop() 108 | self.instrs = cont.instrs 109 | self.env = cont.env 110 | self.data.append(value) 111 | else: 112 | # execute next instruction 113 | instr = car(self.instrs) 114 | self.instrs = cdr(self.instrs) 115 | self.step_instr(instr) 116 | 117 | # TODO: better errors 118 | def step_instr(self, instr): 119 | # an instruction is of the form (TYPE ARGS...), encoded as a cons-list 120 | # where TYPE is a symbol 121 | # first, we de-consify it 122 | if not isinstance(instr, Cons): 123 | raise VMError("instruction is not a cons: %s" % (instr,)) 124 | tp = car(instr).name 125 | args = tuple(sexp.cons_iter(cdr(instr))) 126 | if tp == 'push': 127 | val, = args # 1-argument tuple unpacking! 128 | self.push(val) 129 | elif tp == 'pop': 130 | [] = args # 0-argument tuple unpacking! 131 | self.pop() 132 | elif tp == 'access': 133 | n, = args 134 | self.push(self.env[n]) 135 | elif tp == 'closure': 136 | arity, has_rest_param, code = args 137 | # need to copy self.env because it is mutable 138 | env = list(self.env) 139 | closure = Closure(arity, sexp.is_true(has_rest_param), code, env) 140 | self.push(closure) 141 | elif tp == 'call': 142 | n, = args 143 | func_args = self.data[-n:] 144 | del self.data[-n:] # in-place removal of elements 145 | func = self.pop() 146 | self.call(func, func_args) 147 | elif tp == 'if': 148 | then_instrs, else_instrs = args 149 | instrs = then_instrs if sexp.is_true(self.pop()) else else_instrs 150 | # NB. the continuations for if-branches don't really need an `env' 151 | # value, since env won't be changed. But it's simpler to do this 152 | # than to create a new type of continuation. 153 | self.push_cont() 154 | self.instrs = instrs 155 | elif tp == 'get-global': 156 | sym, = args 157 | self.push(self.vm.get_global(sym)) 158 | elif tp == 'set-global': 159 | sym, = args 160 | # no pop, just a peek 161 | self.vm.set_global(sym, self.data[-1]) 162 | else: 163 | raise VMError("Unrecognized instruction type.") 164 | 165 | # args is a Python sequence 166 | def call(self, func, args): 167 | # apply must, alas, be special-cased 168 | while func is applyBuiltin: 169 | func = args[0] 170 | args = args[1:] 171 | 172 | if isinstance(func, Closure): 173 | self.call_closure(func, args) 174 | elif is_callable(func): 175 | self.push(func(*args)) 176 | else: 177 | raise VMError("Cannot call non-function") 178 | 179 | # args is a Python sequence 180 | def call_closure(self, func, args): 181 | assert isinstance(func, Closure) 182 | 183 | if len(args) < func.arity: 184 | raise VMError("too few arguments to function") 185 | if not func.has_rest_param and len(args) > func.arity: 186 | raise VMError("too many arguments to function") 187 | 188 | # munge arguments into environment, taking into account rest-param 189 | env = [] 190 | if not func.has_rest_param: 191 | env.extend(args) 192 | else: 193 | env.extend(args[:func.arity]) 194 | env.append(sexp.consify(args[func.arity:])) 195 | env.extend(func.env) 196 | 197 | # Jump into function 198 | self.push_cont() 199 | self.instrs = func.code 200 | self.env = env 201 | -------------------------------------------------------------------------------- /vm.rkt: -------------------------------------------------------------------------------- 1 | #lang racket 2 | 3 | (provide (all-defined-out)) ;TODO: fix this 4 | 5 | ;;; A simple "virtual machine", based loosely on the Categorical Abstract 6 | ;;; Machine (CAM). See http://gallium.inria.fr/~xleroy/talks/zam-kazam05.pdf 7 | ;;; TODO: better reference link, that one is pretty brief 8 | 9 | (define (true? x) (not (null? x))) 10 | 11 | ;; has-rest-param? is a *Racket* bool (#t or #f), not a Rotten bool ('t or '()) 12 | (struct closure (arity has-rest-param? code env) #:transparent) 13 | (struct cont (instrs env) #:transparent) 14 | 15 | ;; VM globals, pre-populated with builtins 16 | (define (make-globals) 17 | (make-hash 18 | `((apply . apply) ;; no unquote; apply is special. see call! in step-instr. 19 | (cons . ,cons) 20 | (car . ,(lambda (x) (if (null? x) '() (car x)))) 21 | (cdr . ,(lambda (x) (if (null? x) '() (cdr x)))) 22 | (symbol? . ,(lambda (x) (if (symbol? x) 't '()))) 23 | (cons? . ,(lambda (x) (if (pair? x) 't '()))) 24 | (atom? . ,(lambda (x) (if (pair? x) '() 't))) 25 | (eq? . ,(lambda (x y) (if (eqv? x y) 't '()))) 26 | (+ . ,+) (- . ,-)))) 27 | 28 | (define globals (make-globals)) 29 | (define (reset) (set! globals (make-globals))) 30 | 31 | (define (run instrs [data '()] [env '()]) (car (run- instrs data env))) 32 | (define (run-body instrs [data '()] [env '()]) (run- instrs data env) (void)) 33 | (define (run- instrs data env) 34 | (if (done? instrs data env) data 35 | (call-with-values (lambda () (step instrs data env)) run-))) 36 | 37 | ;;; we're done if we have no instructions left and <= 1 value on the stack 38 | ;;; (either 1 value, the value to return; or none, if we were eval'ing for 39 | ;;; side-effects) 40 | (define (done? instrs data env) 41 | (and (null? instrs) (>= 1 (length data)))) 42 | 43 | (define (step instrs data env) 44 | (when (done? instrs data env) (error "cannot step VM; it is done.")) 45 | (if (null? instrs) 46 | (step-cont (car data) (cadr data) (cddr data)) 47 | (step-instr (car instrs) (cdr instrs) data env))) 48 | 49 | (define (step-cont value kont data) 50 | (match-define (cont instrs env) kont) 51 | (values instrs (cons value data) env)) 52 | 53 | (define (step-instr i instrs data env) 54 | ;; (displayln (format "INSTR ~a" i)) 55 | ;; (displayln (format " STK ~a" data)) 56 | ;; (displayln (format " ENV ~a" env)) 57 | (define (pop!) (let ([x (car data)]) (set! data (cdr data)) x)) 58 | (define (push! x) (set! data (cons x data))) 59 | 60 | (define (call! func args) 61 | (match func 62 | ['apply 63 | (match-define `(,f ,as) args) 64 | (call! f as)] 65 | [(? procedure?) (push! (apply func args))] 66 | [(closure f-arity f-has-rest-param f-code f-env) 67 | (define num-args (length args)) 68 | ;; check fn arity matches number of arguments 69 | (unless ((if f-has-rest-param <= =) f-arity num-args) 70 | (error "wrong number of arguments to function")) 71 | ;; munge arguments for rest parameter 72 | (when f-has-rest-param 73 | (set! args (append (take args f-arity) (list (drop args f-arity))))) 74 | ;; perform the call 75 | (set! data (cons (cont instrs env) data)) 76 | (set! instrs f-code) 77 | (set! env (append args f-env))])) 78 | 79 | ;; ----- instruction dispatch ----- 80 | (match i 81 | [`(push ,x) (push! x)] 82 | ['(pop) (pop!)] 83 | [`(access ,n) (push! (list-ref env n))] 84 | [`(closure ,arity ,has-rest-param ,code) 85 | (push! (closure arity (true? has-rest-param) code env))] 86 | [`(call ,n) 87 | ;; NB. use of 'reverse puts arguments in the right order. 88 | (match-define (cons f args) (reverse (take data (+ 1 n)))) 89 | (set! data (drop data (+ 1 n))) 90 | (call! f args)] 91 | [`(if ,thn-code ,els-code) 92 | (define code (if (true? (pop!)) thn-code els-code)) 93 | ;; NB. the continuation for if-branches doesn't really need an `env' 94 | ;; value, since env won't be changed. but it's simpler to do this then 95 | ;; to create a new type of continuation. 96 | (push! (cont instrs env)) 97 | (set! instrs code)] 98 | ;; global environment 99 | [`(get-global ,name) (push! (hash-ref globals name))] 100 | [`(set-global ,name) (hash-set! globals name (car data))]) 101 | (values instrs data env)) 102 | --------------------------------------------------------------------------------