├── .gitattributes ├── .gitignore ├── DemoVirtualMachine.cabal ├── LICENSE ├── README.md ├── Setup.hs ├── app └── Main.hs ├── br.sh ├── run.sh ├── src ├── Lib.hs └── VirtualMachine.hs ├── stack.yaml └── test └── Spec.hs /.gitattributes: -------------------------------------------------------------------------------- 1 | *md text 2 | *.md text 3 | *.sh text 4 | *.markdown text 5 | .gitattributes text 6 | .gitignore text 7 | 8 | * text=auto 9 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | .stack-work/ 2 | -------------------------------------------------------------------------------- /DemoVirtualMachine.cabal: -------------------------------------------------------------------------------- 1 | name: DemoVirtualMachine 2 | version: 0.1.0.0 3 | synopsis: Initial project template from stack 4 | description: Please see README.md 5 | homepage: https://github.com/githubuser/DemoVirtualMachine#readme 6 | license: MIT 7 | license-file: LICENSE 8 | author: Andre Van Der Merwe 9 | maintainer: dart@pobox.com 10 | copyright: 2016 Andre Van Der Merwe 11 | category: Demo 12 | build-type: Simple 13 | -- extra-source-files: 14 | cabal-version: >=1.10 15 | 16 | library 17 | hs-source-dirs: src 18 | exposed-modules: VirtualMachine 19 | build-depends: base >= 4.7 && < 5 20 | , protolude 21 | , text 22 | , containers 23 | default-language: Haskell2010 24 | ghc-options: -threaded -rtsopts -with-rtsopts=-N -Wall 25 | 26 | executable DemoVirtualMachine-exe 27 | hs-source-dirs: app 28 | main-is: Main.hs 29 | ghc-options: -threaded -rtsopts -with-rtsopts=-N -Wall 30 | build-depends: base 31 | , DemoVirtualMachine 32 | , protolude 33 | , text 34 | , containers 35 | default-language: Haskell2010 36 | 37 | test-suite DemoVirtualMachine-test 38 | type: exitcode-stdio-1.0 39 | hs-source-dirs: test 40 | main-is: Spec.hs 41 | build-depends: base 42 | , DemoVirtualMachine 43 | , protolude 44 | , hspec 45 | , containers 46 | , QuickCheck 47 | ghc-options: -threaded -rtsopts -with-rtsopts=-N -Wall 48 | default-language: Haskell2010 49 | 50 | source-repository head 51 | type: git 52 | location: https://github.com/andrevdm/DemoVirtualMachine 53 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2016 Andre Van Der Merwe 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Creating a Stack Machine with Haskell 2 | 3 | ## About the Project 4 | 5 | This is a small demonstration project showing how a simple byte code interpreting stack machine (virtual machine) can be built with Haskell. It is not a production VM nor of any particular practical use but is rather a simple demonstration of how a stack machine can be built. 6 | 7 | I built this for mainly as a project for learning Haskell, i.e. something a little bigger to work on. So NB this is probably not idiomatic Haskell, and may have some newbie mistakes. Hopefully it is interesting enough despite this... 8 | 9 | ### Stack Machines 10 | 11 | > A stack machine is a real or emulated computer that uses a pushdown stack rather than individual machine registers to evaluate each sub-expression in the program. A stack computer is programmed with a reverse Polish notation instruction set. - [wikipedia \[1\]](https://en.wikipedia.org/wiki/Stack_machine) 12 | 13 | Stack machines are simpler to implement than [register machines](https://en.wikipedia.org/wiki/Register_machine) but are still practical even for production VMs. 14 | 15 | ### Writing the VM in Haskell 16 | 17 | Virtual machines are typically written in a low level language like C for maximum efficiency. They also are typically written using mutable data structures. Here I'm using Haskell and pure functional data structures. I have absolutely no performance goals, so there are no constraints I need to worry about. 18 | 19 | A few design decisions 20 | 21 | 1. I'm using a Data.Sequence rather than a list. While I'm not concerned about performance using a linked list to do random access is still a bad idea 22 | 23 | 2. Try to avoid bottom (⊥), so use "safe" versions whenever ⊥ could otherwise be returned (toEnum, index, head etc) 24 | 25 | 3. I'm using [Stephen Diehl's Protolude \[2\]](https://github.com/sdiehl/protolude) and removing the default prelude 26 | 27 | Using the immutable data structures like Data.Sequence had worked out nicely and means that as the VM runs you can keep a full history of the VM's state. So you have everything (apart from time) you need to build a nice visualiser / 'time travelling' debugger. 28 | 29 | ## The VM 30 | 31 | ### Opcodes & byte code 32 | 33 | The opcodes are the operations that the virtual CPU can perform. These are simple operations like push, pop, add and jump. The opcodes are encoded as bytes together with the opcode parameters (e.g. the value to push) this forms the byte code. 34 | 35 | Here is the current set of opcodes understood by the CPU 36 | 37 | ```haskell 38 | data Operation = Nop -- No operation 39 | | Halt -- Stop CPU execution 40 | | Push -- Push a value onto the stack 41 | | Pop -- Pop the most recent value from the stack 42 | | PopPrev -- Pop n values before the most recent value from the stack 43 | | Add -- Add the top two items on the stack 44 | | Inc -- Increment the top item on the stack 45 | | Dup -- Duplicate the most recent item on the stack 46 | | Jmp -- Jump unconditionally to a location 47 | | Bne -- Pop the top two items, compare and branch if the values are not equal 48 | | Beq -- Pop the top two items, compare and branch if the values are equal 49 | | Bgt -- Pop the top two items, compare and branch if value 1 is greater than value 2 50 | | Bgte -- Pop the top two items, compare and branch if value 1 is greater or equal to value 2 51 | | Blt -- Pop the top two items, compare and branch if value 1 is less than value 2 52 | | Blte -- Pop the top two items, compare and branch if value 1 is less than or equal to value 2 53 | | Call -- Call a function 54 | | Ret -- Return from a function 55 | | LdArg -- Push local value n onto the stack 56 | deriving (Show, Eq, Ord, Enum, Bounded) 57 | ``` 58 | 59 | ### The virtual CPU 60 | 61 | The data structure representing the CPU is defined below 62 | 63 | ```haskell 64 | data Cpu = Cpu { ip :: Int -- Instruction pointer 65 | , fp :: Int -- Frame pointer 66 | , cpuStack :: S.Seq Int -- The stack 67 | , cpuGlobals :: S.Seq Int -- Gloal variables 68 | , ranOp :: Int -- The last opcode that was executed 69 | , state :: Text -- Debugging message 70 | , debug :: Bool -- Enable/disable debugging 71 | , panic :: Bool -- Is the CPU in a faulted state 72 | } 73 | deriving (Show, Eq) 74 | 75 | -- | Default empty/initial CPU state 76 | emptyCpu :: Cpu 77 | emptyCpu = Cpu { ip = -1 78 | , fp = -1 79 | , cpuStack = S.empty 80 | , cpuGlobals = S.empty 81 | , state = "" 82 | , debug = True 83 | , ranOp = 0 84 | , panic = False 85 | } 86 | ``` 87 | 88 | This is fairly minimal but it’s more than enough for a simple VM like this. 89 | 90 | Some things to note 91 | * The stack (cpuStack) is part of the CPU. This makes sense for a stack machine since a stack is core to everything it does. It also means that as the CPU runs you get a full history of each stack state along with the CPU flags at the time each opcode was run 92 | * There is no need for a stack pointer since the stack is a 'dynamically' growing Data.Sequence. I.e. sp always points to the head of cpuStack. 93 | * The instruction pointer (ip) points to the next instruction to run. 94 | * In this implementation the byte stream is fixed (no self-modifying code), so there is no need to copy it on each CPU operation 95 | * The frame pointer (fp) is discussed below in the section about function calls (Call & Ret) 96 | 97 | ### The byte code assembler 98 | 99 | Rather than writing the byte code in hex a very simple assembler is used. Later on additional assemblers and compliers can be layer on top of this low level assembler which does little more than convert opcode mnemonics to byte code. 100 | 101 | An operation can take parameters from the byte code stream. For instance a ***push*** instruction takes a parameter that is the value to push onto the stack. The type that represents this is the inventively named ***OpAndParam*** 102 | 103 | ```haskell 104 | -- | A single CPU operator and its parameters 105 | type OpAndParam = (Operation, [Int]) 106 | ``` 107 | 108 | Having a type that defines the number of parameters an op takes and how many values it pops off the stack allows the assembler to perform some basic checks. The interpreter can also use this definition when running the code. 109 | 110 | ```haskell 111 | -- | Configuration for an operation 112 | -- | opParamCount = number of paramaters taken from the code stream 113 | -- | opPopsCount = number of values this op pops from the stack 114 | -- | opSimple = determines if the op needs full access to cpu state to change things like the fp and ip 115 | -- | note that 'complex' instructions do not need to honour opParamCount and opPopsCount 116 | -- | e.g. a 'ret' instruction pops a variable number of parameters 117 | data Instruction = Instruction { opCode :: Operation 118 | , opPopsCount :: Int 119 | , opParamCount :: Int 120 | , opSimple :: Bool 121 | } 122 | deriving (Show, Eq) 123 | ``` 124 | 125 | The instructions can then be setup and a map created from opcode to Instruction 126 | 127 | ```haskell 128 | -- | Config for the op codes 129 | instructions :: [Instruction] 130 | instructions = [ Instruction { opCode = Nop, opParamCount = 0, opPopsCount = 0, opSimple = True } 131 | -- ... 132 | , Instruction { opCode = Call, opParamCount = 1, opPopsCount = 0, opSimple = False } 133 | ] 134 | 135 | -- | Instructions indexed by opcode 136 | instrByOp :: Map.Map Operation Instruction 137 | instrByOp = Map.fromList $ map (\i -> (opCode i, i)) instructions 138 | ``` 139 | 140 | The assembler then does nothing more than converting the opcode enum to a byte (an Int in the code but it would be serialised as a byte) checking the number of parameters for each opcode. It is small enough to be pasted in full here 141 | 142 | ```haskell 143 | -- | A single assembler error 144 | data AssemblerError = AssemblerError Integer Operation Text deriving (Show, Eq) 145 | 146 | -- | Compiles the list to byte code 147 | -- | Returns as many errors as possible rather than just first error 148 | assembleByteCode :: [(Operation, [Int])] -> Either [AssemblerError] [Int] 149 | assembleByteCode code = 150 | let res = foldl assemble [] code in 151 | case lefts res of 152 | [] -> Right $ concat $ rights res 153 | errors -> Left errors 154 | 155 | where 156 | assemble :: [Either AssemblerError [Int]] -> OpAndParam -> [Either AssemblerError [Int]] 157 | assemble res (op, prms) = 158 | res ++ case Map.lookup op instrByOp of 159 | Nothing -> [Left $ AssemblerError (toInteger $ length res) op "unknown op code"] 160 | Just i -> 161 | if opParamCount i == length prms 162 | then [Right $ fromEnum (opCode i) : prms] 163 | else [Left $ AssemblerError (toInteger $ length res) op "incorrect number of parameters"] 164 | ``` 165 | 166 | ### The interpreter 167 | 168 | With all of that in place the interpreter can finally be written. 169 | 170 | ```haskell 171 | -- | Interpreter for the byte code 172 | -- | Given a byte code stream will 'run' the code 173 | -- | If debug is enabled then the full history (all states) will be returned. 174 | interpretByteCode :: S.Seq Int -> [Cpu] 175 | ``` 176 | 177 | The interpreter takes the output of the assembler or bytes loaded from a file and runs it producing CPU state along the way. In debug mode the interpreter stores all the states as it interprets, in "non-debug" mode only the last state is kept. _Note that currently the interpreter is always in debug mode_ 178 | 179 | Before looking at the implementation of the interpreter its worth going over a few examples of how it should operate 180 | 181 | ##### Push & Pop 182 | 183 | ```text 184 | +--------+ 185 | +--------+ | 123 | 186 | +--------+ | 123 | | 456 | 187 | +--------+ push 123 +--------+ push 456 +--------+ 188 | ``` 189 | 190 | In the first example the value 123 is pushed onto an empty stack. Then the value 456 is pushed. The head of the stack is at the "bottom" 191 | 192 | ```text 193 | +--------+ 194 | | 123 | +--------+ 195 | | 456 | | 123 | 196 | +--------+ pop +--------+ 197 | ``` 198 | 199 | A ***pop*** is the opposite of a ***push***. The pop operation get the most recent value from the stack (FIFO) in this case 456 leaving 123 on the stack. 200 | 201 | 202 | ##### Add 203 | 204 | An ***add*** operation pops the top two items from the stack, adds them and pushes the result back onto the stack 205 | 206 | ```text 207 | +--------+ 208 | | 100 | +--------+ 209 | push 100 | 123 | | 223 | 210 | push 123 +--------+ add +--------+ 211 | ``` 212 | 213 | Look at the definition of the instructions for these three operators 214 | 215 | ```haskell 216 | Instruction { opCode = Push, opParamCount = 1, opPopsCount = 0, opSimple = True } 217 | Instruction { opCode = Pop, opParamCount = 0, opPopsCount = 1, opSimple = True } 218 | Instruction { opCode = Add, opParamCount = 0, opPopsCount = 2, opSimple = True } 219 | ``` 220 | 221 | From this you can see that the ***Instruction*** shows that a ***push*** takes one parameter, a ***pop*** takes no parameters but pops a single value off the stack and an ***Add*** pops two values off the stack. As noted in the code comments ***opSimple*** indicates that these are simple operators with fixed stack effects. 222 | 223 | ##### Jmp 224 | 225 | The ***Jmp*** operator performs an unconditional jump to a fixed location relative to the start of the code stream. I.e. ***Jmp 100*** sets the instruction pointer to 100 and execution continues from there. 226 | 227 | Consider the following simple list of ops in Haskel 228 | 229 | ```haskell 230 | [ (Jmp, [3]) 231 | , (Nop, []) 232 | , (Halt, []) 233 | ] 234 | ``` 235 | 236 | This gets assembled into the following byte code 237 | ```text 238 | 00: 0903 -- Jmp 3 239 | 02: 00 -- Nop 240 | 03: 02 -- Halt 241 | ``` 242 | 243 | The Jmp instruction causes the CPU to set the instruction pointer (ip) to 3. In this example that means that the ***Nop*** at offset 2 is skipped and execution continues with the ***Halt*** operation at offset 3 244 | 245 | ##### Branching (Beq, Bne, Blr, Blte, Bgt, Bgte) 246 | 247 | Branching is a conditional jump. The top two values are popped off the stack compared based on the type of conditional operator. 248 | 249 | In the following example the values 1 and to are pushed. ***Bgt*** is executed and if 2 is greater than 1 then the CPU jumps to location 7. If no then it continues executing at the location after the branch (6) 250 | 251 | ```haskell 252 | [ (Push, [1]) 253 | , (Push, [2]) 254 | , (Bgt, [7]) 255 | , (Nop, []) 256 | , (Halt, []) 257 | ] 258 | ``` 259 | 260 | ```text 261 | 00: 0301 -- Push 1 262 | 02: 0302 -- Push 2 263 | 04: 0C07 -- Bgt 7 264 | 06: 00 -- Nop 265 | 07: 02 -- Halt 266 | ``` 267 | 268 | ```text 269 | +--------+ 270 | | 1 | +--------+ 271 | push 1 | 2 | | | 272 | push 2 +--------+ Bgt +--------+ 273 | ``` 274 | 275 | This is what the history of the CPU would look like for the above example 276 | 277 | ```haskell 278 | Cpu {ip = -1, cpuStack = [], ranOp = 0, state = "", panic = False} 279 | Cpu {ip = 1, cpuStack = [1], ranOp = 3, state = "Push", panic = False} 280 | Cpu {ip = 3, cpuStack = [2,1], ranOp = 3, state = "Push", panic = False} 281 | Cpu {ip = 5, cpuStack = [], ranOp = 12, state = "Bgt", panic = False} 282 | Cpu {ip = 6, cpuStack = [], ranOp = 0, state = "Nop", panic = False} 283 | Cpu {ip = 7, cpuStack = [], ranOp = 2, state = "Halt", panic = True} 284 | ``` 285 | 286 | ##### Call & Ret 287 | 288 | A function call is much like a jmp except that you have to store an address to return to. You could have two stacks, one for values and one for return address. It’s more common however to have a single stack with "stack frames". 289 | 290 | As a trivial example consider a the case when there are no parameters 291 | 292 | ```text 293 | 00: 1003 -- Call 03 294 | 02: 02 -- Halt 295 | 03: 00 -- Nop 296 | 04: 11 -- Ret 297 | ``` 298 | 299 | The CPU does the following 300 | * Executes the ***call*** operation and pushes the return address onto the stack, i.e. the next instruction after the ***call***. Here it is the ***halt*** at 02. 301 | * The ip is set to 03, the offset of the function, and the CPU executes the function (***nop***) 302 | * The ***ret*** operation gets the return address (02) from the stack and updates the ip 303 | * The ***halt*** at 02 is executed. 304 | 305 | ```text 306 | +--------+ 307 | | 02 | +--------+ 308 | Call 03 +--------+ Ret +--------+ 309 | ``` 310 | 311 | However this simple scheme does not work when you have variable numbers of parameters, locals etc. This is where the frame pointer (fp) and stack frames come in. 312 | 313 | A stack frame is the set of data stored on the stack for each method call. In this virtual machine that is 314 | 1. The return address 315 | 2. Parameters for the function 316 | 3. The previous frame pointer value 317 | 318 | 319 | As an example consider a function that adds two numbers and returns the sum. 320 | 321 | * The caller pushes the two values to be added (123 and 22), these are pushed in reverse order. I.e. parameter 1 last. 322 | 323 | ```text 324 | 00: 0322 -- Push 22 +--------+ ip = 4 325 | __ 02: 037b -- Push 123 | 22 | fp = -1 326 | 04: 1003 -- Call 09 | 123 | 327 | 06: 0502 -- PopPrev 2 +--------+ 328 | 08: 02 -- Halt 329 | 09: 1202 -- LdArd 2 330 | 0b: 1201 -- LdArg 1 331 | 0c: 06 -- Add 332 | 0d: 11 -- Ret 333 | ``` 334 | 335 | * The current frame pointer (-1) is stored. If no function has been called this will be -1. Further down in the example this will make more sense 336 | * The return address (06) is pushed onto the stack 337 | * The frame pointer is set to the start of the stack frame (02), i.e. position 2 from the end of the stack 338 | * The CPU sets the instruction pointer and "jumps" to the function (09) being called. 339 | 340 | ```text 341 | 00: 0322 -- Push 22 +--------+ ip = 0b 342 | 02: 037b -- Push 123 | 22 | fp = 2 343 | 04: 1003 -- Call 09 | 123 | 344 | 06: 0502 -- PopPrev 2 | -1 | 345 | 08: 02 -- Halt | 6 | <-- fp 346 | 09: 1202 -- LdArd 2 | 22 | 347 | __ 0b: 1201 -- LdArg 1 | 123 | 348 | 0c: 06 -- Add +--------+ 349 | 0d: 11 -- Ret 350 | ``` 351 | 352 | * The function can then push any values it needs onto the stack and do its work. 353 | * Here the function loads the two arguments ***LdArg 1*** and ***LdArg 2*** (i.e. get params to top of stack) and then calls ***Add*** to add them 354 | * In this VM a function always returns a single value on the stack 355 | 356 | ```text 357 | 00: 0322 -- Push 22 +--------+ ip = 0c 358 | 02: 037b -- Push 123 | 22 | fp = 2 359 | 04: 1003 -- Call 09 | 123 | 360 | 06: 0502 -- PopPrev 2 | -1 | 361 | 08: 02 -- Halt | 6 | <-- fp 362 | 09: 1202 -- LdArd 2 | 145 | 363 | 0b: 1201 -- LdArg 1 +--------+ 364 | __ 0c: 06 -- Add 365 | 0d: 11 -- Ret 366 | ``` 367 | 368 | * Then when a ***ret*** operation is executed the CPU needs to do the reverse. 369 | * The return value is popped, and the stack shifted back to the fp 370 | * Notice that the original parameters are still on the stack. This is normal for the cdecl calling convention. The caller is responsible for cleaning up. 371 | 372 | ```text 373 | 00: 0322 -- Push 22 +--------+ ip = 06 374 | 02: 037b -- Push 123 | 22 | fp = -1 375 | 04: 1003 -- Call 09 | 123 | 376 | 06: 0502 -- PopPrev 2 | 145 | 377 | 08: 02 -- Halt +--------+ 378 | 09: 1202 -- LdArd 2 379 | 0b: 1201 -- LdArg 1 380 | 0c: 06 -- Add 381 | __ 0d: 11 -- Ret 382 | ``` 383 | 384 | * The ***PopPrev*** operation is used to do the parameter clean-up by the caller. It pops the number of items specified before the item at the top of the stack 385 | 386 | ```text 387 | 00: 0322 -- Push 22 +--------+ ip = 08 388 | 02: 037b -- Push 123 | 145 | fp = -1 389 | 04: 1003 -- Call 09 +--------+ 390 | __ 06: 0502 -- PopPrev 2 391 | 08: 02 -- Halt 392 | 09: 1202 -- LdArd 2 393 | 0b: 1201 -- LdArg 1 394 | 0c: 06 -- Add 395 | 0d: 11 -- Ret 396 | ``` 397 | 398 | The important thing to notice is that since the old frame pointer is stored on the stack you are able to call multiple functions and always be able to return to the previous function. Also having the fp lets you unwind the stack to the frame no matter how many items the current function may have pushed onto the stack, i.e. you don't need to try and track that 399 | 400 | Here is the output for the code above 401 | 402 | ``` 403 | {ip = -1, fp = -1, cpuStack = [], ranOp = 0, state = "", panic = False} 404 | {ip = 1, fp = -1, cpuStack = [22], ranOp = 3, state = "Push", panic = False} 405 | {ip = 3, fp = -1, cpuStack = [123,22], ranOp = 3, state = "Push", panic = False} 406 | {ip = 8, fp = 2, cpuStack = [6,-1,123,22], ranOp = 16, state = "Call", panic = False} 407 | {ip = 10, fp = 2, cpuStack = [22,6,-1,123,22], ranOp = 18, state = "LdArg", panic = False} 408 | {ip = 12, fp = 2, cpuStack = [123,22,6,-1,123,22], ranOp = 18, state = "LdArg", panic = False} 409 | {ip = 13, fp = 2, cpuStack = [145,6,-1,123,22], ranOp = 6, state = "Add", panic = False} 410 | {ip = 5, fp = -1, cpuStack = [145,123,22], ranOp = 17, state = "Ret", panic = False} 411 | {ip = 7, fp = -1, cpuStack = [145], ranOp = 5, state = "PopPrev", panic = False} 412 | {ip = 8, fp = -1, cpuStack = [145], ranOp = 2, state = "Halt", panic = True} 413 | ``` 414 | 415 | ##### The interpreter code 416 | 417 | ```haskell 418 | -- | Interpreter for the byte code 419 | -- | Given a byte code stream will run the code 420 | interpretByteCode :: S.Seq Int -> [Cpu] 421 | ``` 422 | 423 | * The interpreter converts a sequence of byte codes into a list of CPUs. Final CPU is the final state 424 | 425 | 426 | ```haskell 427 | interpret [emptyCpu] byteCode 428 | where 429 | interpret :: [Cpu] -> S.Seq Int -> [Cpu] 430 | 431 | -- Ensure that this function is not called with an empty CPU list 432 | interpret [] _ = [emptyCpu { state = "INVALID: no start CPU" }] 433 | 434 | -- Start interpreting 435 | interpret cpus@(cpu:_) code = 436 | -- Move to next op code 437 | let atIp = ip cpu + 1 in 438 | 439 | -- Try get the code byte at index atIp 440 | case indexMay code atIp of 441 | Nothing -> 442 | -- No byte at expected index, return error 443 | cpu { ip = atIp, ranOp = 0, state = "INVALID: invalid ip index, reading past end of byte stream", panic = True } : cpus 444 | 445 | Just opByte -> 446 | -- Ensure this is a valid opcode 447 | 448 | case toEnumMay opByte :: Maybe Operation of 449 | Nothing -> 450 | -- This is not a valid opcode, error 451 | cpu { ip = atIp, ranOp = opByte, state = "INVALID: Unknown op", panic = True } : cpus 452 | 453 | Just op -> 454 | -- Get the instruction for the op code 455 | case (op, Map.lookup op instrByOp) of 456 | (_, Nothing) -> 457 | -- The byte was an opcode enum but was not configured as an instruction 458 | cpu { ip = atIp, ranOp = opByte, state = "INVALID: Op not found", panic = True } : cpus 459 | 460 | ``` 461 | * Start interpreting with an empty CPU 462 | * The code does the following checks, to get a valid instruction for the byte 463 | 1. Check that there is a start CPU 464 | 1. Check that there is an opcode in the sequence at the ip index 465 | 1. Check that the opcode is valid, i.e. belongs to the ***Operation*** enum 466 | 1. Check that the opcode is in the instruction map ***instrByOp*** 467 | 468 | 469 | ```haskell 470 | (_, Just instr) -> 471 | -- 'params' are the bytes from the code stream that are used as parameters for the op, e.g. to be pushed onto stack 472 | -- get the params from the byte stream into a list 473 | let paramsCount = opParamCount instr in 474 | let params = S.take paramsCount $ S.drop (atIp + 1) code in 475 | 476 | -- 'Pops' are the bytes popped from the stack and used by the current instruction 477 | -- get the values from the stack into the list 478 | let popsCount = opPopsCount instr in 479 | let (pops, remainingStack) = S.splitAt popsCount $ cpuStack cpu in 480 | 481 | if S.length params > paramsCount 482 | then 483 | cpu { ip = atIp, ranOp = opByte, state = "Code underflow", panic = True } : cpus 484 | else 485 | if length pops < popsCount 486 | then cpu { ip = atIp, ranOp = opByte, state = "Stack underflow", panic = True } : cpus 487 | ``` 488 | * Get the parameter bytes from the code stream. E.g. Push takes a single param, the value to be pushed onto the stack 489 | * Pop the required number of values from the stack. 490 | * Check that both operations succeed 491 | 492 | 493 | ```haskell 494 | else 495 | -- Interpret the opcode using the simple/complex interpreter as indicated by the instruction 496 | let next = if opSimple instr then 497 | let res = interpretSimple emptyCpu { ip = atIp + paramsCount, state = show op } op (toList pops) params in 498 | cpu { ranOp = opByte 499 | , cpuStack = cpuStack res <> remainingStack 500 | , ip = ip res 501 | , panic = panic res 502 | , state = state res 503 | } 504 | else 505 | interpretComplex cpu { ip = atIp + paramsCount, state = show op, ranOp = opByte } op (toList pops) params remainingStack 506 | 507 | in if panic next 508 | then 509 | -- In panic state, add current CPU and stop executing 510 | next : cpus 511 | else 512 | -- Everything fine, add CPU and interpret nex byte 513 | interpret (next : cpus) code 514 | 515 | ``` 516 | 517 | Finally the core interpreter code can be called. Since the params, pops are now stored as lists and all checks performed this code is quite simple. 518 | 519 | * Remember that in this VM there are two types of ***Instructions***; simple and complex. Simple instructions are fully defined by the ***Instruction***. Complex instructions have full control over the CPU 520 | * Simple instructions are given an empty CPU and return a CPU with the values that need to be changed. For example simple instructions can not pop extra values or change the fp 521 | * Complex instructions are not fully defined by the ***Instruction*** and can change the CPU in any way they need to. 522 | 523 | 524 | ##### Simple instructions 525 | 526 | ```haskell 527 | -- | Simple instructions, can not directly change CPU state, e.g. cant set ip/fp and they just return data to be added to stack 528 | interpretSimple :: Cpu -> Operation -> [Int] -> S.Seq Int -> Cpu 529 | interpretSimple cpu op popped params = 530 | case op of 531 | Nop -> cpu 532 | ``` 533 | 534 | * Nop just returns the current CPU 535 | 536 | ```haskell 537 | Push -> cpu { cpuStack = params } 538 | ``` 539 | 540 | * Push sets the stack (appended with current stack in the interpreter code above) 541 | 542 | ```haskell 543 | Pop -> cpu 544 | ``` 545 | 546 | * Pop also just returns the CPU. This is because as a "simple" instruction the fact that it pops a single value is configured in the ***Instruction*** and the interpreter will do that. 547 | 548 | 549 | ```haskell 550 | Bne -> branchIf cpu popped params (/=) 551 | Beq -> branchIf cpu popped params (==) 552 | Bgt -> branchIf cpu popped params (>) 553 | Bgte -> branchIf cpu popped params (>=) 554 | Blt -> branchIf cpu popped params (<) 555 | Blte -> branchIf cpu popped params (<=) 556 | ``` 557 | 558 | * The branch instructions call the ***branchIf*** function with the appropriate conditional operator 559 | 560 | 561 | ```haskell 562 | -- | Perform a branch instruction if the predicate for the op (beq, bne, bgte etc) is true 563 | branchIf :: Cpu -> [Int] -> S.Seq Int -> (Int -> Int -> Bool) -> Cpu 564 | branchIf cpu popped params prd = 565 | case head params of 566 | Nothing -> cpu { panic = True, state = "missing param" } 567 | Just jmp -> 568 | case popped of 569 | [b, a] -> cpu { ip = if prd a b then ip cpu + jmp else ip cpu } 570 | _ -> cpu { panic = True, state = "invalid stack" } 571 | ``` 572 | 573 | * The branch checks the predicate (prd) and if it is True the ip is updated to the ip specified by the branch. 574 | 575 | 576 | ##### Complex Instructions 577 | 578 | ```haskell 579 | -- | Complex instructions have full access to the CPU and must ensure the CPU is correctly setup for the next instruction 580 | interpretComplex :: Cpu -> Operation -> [Int] -> S.Seq Int -> S.Seq Int -> Cpu 581 | interpretComplex cpu op popped params remainingStack = 582 | case op of 583 | ``` 584 | 585 | ```haskell 586 | Call -> 587 | -- Set up a stack frame and jump to the call location. See the comments above for details of the stack frame 588 | singleVal cpu (toList params) (\v -> cpu { ip = v - 1 -- 1 byte before method, next loop increments to function start byte 589 | , fp = S.length remainingStack -- frame pointer points to start of new stack frame 590 | , cpuStack = S.fromList [ip cpu + 1, fp cpu] <> remainingStack -- add return ip and current fp to stack 591 | }) "" 592 | ``` 593 | 594 | * Call creates a new CPU, setting the ip, fp and stack frame as discussed above in the section about Call & Ret 595 | 596 | ```haskell 597 | Ret -> 598 | -- Return to the calling code, get the previous frame pointer (fp) and return address from the current stack frame. 599 | -- A ret always 'returns' a single value, the last item on the stack 600 | let stackAtFp = S.drop (S.length remainingStack - fp cpu - 2) remainingStack in 601 | let (retParams, retStack) = S.splitAt 2 stackAtFp in 602 | if S.length retParams == 2 then 603 | let [retIp, retFp] = toList retParams in 604 | cpu { cpuStack = S.fromList popped <> retStack 605 | , ip = retIp - 1 606 | , fp = retFp 607 | } 608 | else 609 | cpu { panic = True, state = "Stack underflow getting return frame" } 610 | ``` 611 | 612 | * Ret gets the old fp and the return address from the stack frame and constructs the new CPU to return to the caller. 613 | 614 | ## What's next? 615 | 616 | This explanation of the stack machine is significantly longer than the code for it :). Hopefully you'll be able to see how easy creating a simple stack machine is. There are many things than can be built on top of this. E.g. 617 | * A higher level assembler with support for labels, i.e. don’t make the user count offsets 618 | * A simple higher level language that generates this new assembler code. E.g. stack/concatenative language or a functional language 619 | * Many more operators 620 | * Code optimisation 621 | * Support for globals 622 | * IO 623 | * ..... 624 | 625 | 626 | 627 | ## Links 628 | 1. https://en.wikipedia.org/wiki/Stack_machine 629 | 2. https://github.com/sdiehl/protolude 630 | 631 | ## See also 632 | - https://www.youtube.com/watch?v=OjaAToVkoTw 633 | 634 | -------------------------------------------------------------------------------- /Setup.hs: -------------------------------------------------------------------------------- 1 | import Distribution.Simple 2 | main = defaultMain 3 | -------------------------------------------------------------------------------- /app/Main.hs: -------------------------------------------------------------------------------- 1 | {-# LANGUAGE NoImplicitPrelude #-} 2 | {-# LANGUAGE OverloadedStrings #-} 3 | 4 | module Main where 5 | 6 | import Protolude 7 | import qualified Data.Text as Txt 8 | import qualified Data.Sequence as S 9 | import VirtualMachine 10 | 11 | main :: IO () 12 | main = do 13 | {- 14 | let assembly = [ (Nop, []) 15 | , (Push, [7]) 16 | , (Push, [15]) 17 | , (Push, [41]) 18 | , (Nop, []) 19 | , (Pop, []) 20 | , (Add, []) 21 | , (Pop, []) 22 | , (Jmp, [1]) 23 | , (Halt, []) 24 | , (Nop, []) 25 | , (Push, [1]) 26 | , (Dup, []) 27 | , (Push, [5]) 28 | , (Bgt, [3] ) 29 | , (Inc, []) 30 | , (Jmp, [-8]) 31 | , (Nop, []) 32 | , (Nop, []) 33 | ] 34 | let assembly = [(Push, [11]), (Call, [6]), (Nop, []), (Halt, []), (Break, []), (Push, [99]), (Ret, []), (Push, [1])] 35 | -} 36 | 37 | let assembly = [ (Push, [22]) 38 | , (Push, [123]) 39 | , (Call, [9]) 40 | , (PopPrev, [2]) 41 | , (Halt, []) 42 | , (LdArg, [2]) 43 | , (LdArg, [1]) 44 | , (Add, []) 45 | , (Ret, []) 46 | ] 47 | 48 | let assembled = assembleByteCode assembly 49 | let (lines, cpus) = case assembled of 50 | Left err -> 51 | (Txt.intercalate "\n" $ map show err, []) 52 | Right bytes -> 53 | let c = reverse $ interpretByteCode $ S.fromList bytes in 54 | (showT bytes, c) 55 | 56 | putText lines 57 | putText "------------------------" 58 | putText $ Txt.intercalate "\n" $ map show cpus 59 | 60 | 61 | where 62 | showT :: (Show o) => o -> Text 63 | showT = show 64 | -------------------------------------------------------------------------------- /br.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | stack build && .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/DemoVirtualMachine-exe/DemoVirtualMachine-exe 3 | -------------------------------------------------------------------------------- /run.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | .stack-work/dist/x86_64-linux/Cabal-1.22.5.0/build/DemoVirtualMachine-exe/DemoVirtualMachine-exe 3 | -------------------------------------------------------------------------------- /src/Lib.hs: -------------------------------------------------------------------------------- 1 | {-# LANGUAGE NoImplicitPrelude #-} 2 | {-# LANGUAGE OverloadedStrings #-} 3 | {-# LANGUAGE ViewPatterns #-} 4 | 5 | module Lib where 6 | 7 | import Protolude 8 | import qualified Data.Sequence as S 9 | 10 | -- | from https://stackoverflow.com/questions/2743858/safe-and-polymorphic-toenum/2744712#2744712 11 | enumIfBetween :: (Enum a) => a -> a -> Int -> Maybe a 12 | enumIfBetween a z x = let a' = fromEnum a 13 | z' = fromEnum z 14 | in if a' <= x && x <= z' 15 | then Just $ toEnum x 16 | else Nothing 17 | 18 | -- | A safe from enum to avoid, returns a Maybe rather than having to worry about ⊥ 19 | toEnumMay :: (Bounded a, Enum a) => Int -> Maybe a 20 | toEnumMay i = enumIfBetween minBound maxBound i 21 | 22 | -- | Safe version of Data.Seq.index, avoiding ⊥ 23 | indexMay :: S.Seq a -> Int -> Maybe a 24 | indexMay s idx 25 | | idx < 0 = Nothing 26 | | idx >= S.length s = Nothing 27 | | otherwise = Just $ S.index s idx 28 | 29 | 30 | -------------------------------------------------------------------------------- /src/VirtualMachine.hs: -------------------------------------------------------------------------------- 1 | {-# LANGUAGE NoImplicitPrelude #-} 2 | {-# LANGUAGE OverloadedStrings #-} 3 | {-# LANGUAGE ViewPatterns #-} 4 | 5 | module VirtualMachine (Operation(..), Cpu(..), assembleByteCode, interpretByteCode) where 6 | 7 | import Protolude 8 | import qualified Data.Sequence as S 9 | --import Data.Sequence((<|), (|>), (><), ViewL(..), ViewR(..)) 10 | import qualified Data.Map as Map 11 | import Lib 12 | 13 | -- | The op codes 14 | data Operation = Nop -- No operation 15 | | Break -- Break for debugger - not implemented 16 | | Halt -- Stop CPU execution 17 | | Push -- Push a value onto the stack 18 | | Pop -- Pop the most recent value from the stack 19 | | PopPrev -- Pop n values before the most recent value from the stack 20 | | Add -- Add the top two items on the stack 21 | | Inc -- Increment the top item on the stack 22 | | Dup -- Duplicate the most recent item on the stack 23 | | Jmp -- Jump unconditionally to a location 24 | | Bne -- Pop the top two items, compare and branch if the values are not equal 25 | | Beq -- Pop the top two items, compare and branch if the values are equal 26 | | Bgt -- Pop the top two items, compare and branch if value 1 is greater than value 2 27 | | Bgte -- Pop the top two items, compare and branch if value 1 is greater or equal to value 2 28 | | Blt -- Pop the top two items, compare and branch if value 1 is less than value 2 29 | | Blte -- Pop the top two items, compare and branch if value 1 is less than or equal to value 2 30 | | Call -- Call a function 31 | | Ret -- Return from a function 32 | | LdArg -- Push local value n onto the stack 33 | deriving (Show, Eq, Ord, Enum, Bounded) 34 | 35 | {-| Representation of the current CPU state 36 | ip = instruction pointer 37 | fp = frame pointer, points to the start of the stack frame 38 | cpuStack = the stack 39 | cpuGlobals = global variables 40 | runOp = op that was executed 41 | state = debug text 42 | debug = enable debugging features 43 | panic = interpreter will stop when this is set to true 44 | 45 | 46 | This is how the current stack frame is setup 47 | 48 | | +-----------------+ 49 | | | existing stack | 50 | | +-----------------+ 51 | | | arg 2 | 52 | | | arg 1 | 53 | | +-----------------+ <-------- fp 54 | | | old fp | 55 | | | return address | 56 | | +-----------------+ 57 | | | method's stack | 58 | | +-----------------+ 59 | V 60 | stack 61 | growth 62 | 63 | The fp points to the start of a stack frame. When a return instruction is executed the following takes place 64 | 1) Pop the return value (there is always one return value) 65 | 2) Rewind the stack to the current value of the fp + 2 66 | 3) pop the return address and set the ip 67 | 4) pop the old fp value and set the current fp 68 | 5) caller pops the returrn value 69 | 6) caller pops paremeters off stack (cdecl convention) 70 | 71 | Note that return value is after the args passed to the function, i.e. steps 5 and 6 above are logically backwards. 72 | The PopPrev op can be used to make this simpler as then steps 5 and 6 can be swapped, i.e. first PopPrev {n}, then pop {return value} 73 | -} 74 | data Cpu = Cpu { ip :: Int -- Instruction pointer 75 | , fp :: Int -- Frame pointer 76 | , cpuStack :: S.Seq Int -- The stack 77 | , cpuGlobals :: S.Seq Int -- Gloal variables 78 | , ranOp :: Int -- The last opcode that was executed 79 | , state :: Text -- Debugging message 80 | , debug :: Bool -- Enable/disable debugging 81 | , panic :: Bool -- Is the CPU in a faulted state 82 | } 83 | deriving (Show, Eq) 84 | 85 | -- | Default empty/initial CPU state 86 | emptyCpu :: Cpu 87 | emptyCpu = Cpu { ip = -1, fp = -1, cpuStack = S.empty, cpuGlobals = S.empty, state = "", debug = True, ranOp = 0, panic = False } 88 | 89 | -- | Configuration for an operation 90 | -- | opParamCount = number of paramaters taken from the code stream 91 | -- | opPopsCount = number of values this op pops from the stack 92 | -- | opSimple = determines if the op needs full access to cpu state to change things like the fp and ip 93 | -- | note that 'complex' instructions do not need to honour opParamCount and opPopsCount 94 | -- | e.g. a 'ret' instruction pops a variable number of parameters 95 | data Instruction = Instruction { opCode :: Operation 96 | , opPopsCount :: Int 97 | , opParamCount :: Int 98 | , opSimple :: Bool 99 | } 100 | deriving (Show, Eq) 101 | 102 | -- | Config for the op codes 103 | instructions :: [Instruction] 104 | instructions = [ Instruction { opCode = Nop, opParamCount = 0, opPopsCount = 0, opSimple = True } 105 | , Instruction { opCode = Break, opParamCount = 0, opPopsCount = 0, opSimple = True } 106 | , Instruction { opCode = Halt, opParamCount = 0, opPopsCount = 0, opSimple = True } 107 | , Instruction { opCode = Push, opParamCount = 1, opPopsCount = 0, opSimple = True } 108 | , Instruction { opCode = Pop, opParamCount = 0, opPopsCount = 1, opSimple = True } 109 | , Instruction { opCode = PopPrev, opParamCount = 1, opPopsCount = 0, opSimple = False } 110 | , Instruction { opCode = Add, opParamCount = 0, opPopsCount = 2, opSimple = True } 111 | , Instruction { opCode = Inc, opParamCount = 0, opPopsCount = 1, opSimple = True } 112 | , Instruction { opCode = Dup, opParamCount = 0, opPopsCount = 1, opSimple = True } 113 | , Instruction { opCode = Jmp, opParamCount = 1, opPopsCount = 0, opSimple = True } 114 | , Instruction { opCode = Bne, opParamCount = 1, opPopsCount = 2, opSimple = True } 115 | , Instruction { opCode = Beq, opParamCount = 1, opPopsCount = 2, opSimple = True } 116 | , Instruction { opCode = Bgt, opParamCount = 1, opPopsCount = 2, opSimple = True } 117 | , Instruction { opCode = Bgte, opParamCount = 1, opPopsCount = 2, opSimple = True } 118 | , Instruction { opCode = Blt, opParamCount = 1, opPopsCount = 2, opSimple = True } 119 | , Instruction { opCode = Blte, opParamCount = 1, opPopsCount = 2, opSimple = True } 120 | , Instruction { opCode = Call, opParamCount = 1, opPopsCount = 0, opSimple = False } 121 | , Instruction { opCode = Ret, opParamCount = 0, opPopsCount = 1, opSimple = False } 122 | , Instruction { opCode = LdArg, opParamCount = 1, opPopsCount = 0, opSimple = False } 123 | ] 124 | 125 | -- | Instructions indexed by opcode 126 | instrByOp :: Map.Map Operation Instruction 127 | instrByOp = Map.fromList $ map (\i -> (opCode i, i)) instructions 128 | 129 | -- | A single assembler error 130 | data AssemblerError = AssemblerError Integer Operation Text deriving (Show, Eq) 131 | 132 | -- | A single CPU operator and its parameters 133 | type OpAndParam = (Operation, [Int]) 134 | 135 | 136 | -- | Compiles the list to byte code 137 | -- | Returns as many errors as possible rather than just first error 138 | assembleByteCode :: [OpAndParam] -> Either [AssemblerError] [Int] 139 | assembleByteCode code = 140 | let res = foldl assemble [] code in 141 | case lefts res of 142 | [] -> Right $ concat $ rights res 143 | errors -> Left errors 144 | 145 | where 146 | assemble :: [Either AssemblerError [Int]] -> OpAndParam -> [Either AssemblerError [Int]] 147 | assemble res (op, prms) = 148 | res ++ case Map.lookup op instrByOp of 149 | Nothing -> [Left $ AssemblerError (toInteger $ length res) op "unknown op code"] 150 | Just i -> 151 | if opParamCount i == length prms 152 | then [Right $ fromEnum (opCode i) : prms] 153 | else [Left $ AssemblerError (toInteger $ length res) op "incorrect number of parameters"] 154 | 155 | 156 | -- | Interpreter for the byte code 157 | -- | Given a byte code stream will run the code 158 | -- | If debug is enabled then the full history (all states) will be returned. TODO currently always enabled 159 | interpretByteCode :: S.Seq Int -> [Cpu] 160 | interpretByteCode byteCode = 161 | interpret [emptyCpu] byteCode 162 | where 163 | interpret :: [Cpu] -> S.Seq Int -> [Cpu] 164 | 165 | -- Ensure that this function is not called with an empty CPU list 166 | interpret [] _ = [emptyCpu { state = "INVALID: no start CPU" }] 167 | 168 | -- Start interpreting 169 | interpret cpus@(cpu:_) code = 170 | -- Move to next op code 171 | let atIp = ip cpu + 1 in 172 | 173 | -- Try get the code byte at index atIp 174 | case indexMay code atIp of 175 | Nothing -> 176 | -- No byte at expected index, return error 177 | cpu { ip = atIp, ranOp = 0, state = "INVALID: invalid ip index, reading past end of byte stream", panic = True } : cpus 178 | 179 | Just opByte -> 180 | -- Ensure this is a valid opcode 181 | 182 | case toEnumMay opByte :: Maybe Operation of 183 | Nothing -> 184 | -- This is not a valid opcode, error 185 | cpu { ip = atIp, ranOp = opByte, state = "INVALID: Unknown op", panic = True } : cpus 186 | 187 | Just op -> 188 | -- Get the instruction for the op code 189 | case (op, Map.lookup op instrByOp) of 190 | (_, Nothing) -> 191 | -- The byte was an opcode enum but was not configured as an instruction 192 | cpu { ip = atIp, ranOp = opByte, state = "INVALID: Op not found", panic = True } : cpus 193 | 194 | (_, Just instr) -> 195 | -- 'params' are the bytes from the code stream that are used as parameters for the op, e.g. to be pushed onto stack 196 | -- get the params from the byte stream into a list 197 | let paramsCount = opParamCount instr in 198 | let params = S.take paramsCount $ S.drop (atIp + 1) code in 199 | 200 | -- 'Pops' are the bytes popped from the stack and used by the current instruction 201 | -- get the values from the stack into the list 202 | let popsCount = opPopsCount instr in 203 | let (pops, remainingStack) = S.splitAt popsCount $ cpuStack cpu in 204 | 205 | if S.length params > paramsCount 206 | then 207 | cpu { ip = atIp, ranOp = opByte, state = "Code underflow", panic = True } : cpus 208 | else 209 | if length pops < popsCount 210 | then cpu { ip = atIp, ranOp = opByte, state = "Stack underflow", panic = True } : cpus 211 | else 212 | -- Interpret the opcode using the simple/complex interpreter as indicated by the instruction 213 | let next = if opSimple instr then 214 | let res = interpretSimple emptyCpu { ip = atIp + paramsCount, state = show op } op (toList pops) params in 215 | cpu { ranOp = opByte 216 | , cpuStack = cpuStack res <> remainingStack 217 | , ip = ip res 218 | , panic = panic res 219 | , state = state res 220 | } 221 | else 222 | interpretComplex cpu { ip = atIp + paramsCount, state = show op, ranOp = opByte } op (toList pops) params remainingStack 223 | 224 | in if panic next 225 | then 226 | -- In panic state, add current CPU and stop executing 227 | next : cpus 228 | else 229 | -- Everything fine, add CPU and interpret nex byte 230 | interpret (next : cpus) code 231 | 232 | -- | Simple instructions, can not directly change CPU state, e.g. cant set ip/fp and they just return data to be added to stack 233 | interpretSimple :: Cpu -> Operation -> [Int] -> S.Seq Int -> Cpu 234 | interpretSimple cpu op popped params = 235 | case op of 236 | Nop -> cpu 237 | Break -> cpu --TODO not implemented 238 | Halt -> cpu { panic = True } 239 | Push -> cpu { cpuStack = params } 240 | Pop -> cpu 241 | Bne -> branchIf cpu popped params (/=) 242 | Beq -> branchIf cpu popped params (==) 243 | Bgt -> branchIf cpu popped params (>) 244 | Bgte -> branchIf cpu popped params (>=) 245 | Blt -> branchIf cpu popped params (<) 246 | Blte -> branchIf cpu popped params (<=) 247 | Inc -> singleVal cpu popped (\v -> cpu { cpuStack = S.singleton $ v + 1 }) "IncI stack underflow" 248 | Dup -> singleVal cpu popped (\v -> cpu { cpuStack = S.fromList [v, v] }) "Dup stack underflow" 249 | Jmp -> singleVal cpu (toList params) (\v -> cpu { ip = ip cpu + v }) "Jmp missing param" 250 | Add -> cpu { cpuStack = S.singleton $ sum popped } 251 | _ -> cpu { panic = True, state = "Unhandled simple op" } 252 | 253 | -- | Complex instructions have full access to the CPU and must ensure the CPU is correctly setup for the next instruction 254 | interpretComplex :: Cpu -> Operation -> [Int] -> S.Seq Int -> S.Seq Int -> Cpu 255 | interpretComplex cpu op popped params remainingStack = 256 | case op of 257 | Call -> 258 | -- Set up a stack frame and jump to the call location. See the comments above for details of the stack frame 259 | singleVal cpu (toList params) (\v -> cpu { ip = v - 1 -- 1 byte before method, next loop increments to function start byte 260 | , fp = S.length remainingStack -- frame pointer points to start of new stack frame 261 | , cpuStack = S.fromList [ip cpu + 1, fp cpu] <> remainingStack -- add return ip and current fp to stack 262 | }) "" 263 | Ret -> 264 | -- Return to the calling code, get the previous frame pointer (fp) and return address from the current stack frame. 265 | -- A ret always 'returns' a single value, the last item on the stack 266 | let stackAtFp = S.drop (S.length remainingStack - fp cpu - 2) remainingStack in 267 | let (retParams, retStack) = S.splitAt 2 stackAtFp in 268 | if S.length retParams == 2 then 269 | let [retIp, retFp] = toList retParams in 270 | cpu { cpuStack = S.fromList popped <> retStack 271 | , ip = retIp - 1 272 | , fp = retFp 273 | } 274 | else 275 | cpu { panic = True, state = "Stack underflow getting return frame" } 276 | 277 | PopPrev -> 278 | -- Keep the top item on the stack but remove the previous n items 279 | case toList $ S.take 1 params of 280 | [nr] -> 281 | let (h,t) = S.splitAt (nr + 1) remainingStack in 282 | if S.length h == nr + 1 283 | then 284 | cpu { cpuStack = S.take 1 h <> t } 285 | else 286 | cpu { panic = True, state = "Stack underflow in pop prev" } 287 | [] -> cpu { panic = True, state = "Code underflow getting number of items to pop prev" } 288 | _ -> cpu { panic = True, state = "Invalid number of items to pop prev" } 289 | 290 | LdArg -> 291 | singleVal cpu (toList params) (\v -> 292 | case indexMay remainingStack (length remainingStack - fp cpu + v - 1) of 293 | Nothing -> cpu { panic = True, state = "Stack underflow reading local arg in Ldarg"} 294 | Just l -> cpu { cpuStack = S.singleton l <> remainingStack } 295 | ) "Ldarg code underflow" 296 | 297 | _ -> cpu { panic = True, state = "Unhandled complex op" } 298 | 299 | -- | Get a single value from the list, if there is not a value available, return an error 300 | singleVal :: Cpu -> [Int] -> (Int -> Cpu) -> Text -> Cpu 301 | singleVal cpu params onOne err = 302 | case head params of 303 | Nothing -> cpu { panic = True, state = err } 304 | Just v -> onOne v 305 | 306 | -- | Perform a branch instruction if the predicate for the op (beq, bne, bgte etc) is true 307 | branchIf :: Cpu -> [Int] -> S.Seq Int -> (Int -> Int -> Bool) -> Cpu 308 | branchIf cpu popped params prd = 309 | case head params of 310 | Nothing -> cpu { panic = True, state = "missing param" } 311 | Just jmp -> 312 | case popped of 313 | [b, a] -> cpu { ip = if prd a b then ip cpu + jmp else ip cpu } 314 | _ -> cpu { panic = True, state = "invalid stack" } 315 | -------------------------------------------------------------------------------- /stack.yaml: -------------------------------------------------------------------------------- 1 | # This file was automatically generated by stack init 2 | # For more information, see: http://docs.haskellstack.org/en/stable/yaml_configuration/ 3 | 4 | # Specifies the GHC version and set of packages available (e.g., lts-3.5, nightly-2015-09-21, ghc-7.10.2) 5 | resolver: lts-5.17 6 | 7 | # Local packages, usually specified by relative directory name 8 | packages: 9 | - '.' 10 | # Packages to be pulled from upstream that are not in the resolver (e.g., acme-missiles-0.3) 11 | extra-deps: ["string-conv-0.1.1", "protolude-0.1.5"] 12 | 13 | # Override default flag values for local packages and extra-deps 14 | flags: {} 15 | 16 | # Extra package databases containing global packages 17 | extra-package-dbs: [] 18 | 19 | # Control whether we use the GHC we find on the path 20 | # system-ghc: true 21 | 22 | # Require a specific version of stack, using version ranges 23 | # require-stack-version: -any # Default 24 | # require-stack-version: >= 1.0.0 25 | 26 | # Override the architecture used by stack, especially useful on Windows 27 | # arch: i386 28 | # arch: x86_64 29 | 30 | # Extra directories used by stack for building 31 | # extra-include-dirs: [/path/to/dir] 32 | # extra-lib-dirs: [/path/to/dir] 33 | 34 | # Allow a newer minor version of GHC than the snapshot specifies 35 | # compiler-check: newer-minor 36 | -------------------------------------------------------------------------------- /test/Spec.hs: -------------------------------------------------------------------------------- 1 | {-# LANGUAGE NoImplicitPrelude #-} 2 | {-# LANGUAGE OverloadedStrings #-} 3 | 4 | import Protolude 5 | import VirtualMachine 6 | import Test.Hspec 7 | import Test.QuickCheck 8 | import qualified Data.Sequence as S 9 | 10 | runAndGetStack :: [(Operation, [Int])] -> [Int] 11 | runAndGetStack assembly = 12 | case assembleByteCode assembly of 13 | Left _ -> 14 | [-1] 15 | Right bytes -> 16 | case interpretByteCode $ S.fromList bytes of 17 | [] -> [-1] 18 | (cpu:_) -> toList $ cpuStack cpu 19 | 20 | main :: IO () 21 | main = hspec $ do 22 | describe "check stack result" $ do 23 | it "nop does nothing" $ do 24 | let res = runAndGetStack [ (Nop, []) ] 25 | res `shouldBe` [] 26 | 27 | it "nops do nothing" $ do 28 | let res = runAndGetStack [ (Nop, []), (Nop, []), (Nop, []) ] 29 | res `shouldBe` [] 30 | 31 | it "pushi" $ do 32 | let res = runAndGetStack [ (Push, [123]) ] 33 | res `shouldBe` [123] 34 | 35 | it "popi" $ do 36 | let res = runAndGetStack [ (Push, [123]), (Pop, []) ] 37 | res `shouldBe` [] 38 | 39 | it "addi" $ do 40 | let res = runAndGetStack [ (Push, [123]), (Push, [415]), (Add, []) ] 41 | res `shouldBe` [538] 42 | 43 | describe "stack result property tests" $ do 44 | it "pushi" $ property $ 45 | \x -> runAndGetStack [ (Push, [x])] == [(x :: Int)] 46 | 47 | it "pushi multiple" $ property $ 48 | \x -> (runAndGetStack $ map (\y -> (Push, [y])) x) == reverse (x :: [Int]) 49 | 50 | it "addi" $ property $ 51 | \x y -> runAndGetStack [ (Push, [x]), (Push, [y]), (Add, []) ] == [(x :: Int) + (y :: Int)] 52 | 53 | it "inci" $ property $ 54 | \x -> runAndGetStack [ (Push, [x]), (Inc, []) ] == [(x :: Int) + 1] 55 | 56 | it "dup" $ property $ 57 | \x -> runAndGetStack [ (Push, [x]), (Dup, []) ] == [(x :: Int), x] 58 | 59 | it "eq" $ property $ 60 | \x y -> runAndGetStack [ (Push, [x]), (Push, [y]), (Beq, [1]), (Push, [99]) ] == if (x :: Int) == (y :: Int) then [] else [99] 61 | 62 | it "ne" $ property $ 63 | \x y -> runAndGetStack [ (Push, [x]), (Push, [y]), (Bne, [1]), (Push, [99]) ] == if (x :: Int) /= (y :: Int) then [] else [99] 64 | 65 | it "gt" $ property $ 66 | \x y -> runAndGetStack [ (Push, [x]), (Push, [y]), (Bgt, [1]), (Push, [99]) ] == if (x :: Int) > (y :: Int) then [] else [99] 67 | 68 | it "gte" $ property $ 69 | \x y -> runAndGetStack [ (Push, [x]), (Push, [y]), (Bgte, [1]), (Push, [99]) ] == if (x :: Int) >= (y :: Int) then [] else [99] 70 | 71 | it "lt" $ property $ 72 | \x y -> runAndGetStack [ (Push, [x]), (Push, [y]), (Blt, [1]), (Push, [99]) ] == if (x :: Int) < (y :: Int) then [] else [99] 73 | 74 | it "lte" $ property $ 75 | \x y -> runAndGetStack [ (Push, [x]), (Push, [y]), (Blte, [1]), (Push, [99]) ] == if (x :: Int) <= (y :: Int) then [] else [99] 76 | 77 | it "count with jmp and bgt" $ property $ 78 | \x -> runAndGetStack 79 | [ (Push, [0]) 80 | , (Dup, []) 81 | , (Push, [abs x]) 82 | , (Bgt, [3] ) 83 | , (Inc, []) 84 | , (Jmp, [-8]) 85 | , (Nop, []) 86 | ] == [(abs (x :: Int)) + 1] 87 | 88 | it "add with call, ldard and ret" $ property $ 89 | \x y -> runAndGetStack 90 | [ (Push, [abs x]) 91 | , (Push, [abs y]) 92 | , (Call, [10]) 93 | , (Nop, []) 94 | , (PopPrev, [2]) 95 | , (Halt, []) 96 | , (LdArg, [1]) 97 | , (LdArg, [2]) 98 | , (Add, []) 99 | , (Ret, []) 100 | ] == [(abs (x :: Int)) + (abs (y :: Int))] 101 | --------------------------------------------------------------------------------