├── LICENSE ├── Makefile ├── README.md ├── examples ├── 01-helloworld.f ├── 02-quine.f ├── 03-variables.f ├── 04-debugstack.f ├── 05-fizzbuzz.f └── README.md └── sectorforth.asm /LICENSE: -------------------------------------------------------------------------------- 1 | Copyright (c) 2020 Cesar Blum 2 | 3 | Permission is hereby granted, free of charge, to any person obtaining a copy 4 | of this software and associated documentation files (the "Software"), to deal 5 | in the Software without restriction, including without limitation the rights 6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 7 | copies of the Software, and to permit persons to whom the Software is 8 | furnished to do so, subject to the following conditions: 9 | 10 | The above copyright notice and this permission notice shall be included in all 11 | copies or substantial portions of the Software. 12 | 13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 19 | SOFTWARE. 20 | -------------------------------------------------------------------------------- /Makefile: -------------------------------------------------------------------------------- 1 | name = sectorforth 2 | 3 | all: $(name).bin $(name).img 4 | 5 | %.bin: %.asm 6 | nasm -f bin -o $@ -l $(^:.asm=.lst) $^ 7 | 8 | %.img: %.bin 9 | dd if=$^ of=boot.img bs=512 10 | dd if=/dev/zero of=zero.img bs=512 count=2879 11 | cat boot.img zero.img > $@ 12 | rm -f boot.img zero.img 13 | 14 | .PHONY: debug 15 | gdb: $(name).bin 16 | qemu-system-i386 -hda $^ -monitor stdio -s -S 17 | 18 | .PHONY: run 19 | run: $(name).bin 20 | qemu-system-i386 -hda $^ 21 | 22 | .PHONY: clean 23 | clean: 24 | rm -rf *.{bin,lst,img} 25 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # sectorforth 2 | 3 | sectorforth is a 16-bit x86 Forth that fits in a 512-byte boot sector. 4 | 5 | Inspiration to write sectorforth came from a 1996 6 | [Usenet thread](https://groups.google.com/g/comp.lang.forth/c/NS2icrCj1jQ) 7 | (in particular, Bernd Paysan's first post on the thread). 8 | 9 | ## Batteries not included 10 | 11 | sectorforth contains only the eight primitives outlined in the Usenet 12 | post above, five variables for manipulating internal state, and two I/O 13 | primitives. 14 | 15 | With that minimal set of building blocks, words for branching, compiling, 16 | manipulating the return stack, etc. can all be written in Forth itself 17 | (check out the examples!). 18 | 19 | The colon compiler (`:`) is available, so new words can be defined easily 20 | (that means `;` is also there, of course). 21 | 22 | Contrary to many Forth implementations, sectorforth does not attempt to 23 | convert unknown words to numbers, since numbers can be produced using the 24 | available primitives. The two included I/O primitives are sufficient to 25 | write a more powerful interpreter that can parse numbers. 26 | 27 | ### Primitives 28 | 29 | | Primitive | Stack effects | Description | 30 | | --------- | ------------- | --------------------------------------------- | 31 | | `@` | ( addr -- x ) | Fetch memory contents at addr | 32 | | `!` | ( x addr -- ) | Store x at addr | 33 | | `sp@` | ( -- sp ) | Get pointer to top of data stack | 34 | | `rp@` | ( -- rp ) | Get pointer to top of return stack | 35 | | `0=` | ( x -- flag ) | -1 if top of stack is 0, 0 otherwise | 36 | | `+` | ( x y -- z ) | Sum the two numbers at the top of the stack | 37 | | `nand` | ( x y -- z ) | NAND the two numbers at the top of the stack | 38 | | `exit` | ( r:addr -- ) | Pop return stack and resume execution at addr | 39 | | `key` | ( -- x ) | Read key stroke as ASCII character | 40 | | `emit` | ( x -- ) | Print low byte of x as an ASCII character | 41 | 42 | ### Variables 43 | 44 | | Variable | Description | 45 | | -------- | ------------------------------------------------------------- | 46 | | `state` | 0: execute words; 1: compile word addresses to the dictionary | 47 | | `tib` | Terminal input buffer, where input is parsed from | 48 | | `>in` | Current parsing offset into terminal input buffer | 49 | | `here` | Pointer to next free position in the dictionary | 50 | | `latest` | Pointer to most recent dictionary entry | 51 | 52 | ## Compiling 53 | 54 | sectorforth was developed using NASM 2.15.01. Earlier versions of NASM 55 | are probably capable of compiling it, but that hasn't been tested. 56 | 57 | To compile sectorforth, just run `make`: 58 | 59 | ``` 60 | $ make 61 | ``` 62 | 63 | That will produce a compiled binary (`sectorforth.bin`) and a floppy disk 64 | image (`sectorforth.img`) containing the binary in its boot sector. 65 | 66 | ## Running 67 | 68 | The makefile contains two targets for running sectorforth in QEMU: 69 | 70 | - `debug` starts QEMU in debug mode, with execution paused. That allows 71 | you to set up a remote target in GDB (`target remote localhost:1234`) and 72 | set any breakpoints you want before sectorforth starts running. 73 | - `run` simply runs sectorforth in QEMU. 74 | 75 | ## Usage 76 | 77 | Up to 4KB of input can be entered per line. After pressing return, the 78 | interpreter parses one word at a time an interprets it (i.e. executes it 79 | or compiles it, according to the current value of the `state` variable). 80 | 81 | sectorforth does not print the ` ok` prompt familiar to Forth users. 82 | However, if a word is not found in the dictionary, the error message `!!` 83 | is printed in red, letting you know an error happened. 84 | 85 | When a word is not found in the dictionary, the interpreter's state is 86 | reset: the data and return stacks, as well as the terminal input buffer 87 | are cleared, and the interpreter is placed in interpretation mode. Other 88 | errors (e.g. compiling an invalid address in a word definition and 89 | attempting to execute it) are not handled gracefully, and will crash the 90 | interpreter. 91 | 92 | ## Code structure 93 | 94 | Comments throughout the code assume familiarity with Forth and how it is 95 | commonly implemented. 96 | 97 | If you're not familiar with Forth, read Leo Brodie's 98 | [Starting Forth](https://www.forth.com/starting-forth). 99 | 100 | If you're not familiar with how Forth is implemented on x86, read the 101 | assembly code for Richard W.M. Jones' 102 | [jonesforth](http://git.annexia.org/?p=jonesforth.git;a=blob;f=jonesforth.S). 103 | 104 | sectorforth draws a lot of inspiration from jonesforth, but the latter 105 | does a much better job at explaining the basics in its comments. 106 | 107 | For an excellent introduction to threaded code techniques, and to how to 108 | implement Forth in different architectures, read Brad Rodriguez's 109 | [Moving Forth](http://www.bradrodriguez.com/papers/moving1.htm). 110 | -------------------------------------------------------------------------------- /examples/01-helloworld.f: -------------------------------------------------------------------------------- 1 | \ "hello, world" example for sectorforth, a 512-byte, bootable x86 Forth. 2 | \ Copyright (c) 2020 Cesar Blum 3 | \ Distributed under the MIT license. See LICENSE for details. 4 | 5 | : dup ( x -- x x ) sp@ @ ; 6 | 7 | \ make some numbers 8 | : -1 ( x -- x -1 ) dup dup nand dup dup nand nand ; 9 | : 0 -1 dup nand ; 10 | : 1 -1 dup + dup nand ; 11 | : 2 1 1 + ; 12 | : 4 2 2 + ; 13 | : 6 2 4 + ; 14 | 15 | \ logic and arithmetic operators 16 | : invert ( x -- !x ) dup nand ; 17 | : and ( x y -- x&y ) nand invert ; 18 | : negate ( x -- -x ) invert 1 + ; 19 | : - ( x y -- x-y ) negate + ; 20 | 21 | \ equality checks 22 | : = ( x y -- flag ) - 0= ; 23 | : <> ( x y -- flag ) = invert ; 24 | 25 | \ stack manipulation words 26 | : drop ( x y -- x ) dup - + ; 27 | : over ( x y -- x y x ) sp@ 2 + @ ; 28 | : swap ( x y -- y x ) over over sp@ 6 + ! sp@ 2 + ! ; 29 | : nip ( x y -- y ) swap drop ; 30 | : 2dup ( x y -- x y x y ) over over ; 31 | : 2drop ( x y -- ) drop drop ; 32 | 33 | \ more logic 34 | : or ( x y -- x|y ) invert swap invert and invert ; 35 | 36 | \ compile things 37 | : , ( x -- ) here @ ! here @ 2 + here ! ; 38 | 39 | \ left shift 1 bit 40 | : 2* ( x -- 2x ) dup + ; 41 | 42 | \ constant to check/set immediate flag 43 | : 80h ( -- 80h ) 1 2* 2* 2* 2* 2* 2* 2* ; 44 | 45 | \ make words immediate 46 | : immediate latest @ 2 + dup @ 80h or swap ! ; 47 | 48 | \ control interpreter state 49 | : [ 0 state ! ; immediate 50 | : ] 1 state ! ; 51 | 52 | \ unconditional branch 53 | : branch ( r:addr -- r:addr+offset ) rp@ @ dup @ + rp@ ! ; 54 | 55 | \ conditional branch when top of stack is 0 56 | : ?branch ( r:addr -- r:addr | r:addr+offset) 57 | 0= rp@ @ @ 2 - and rp@ @ + 2 + rp@ ! ; 58 | 59 | \ lit pushes the value on the next cell to the stack at runtime 60 | \ e.g. lit [ 42 , ] pushes 42 to the stack 61 | : lit ( -- x ) rp@ @ dup 2 + rp@ ! @ ; 62 | 63 | \ ['] is identical to lit, the choice of either depends on context 64 | \ don't write as : ['] lit ; as that will break lit's assumptions about 65 | \ the return stack 66 | : ['] ( -- addr ) rp@ @ dup 2 + rp@ ! @ ; 67 | 68 | \ push/pop return stack 69 | : >rexit ( addr r:addr0 -- r:addr ) 70 | rp@ ! ; \ override return address with original return 71 | \ address from >r 72 | : >r ( x -- r:x) 73 | rp@ @ \ get current return address 74 | swap rp@ ! \ replace top of return stack with value 75 | >rexit ; \ push new address to return stack 76 | : r> ( r:x -- x ) 77 | rp@ 2 + @ \ get value stored in return stack with >r 78 | rp@ @ rp@ 2 + ! \ replace value with address to return from r> 79 | lit [ here @ 6 + , ] \ get address to this word's exit call 80 | rp@ ! ; \ make code return to this word's exit call, 81 | \ effectively calling exit twice to pop return 82 | \ stack entry created by >r 83 | 84 | \ rotate stack 85 | : rot ( x y z -- y z x ) >r swap r> swap ; 86 | 87 | \ if/then/else 88 | : if 89 | ['] ?branch , \ compile ?branch to skip if's body when false 90 | here @ \ get address where offset will be written 91 | 0 , \ compile dummy offset 92 | ; immediate 93 | : then 94 | dup \ duplicate offset address 95 | here @ swap - \ calculate offset from if/else 96 | swap ! \ store calculated offset for ?branch/branch 97 | ; immediate 98 | : else 99 | ['] branch , \ compile branch to skip else's body when true 100 | here @ \ get address where offset will be written 101 | 0 , \ compile dummy offset 102 | swap \ bring if's ?branch offset address to top of stack 103 | dup here @ swap - \ calculate offset from if 104 | swap ! \ store calculated offset for ?branch 105 | ; immediate 106 | 107 | \ begin...while...repeat and begin...until loops 108 | : begin 109 | here @ \ get location to branch back to 110 | ; immediate 111 | : while 112 | ['] ?branch , \ compile ?branch to terminate loop when false 113 | here @ \ get address where offset will be written 114 | 0 , \ compile dummy offset 115 | ; immediate 116 | : repeat 117 | swap \ offset will be negative 118 | ['] branch , here @ - , \ compile branch back to begin 119 | dup here @ swap - swap ! \ compile offset from while 120 | ; immediate 121 | : until 122 | ['] ?branch , here @ - , \ compile ?branch back to begin 123 | ; immediate 124 | 125 | \ do...loop loops 126 | : do ( end index -- ) 127 | here @ \ get location to branch back to 128 | ['] >r , ['] >r , \ at runtime, push inputs to return stack 129 | ; immediate 130 | : loop 131 | ['] r> , ['] r> , \ move current index and end to data stack 132 | ['] lit , 1 , ['] + , \ increment index 133 | ['] 2dup , ['] = , \ index equals end? 134 | ['] ?branch , here @ - , \ when false, branch back to do 135 | ['] 2drop , \ discard index and end when loop terminates 136 | ; immediate 137 | 138 | \ fetch/store bytes 139 | : 0fh lit [ 4 4 4 4 + + + 1 - , ] ; 140 | : ffh lit [ 0fh 2* 2* 2* 2* 0fh or , ] ; 141 | : c@ ( -- c ) @ ffh and ; 142 | : c! ( c addr -- ) 143 | dup @ \ fetch memory contents at address 144 | ffh invert and \ zero out low byte 145 | rot ffh and \ zero out high byte of value being stored 146 | or swap ! \ overwrite low byte of existing contents 147 | ; 148 | 149 | \ compile bytes 150 | : c, ( x -- ) here @ c! here @ 1 + here ! ; 151 | 152 | \ read literal string from word body 153 | : litstring ( -- addr len ) 154 | rp@ @ dup 2 + rp@ ! @ \ push length to stack 155 | rp@ @ \ push string address to stack 156 | swap 157 | 2dup + rp@ ! ; \ move return address past string 158 | 159 | \ print string 160 | : type ( addr len -- ) 0 do dup c@ emit 1 + loop drop ; 161 | 162 | \ read char from terminal input buffer, advance >in 163 | : in> ( "c" -- c ) tib >in @ + c@ >in dup @ 1 + swap ! ; 164 | 165 | \ constant for space char 166 | : bl ( -- spc ) lit [ 1 2* 2* 2* 2* 2* , ] ; 167 | 168 | \ parse input with specified delimiter 169 | : parse ( delim "input" -- addr len ) 170 | in> drop \ skip space after parse 171 | tib >in @ + \ put address of parsed input on stack 172 | swap 0 begin \ ( addr delim len ) 173 | over in> \ ( addr delim len delim char ) 174 | <> while 175 | 1 + \ ( addr delim len+1 ) 176 | repeat swap \ ( addr len delim ) 177 | bl = if 178 | >in dup @ 1 - swap ! \ move >in back 1 char if delimiter is bl, 179 | \ otherwise the interpreter is left in a 180 | \ bad state 181 | then ; 182 | 183 | \ parse input with specified delimiter, skipping leading delimiters 184 | : word ( delim "input" -- addr len ) 185 | in> drop \ skip space after word 186 | begin dup in> <> until \ skip leading delimiters 187 | >in @ 2 - >in ! \ "put back" last char read from tib, 188 | \ and backtrack >in leading char that will 189 | \ be skipped by parse 190 | parse ; 191 | 192 | \ parse word, compile first char as literal 193 | : [char] ( "input" -- c ) 194 | ['] lit , bl word drop c@ , ; immediate 195 | 196 | : ." ( "input" -- ) 197 | [char] " parse \ parse input up to " 198 | state @ if 199 | ['] litstring , \ compile litstring 200 | dup , \ compile length 201 | 0 do dup c@ c, 1 + loop drop \ compile string 202 | ['] type , \ display string at runtime 203 | else 204 | type \ display string 205 | then ; immediate 206 | 207 | ." hello, world" 208 | : hello ." hello, world" ; 209 | hello 210 | -------------------------------------------------------------------------------- /examples/02-quine.f: -------------------------------------------------------------------------------- 1 | \ Quine example for sectorforth, a 512-byte, bootable x86 Forth. 2 | \ Copyright (c) 2020 Cesar Blum 3 | \ Distributed under the MIT license. See LICENSE for details. 4 | \ Depends on definitions built in the "hello, world" example. 5 | 6 | : 0<> 0= invert ; 7 | 8 | \ get address to input buffer and number of characters in it 9 | : source ( -- addr n ) 10 | tib dup 11 | begin dup c@ 0<> while 1 + repeat 12 | tib - ; 13 | 14 | \ prints itself 15 | source type 16 | -------------------------------------------------------------------------------- /examples/03-variables.f: -------------------------------------------------------------------------------- 1 | \ Variables example for sectorforth, a 512-byte, bootable x86 Forth. 2 | \ Copyright (c) 2020 Cesar Blum 3 | \ Distributed under the MIT license. See LICENSE for details. 4 | \ Depends on definitions built in the "hello, world" example. 5 | 6 | \ constant to check/set hidden flag 7 | : 40h lit [ 1 2* 2* 2* 2* 2* 2* , ] ; 8 | 9 | \ make words visible 10 | : reveal latest @ 2 + dup @ 40h invert and swap ! ; 11 | 12 | \ creates a word that pushes the address to its body at runtime 13 | : create 14 | : \ parse word and create dictionary entry 15 | ['] lit , \ compile lit 16 | here @ 4 + , \ compile address past new word's exit call 17 | ['] exit , \ compile exit 18 | reveal \ make created word visible 19 | 0 state ! \ switch back to interpretation state 20 | 21 | \ cells are 2 bytes wide 22 | : cells ( -- x ) lit [ 2 , ] ; 23 | 24 | \ reserve bytes in dictionary 25 | : allot ( x -- ) here @ + here ! ; 26 | 27 | : variable create 1 cells allot ; 28 | 29 | variable var 30 | 2 var ! 31 | var @ emit \ should print smiley face 32 | -------------------------------------------------------------------------------- /examples/04-debugstack.f: -------------------------------------------------------------------------------- 1 | \ Stack debugging example for sectorforth, a 512-byte, bootable x86 Forth. 2 | \ Copyright (c) 2020 Cesar Blum 3 | \ Distributed under the MIT license. See LICENSE for details. 4 | \ Depends on definitions built up to the variables examples. 5 | 6 | \ make a few more basic operators 7 | : ?dup dup ?branch [ 4 , ] dup ; 8 | : -rot ( x y z -- z x y ) rot rot ; 9 | : xor ( x y -- x^y) 2dup and invert -rot or and ; 10 | : 8000h lit [ 0 c, 80h c, ] ; \ little endian 11 | : >= ( x y -- flag ) - 8000h and 0= ; 12 | : < ( x y -- flag ) >= invert ; 13 | : <= ( x y -- flag ) 2dup < -rot = or ; 14 | : 0< ( x -- flag ) 0 < ; 15 | 16 | \ divison and modulo 17 | : /mod ( x y -- x%y x/y ) 18 | over 0< -rot \ remainder negative if dividend is negative 19 | 2dup xor 0< -rot \ quotient negative if operand signs differ 20 | dup 0< if negate then \ make divisor positive if negative 21 | swap dup 0< if negate then \ make dividend positive if negative 22 | 0 >r begin \ hold quotient in return stack 23 | over 2dup >= \ while divisor greater than dividend 24 | while 25 | - \ subtract divisor from dividend 26 | r> 1 + >r \ increment quotient 27 | repeat 28 | drop nip \ leave sign flags and remainder on stack 29 | rot if negate then \ set remainder sign 30 | r> rot \ get quotient from return stack 31 | if negate then ; \ set quotient sign 32 | : / /mod nip ; 33 | : mod /mod drop ; 34 | 35 | \ constants for decimal and hexadecimal 10 (i.e. 10 and 16) 36 | : 10 lit [ 4 4 2 + + , ] ; 37 | : 10h lit [ 4 4 4 4 + + + , ] ; 38 | 39 | variable base 40 | 10 base ! 41 | 42 | \ switch to common bases 43 | : hex 10h base ! ; 44 | : decimal 10 base ! ; 45 | 46 | \ convert number to ASCII digit 47 | : digit ( x -- c ) 48 | dup 10 < if [char] 0 + else 10 - [char] A + then ; 49 | 50 | \ print space 51 | : space bl emit ; 52 | 53 | \ print number at the top of the stack in current base 54 | : . ( x -- ) 55 | -1 swap \ put sentinel on stack 56 | dup 0< if negate -1 else 0 then \ make positive if negative 57 | >r \ save sign on return stack 58 | begin base @ /mod ?dup 0= until \ convert to base 10 digits 59 | r> if [char] - emit then \ print sign 60 | begin digit emit dup -1 = until drop \ print digits 61 | space ; \ print space 62 | 63 | \ base of data stack 64 | : sp0 lit [ sp@ , ] ; 65 | 66 | \ print backspace 67 | : backspace lit [ 4 4 + , ] emit ; 68 | 69 | \ print stack 70 | : .s 71 | sp@ 0 swap begin 72 | dup sp0 < 73 | while 74 | 2 + 75 | swap 1 + swap 76 | repeat swap 77 | [char] < emit dup . backspace [char] > emit space 78 | ?dup if 79 | 0 do 2 - dup @ . loop 80 | then drop ; 81 | -------------------------------------------------------------------------------- /examples/05-fizzbuzz.f: -------------------------------------------------------------------------------- 1 | \ FizzBuzz example for sectorforth, a 512-byte, bootable x86 Forth. 2 | \ Copyright (c) 2020 Cesar Blum 3 | \ Distributed under the MIT license. See LICENSE for details. 4 | \ Depends on definitions built up to the stack debugging example. 5 | 6 | \ get do...loop index 7 | : i ( -- index ) rp@ 4 + @ ; 8 | 9 | \ make more numbers 10 | : 3 1 2 + ; 11 | : 5 2 3 + ; 12 | 13 | \ newline 14 | : cr lit [ 4 6 3 + + , ] lit [ 4 6 + , ] emit emit ; 15 | 16 | : fizzbuzz ( x -- ) 17 | cr 1 + 1 do 18 | i 3 mod 0= dup if ." Fizz" then 19 | i 5 mod 0= dup if ." Buzz" then 20 | or invert if i . then 21 | cr 22 | loop ; 23 | -------------------------------------------------------------------------------- /examples/README.md: -------------------------------------------------------------------------------- 1 | # A note on return stack manipulation 2 | 3 | In these examples, some defintions like `branch` and `lit` do a fair 4 | bit of return stack manipulation that may not be immediately intuitive 5 | to grasp. 6 | 7 | The key to understanding how those definitions work is in how Forth's 8 | [threaded code](https://en.wikipedia.org/wiki/Threaded_code) is executed. 9 | 10 | A word's body is comprised of a sequence of addresses to other words it 11 | calls. One of the processor's registers (`SI`, in the case of sectorforth) 12 | works as Forth's "instruction pointer", which is distinct from the 13 | processor's instruction pointer. 14 | 15 | Consider the following word definition: 16 | 17 | ```forth 18 | : w4 w1 w2 w3 ; 19 | ``` 20 | 21 | Its body is laid out in memory like this: 22 | 23 | ``` 24 | address addr1 addr2 addr3 25 | *---------------*---------------*---------------* 26 | contents | address of w1 | address of w2 | address of w3 | 27 | *---------------*---------------*---------------* 28 | size 2 bytes 2 bytes 2 bytes 29 | ``` 30 | 31 | When `w4` is about to be executed, `SI` points to its first cell: 32 | 33 | ``` 34 | address addr1 addr2 addr3 35 | *---------------*---------------*---------------* 36 | contents | address of w1 | address of w2 | address of w3 | 37 | *---------------*---------------*---------------* 38 | size 2 bytes 2 bytes 2 bytes 39 | ^ 40 | | 41 | *--- SI 42 | ``` 43 | 44 | When `w4` starts executing and calls `w1`, two things happen: 45 | 46 | - `SI` is advanced to the next cell (i.e. `SI = SI + 2`) 47 | - `SI` is pushed to the return stack 48 | 49 | Which means that if `w1` were to fetch the contents of the return stack 50 | (`rp@ @`), it would get `addr2` as a result. 51 | 52 | Now, when a word finishes executing, it calls `exit`, which pops the 53 | return stack, and sets `SI` to the popped address so that execution 54 | resumes there. In the example above, the execution of `w4` would 55 | resume right past the point where it called `w1`, calling `w2` next. 56 | 57 | What if `w1` were to do the following though: 58 | 59 | ```forth 60 | ... rp@ @ 2 + rp@ ! ... 61 | ``` 62 | 63 | `rp@ @ 2 +` would fetch the top of the return stack, yielding `addr2`, 64 | then it would add 2 to it, resulting in `addr3`. `rp@ !` would then 65 | replace the value at the top of the return stack with `addr3`. 66 | 67 | In that situation, when `w1` calls `exit`, the top of the return stack 68 | is popped, yielding `addr3` this time, and execution resumes there, 69 | skipping the call to `w2` in the body of `w4` and going straight to `w3`. 70 | 71 | That's how `branch`, `lit`, and other definitions that manipulate the 72 | return stack work. `branch` reads an offset from the top of the return 73 | stack (`rp@ @ @` reads the contents of the address at the top of the 74 | return stack) and adds that offset to the address at the top of the return 75 | stack itself (`rp@ @ + rp@ !`), so execution skips a number of words 76 | corresponding to the offset (it actually skips bytes, so offsets always 77 | have to be multiples of 2 to skip words). Like `branch`, `lit` reads a 78 | value from the address at the top of the return stack, but always adds 2 79 | to that same address so execution skips the literal (since attemping to 80 | execute the literal value itself would not make sense). 81 | 82 | The most involved definitions in terms of manipulating the return stack 83 | are `>r` and `r>`, which push and pop arbitrary values to and from the 84 | return stack itself: 85 | 86 | ```forth 87 | : >rexit ( addr r:addr0 -- r:addr ) 88 | rp@ ! ; \ override return address with original return 89 | \ address from >r 90 | : >r ( x -- r:x) 91 | rp@ @ \ get current return address 92 | swap rp@ ! \ replace top of return stack with value 93 | >rexit ; \ push new address to return stack 94 | : r> ( r:x -- x ) 95 | rp@ 2 + @ \ get value stored in return stack with >r 96 | rp@ @ rp@ 2 + ! \ replace value with address to return from r> 97 | lit [ here @ 6 + , ] \ get address to this word's exit call 98 | rp@ ! ; \ make code return to this word's exit call, 99 | \ effectively calling exit twice to pop return 100 | \ stack entry created by >r 101 | ``` 102 | 103 | `>r` uses an auxiliary word, `>rexit`, to push a new 104 | address to the return stack (remember, an address is pushed every time a 105 | word is called, so calling `>rexit` will do just that), then replaces it 106 | with the return address that was pushed when `>r` was called. *That* 107 | original address can thus be replaced with whatever value was on the data 108 | stack when `>r` was called. When `>r` exits, the value left at the top of 109 | the return stack is the argument to `>r`. 110 | 111 | `r>` is a bit more complicated. In addition to reading a value placed on 112 | the return stack by `>r` earlier, `r>` needs to pop that off. Evidently, 113 | it cannot do so via an auxiliary word like `>r` does, since that would 114 | only _push_ yet another address on the return stack. Instead, it obtains 115 | the address to its `exit` call (located where `;` is), and replaces the 116 | value pushed by `>r` with it. When `r>` calls `exit` the first time, 117 | execution goes back _to that same exit call_ one more time, popping off 118 | the return stack space created by `>r`; the second call to `exit` then 119 | pops the address to return to wherever `r>` was called. 120 | -------------------------------------------------------------------------------- /sectorforth.asm: -------------------------------------------------------------------------------- 1 | ; sectorforth - a 512-byte, bootable x86 Forth. 2 | ; Copyright (c) 2020 Cesar Blum 3 | ; Distributed under the MIT license. See LICENSE for details. 4 | ; 5 | ; sectorforth is a 16-bit x86 Forth that fits entirely within a 6 | ; boot sector (512 bytes). 7 | ; 8 | ; It's a direct threaded Forth, with SI acting as the Forth 9 | ; instruction pointer. Words are executed using LODSW to advance 10 | ; SI and load the next word's address into AX, which is then 11 | ; jumped to. 12 | ; 13 | ; The SP register is used as the data stack pointer, and the BP 14 | ; register acts as the return stack pointer. 15 | ; 16 | ; The minimum CPU required to run sectorforth is the 386, to use 17 | ; the SETNZ instruction. 18 | bits 16 19 | cpu 386 20 | 21 | ; Set CS to a known value by performing a far jump. Memory up to 22 | ; 0x0500 is used by the BIOS. Setting the segment to 0x0500 gives 23 | ; sectorforth an entire free segment to work with. 24 | jmp 0x0050:start 25 | 26 | ; On x86, the boot sector is loaded at 0x7c00 on boot. In segment 27 | ; 0x0500, that's 0x7700 (0x0050 << 4 + 0x7700 == 0x7c00). 28 | org 0x7700 29 | 30 | ; Define constants for the memory map. Everything is organized 31 | ; within a single 64 KB segment. TIB is placed at 0x0000 to 32 | ; simplify input parsing (the Forth variable >IN ends up being 33 | ; also a pointer into TIB, so there's no need to add >IN to TIB 34 | ; to get a pointer to the parse area). TIB is 4 KB long. 35 | TIB equ 0x0000 ; terminal input buffer (TIB) 36 | STATE equ 0x1000 ; current state (0=interpret, 1=compile) 37 | TOIN equ 0x1002 ; current read offset into TIB (>IN) 38 | RP0 equ 0x76fe ; bottom of return stack 39 | SP0 equ 0xfffe ; bottom of data stack 40 | 41 | ; Each dictionary entry is laid out in memory as such: 42 | ; 43 | ; *--------------*--------------*--------------*--------------* 44 | ; | Link pointer | Flags+Length | Name... | Code... | 45 | ; *--------------*--------------*--------------*--------------* 46 | ; 2 bytes 1 byte Length bytes Variable 47 | ; 48 | ; Flags IMMEDIATE and HIDDEN are used in assembly code. Room is 49 | ; left for an additional, user-defined flag, so word names are 50 | ; limited to 32 characters. 51 | F_IMMEDIATE equ 0x80 52 | F_HIDDEN equ 0x40 53 | LENMASK equ 0x1f 54 | 55 | ; Each dictionary entry needs a link to the previous entry. The 56 | ; last entry links to zero, marking the end of the dictionary. 57 | ; As dictionary entries are defined, link will be redefined to 58 | ; point to the previous entry. 59 | %define link 0 60 | 61 | ; defword lays out a dictionary entry where it is expanded. 62 | %macro defword 2-3 0 ; name, label, flags 63 | word_%2: 64 | dw link ; link to previous word 65 | %define link word_%2 66 | %strlen %%len %1 67 | db %3+%%len ; flags+length 68 | db %1 ; name 69 | %2: ; code starts here 70 | %endmacro 71 | 72 | ; NEXT advances execution to the next word. The actual code is 73 | ; placed further ahead for strategic reasons. The macro has to be 74 | ; defined here, since it's used in the words defined ahead. 75 | %define NEXT jmp next 76 | 77 | ; sectorforth has only eight primitive words, with which 78 | ; everything else can be built in Forth: 79 | ; 80 | ; @ ( addr -- x ) Fetch memory at addr 81 | ; ! ( x addr -- ) Store x at addr 82 | ; sp@ ( -- addr ) Get current data stack pointer 83 | ; rp@ ( -- addr ) Get current return stack pointer 84 | ; 0= ( x -- f ) -1 if top of stack is 0, 0 otherwise 85 | ; + ( x1 x2 -- n ) Add the two values at the top of the stack 86 | ; nand ( x1 x2 -- n ) NAND the two values at the top of the stack 87 | ; exit ( r:addr -- ) Resume execution at address at the top of 88 | ; the return stack 89 | defword "@",FETCH 90 | pop bx 91 | push word [bx] 92 | NEXT 93 | 94 | defword "!",STORE 95 | pop bx 96 | pop word [bx] 97 | NEXT 98 | 99 | defword "sp@",SPFETCH 100 | push sp 101 | NEXT 102 | 103 | defword "rp@",RPFETCH 104 | push bp 105 | NEXT 106 | 107 | defword "0=",ZEROEQUALS 108 | pop ax 109 | test ax,ax 110 | setnz al ; AL=0 if ZF=1, else AL=1 111 | dec ax ; AL=ff if AL=0, else AL=0 112 | cbw ; AH=AL 113 | push ax 114 | NEXT 115 | 116 | defword "+",PLUS 117 | pop bx 118 | pop ax 119 | add ax,bx 120 | push ax 121 | NEXT 122 | 123 | defword "nand",NAND 124 | pop bx 125 | pop ax 126 | and ax,bx 127 | not ax 128 | push ax 129 | NEXT 130 | 131 | defword "exit",EXIT 132 | xchg sp,bp ; swap SP and BP, SP controls return stack 133 | pop si ; pop address to next word 134 | xchg sp,bp ; restore SP and BP 135 | NEXT 136 | 137 | ; Besides primitives, a few variables are exposed to Forth code: 138 | ; TIB, STATE, >IN, HERE, and LATEST. With sectorforth's >IN being 139 | ; both an offset and a pointer into TIB (as TIB starts at 0x0000), 140 | ; TIB could be left out. But it is exposed so that sectorforth 141 | ; code that accesses the parse area can be written in an idiomatic 142 | ; fashion (e.g. TIB >IN @ +). 143 | defword "tib",TIBVAR 144 | push word TIB 145 | NEXT 146 | 147 | defword "state",STATEVAR 148 | push word STATE 149 | NEXT 150 | 151 | defword ">in",TOINVAR 152 | push word TOIN 153 | NEXT 154 | 155 | ; Strategically define next here so most jumps to it are short, 156 | ; saving extra bytes that would be taken by near jumps. 157 | next: 158 | lodsw ; load next word's address into AX 159 | jmp ax ; jump directly to it 160 | 161 | ; Words and data space for the HERE and LATEST variables. 162 | defword "here",HEREVAR 163 | push word HERE 164 | NEXT 165 | HERE: dw start_HERE 166 | 167 | defword "latest",LATESTVAR 168 | push word LATEST 169 | NEXT 170 | LATEST: dw word_SEMICOLON ; initialized to last word in dictionary 171 | 172 | ; Define a couple of I/O primitives to make things interactive. 173 | ; They can also be used to build a richer interpreter loop. 174 | ; 175 | ; KEY waits for a key press and pushes its scan code (AH) and 176 | ; ASCII character (AL) to the stack, both in a single cell. 177 | defword "key",KEY 178 | mov ah,0 179 | int 0x16 180 | push ax 181 | NEXT 182 | 183 | ; EMIT writes to the screen the ASCII character corresponding to 184 | ; the lowest 8 bits of the value at the top of the stack. 185 | defword "emit",EMIT 186 | pop ax 187 | call writechar 188 | NEXT 189 | 190 | ; The colon compiler reads a name from the terminal input buffer, 191 | ; creates a dictionary entry for it, writes machine code to jump 192 | ; to DOCOL, updates LATEST and HERE, and switches to compilation 193 | ; state. 194 | defword ":",COLON 195 | call token ; parse word from input 196 | push si 197 | mov si,di ; set parsed word as string copy source 198 | mov di,[HERE] ; set current value of HERE as destination 199 | mov ax,[LATEST] ; get pointer to latest defined word 200 | mov [LATEST],di ; update LATEST to new word being defined 201 | stosw ; link pointer 202 | mov al,cl 203 | or al,F_HIDDEN ; hide new word while it's being defined 204 | stosb ; word length 205 | rep movsb ; word name 206 | mov ax,0x26ff 207 | stosw ; compile near jump, absolute indirect... 208 | mov ax,DOCOL.addr 209 | stosw ; ...to DOCOL 210 | mov [HERE],di ; update HERE to next free position 211 | mov byte [STATE],1 ; switch to compilation state 212 | pop si 213 | NEXT 214 | 215 | ; DOCOL sets up and starts execution of a user-defined words. 216 | ; Those differ from words defined in machine code by being 217 | ; sequences of addresses to other words, so a bit of code is 218 | ; needed to save the current value of SI (this Forth's instruction 219 | ; pointer), and point it to the sequence of addresses that makes 220 | ; up a word's body. 221 | ; 222 | ; DOCOL advances AX 4 bytes, and then moves that value to SI. When 223 | ; DOCOL is jumped to, AX points to the code field of the word 224 | ; about to be executed. The 4 bytes being skipped are the actual 225 | ; jump instruction to DOCOL itself, inserted by the colon compiler 226 | ; when it creates a new entry in the dictionary. 227 | DOCOL: 228 | xchg sp,bp ; swap SP and BP, SP controls return stack 229 | push si ; push current "instruction pointer" 230 | xchg sp,bp ; restore SP and BP 231 | add ax,4 ; skip word's code field 232 | mov si,ax ; point "instruction pointer" to word body 233 | NEXT ; start executing the word 234 | 235 | ; The jump instruction inserted by the compiler is an indirect 236 | ; jump, so it needs to read the location to jump to from another 237 | ; memory location. 238 | .addr: dw DOCOL 239 | 240 | ; Semicolon is the only immediate primitive. It writes the address 241 | ; of EXIT to the end of a new word definition, makes the word 242 | ; visible in the dictionary, and switches back to interpretation 243 | ; state. 244 | defword ";",SEMICOLON,F_IMMEDIATE 245 | mov bx,[LATEST] 246 | and byte [bx+2],~F_HIDDEN ; reveal new word 247 | mov byte [STATE],0 ; switch to interpretation state 248 | mov ax,EXIT ; prepare to compile EXIT 249 | compile: 250 | mov di,[HERE] 251 | stosw ; compile contents of AX to HERE 252 | mov [HERE],di ; advance HERE to next cell 253 | NEXT 254 | 255 | ; Execution starts here. 256 | start: 257 | cld ; clear direction flag 258 | 259 | ; Set up segment registers to point to the same segment as CS. 260 | push cs 261 | push cs 262 | push cs 263 | pop ds 264 | pop es 265 | pop ss 266 | 267 | ; Skip error signaling on initialization 268 | jmp init 269 | 270 | ; Display a red '!!' to let the user know an error happened and the 271 | ; interpreter is being reset 272 | error: 273 | mov ax,0x0921 ; write '!' 274 | mov bx,0x0004 ; black background, red text 275 | mov cx,2 ; twice 276 | int 0x10 277 | 278 | ; Initialize stack pointers, state, and terminal input buffer. 279 | init: 280 | mov bp,RP0 ; BP is the return stack pointer 281 | mov sp,SP0 ; SP is the data stack pointer 282 | 283 | ; Fill TIB with zeros, and set STATE and >IN to 0 284 | mov al,0 285 | mov cx,STATE+4 286 | mov di,TIB 287 | rep stosb 288 | 289 | ; Enter the interpreter loop. 290 | ; 291 | ; Words are read one at time and searched for in the dictionary. 292 | ; If a word is found in the dictionary, it is either interpreted 293 | ; (i.e. executed) or compiled, depending on the current state and 294 | ; the word's IMMEDIATE flag. 295 | ; 296 | ; When a word is not found, the state of the interpreter is reset: 297 | ; the data and return stacks are cleared as well as the terminal 298 | ; input buffer, and the interpreter goes into interpretation mode. 299 | interpreter: 300 | call token ; parse word from input 301 | mov bx,[LATEST] ; start searching for it in the dictionary 302 | .1: test bx,bx ; zero? 303 | jz error ; not found, reset interpreter state 304 | mov si,bx 305 | lodsw ; skip link 306 | lodsb ; read flags+length 307 | mov dl,al ; save those for later use 308 | test al,F_HIDDEN ; entry hidden? 309 | jnz .2 ; if so, skip it 310 | and al,LENMASK ; mask out flags 311 | cmp al,cl ; same length? 312 | jne .2 ; if not, skip entry 313 | push cx 314 | push di 315 | repe cmpsb ; compare strings 316 | pop di 317 | pop cx 318 | je .3 ; if equal, search is over 319 | .2: mov bx,[bx] ; skip to next entry 320 | jmp .1 ; try again 321 | .3: mov ax,si ; after comparison, SI points to code field 322 | mov si,.loop ; set SI so NEXT loops back to interpreter 323 | ; Decide whether to interpret or compile the word. The IMMEDIATE 324 | ; flag is located in the most significant bit of the flags+length 325 | ; byte. STATE can only be 0 or 1. When ORing those two, these are 326 | ; the possibilities: 327 | ; 328 | ; IMMEDIATE STATE OR ACTION 329 | ; 0000000 0000000 00000000 Interpret 330 | ; 0000000 0000001 00000001 Compile 331 | ; 1000000 0000000 10000000 Interpret 332 | ; 1000000 0000001 10000001 Interpret 333 | ; 334 | ; A word is only compiled when the result of that OR is 1. 335 | ; Decrementing that result sets the zero flag for a conditional 336 | ; jump. 337 | and dl,F_IMMEDIATE ; isolate IMMEDIATE flag 338 | or dl,[STATE] ; OR with state 339 | dec dl ; decrement 340 | jz compile ; if result is zero, compile 341 | jmp ax ; otherwise, interpret (execute) the word 342 | .loop: dw interpreter 343 | 344 | ; Parse a word from the terminal input buffer and return its 345 | ; address and length in DI and CX, respectively. 346 | ; 347 | ; If after skipping spaces a 0 is found, more input is read from 348 | ; the keyboard into the terminal input buffer until return is 349 | ; pressed, at which point execution jumps back to the beginning of 350 | ; token so it can attempt to parse a word again. 351 | ; 352 | ; Before reading input from the keyboard, a CRLF is emitted so 353 | ; the user can enter input on a fresh, blank line on the screen. 354 | token: 355 | mov di,[TOIN] ; starting at the current position in TIB 356 | mov cx,-1 ; search "indefinitely" 357 | mov al,32 ; for a character that's not a space 358 | repe scasb 359 | dec di ; result is one byte past found character 360 | cmp byte [di],0 ; found a 0? 361 | je .readline ; if so, read more input 362 | mov cx,-1 ; search "indefinitely" again 363 | repne scasb ; this time, for a space 364 | dec di ; adjust DI again 365 | mov [TOIN],di ; update current position in TIB 366 | not cx ; after ones' complement, CX=length+1 367 | dec cx ; adjust CX to correct length 368 | sub di,cx ; point to start of parsed word 369 | ret 370 | .readline: 371 | mov al,13 372 | call writechar ; CR 373 | mov al,10 374 | call writechar ; LF 375 | mov di,TIB ; read into TIB 376 | .1: mov ah,0 ; wait until a key is pressed 377 | int 0x16 378 | cmp al,13 ; return pressed? 379 | je .3 ; if so, finish reading 380 | cmp al,8 ; backspace pressed? 381 | je .2 ; if so, erase character 382 | call writechar ; otherwise, write character to screen 383 | stosb ; store character in TIB 384 | jmp .1 ; keep reading 385 | .2: cmp di,TIB ; start of TIB? 386 | je .1 ; if so, there's nothing to erase 387 | dec di ; erase character in TIB 388 | call writechar ; move cursor back one character 389 | mov ax,0x0a20 ; erase without moving cursor 390 | mov cx,1 391 | int 0x10 ; (BH already set to 0 by writechar) 392 | jmp .1 ; keep reading 393 | .3: mov ax,0x0020 394 | stosw ; put final delimiter and 0 in TIB 395 | call writechar ; write a space between user input and 396 | ; execution output 397 | mov word [TOIN],0 ; point >IN to start of TIB 398 | jmp token ; try parsing a word again 399 | 400 | ; writechar writes a character to the screen. It uses INT 10/AH=0e 401 | ; to perform teletype output, writing the character, updating the 402 | ; cursor, and scrolling the screen, all in one go. Writing 403 | ; backspace using the BIOS only moves the cursor backwards within 404 | ; a line, but does not move it back to the previous line. 405 | ; writechar addresses that. 406 | writechar: 407 | push ax ; INT 10h/AH=03h clobbers AX in some BIOSes 408 | mov bh,0 ; video page 0 for all BIOS calls 409 | mov ah,3 ; get cursor position (DH=row, DL=column) 410 | int 0x10 411 | pop ax ; restore AX 412 | mov ah,0x0e ; teletype output 413 | mov bl,0x7 ; black background, light grey text 414 | int 0x10 415 | cmp al,8 ; backspace? 416 | jne .1 ; if not, nothing else to do 417 | test dl,dl ; was cursor in first column? 418 | jnz .1 ; if not, nothing else to do 419 | mov ah,2 ; move cursor 420 | mov dl,79 ; to last column 421 | dec dh ; of previous row 422 | int 0x10 423 | .1: ret 424 | 425 | times 510-($-$$) db 0 426 | db 0x55, 0xaa 427 | 428 | ; New dictionary entries will be written starting here. 429 | start_HERE: 430 | --------------------------------------------------------------------------------