├── LICENSE
├── Makefile
├── README.md
├── examples
├── 01-helloworld.f
├── 02-quine.f
├── 03-variables.f
├── 04-debugstack.f
├── 05-fizzbuzz.f
└── README.md
└── sectorforth.asm
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2020 Cesar Blum
2 |
3 | Permission is hereby granted, free of charge, to any person obtaining a copy
4 | of this software and associated documentation files (the "Software"), to deal
5 | in the Software without restriction, including without limitation the rights
6 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
7 | copies of the Software, and to permit persons to whom the Software is
8 | furnished to do so, subject to the following conditions:
9 |
10 | The above copyright notice and this permission notice shall be included in all
11 | copies or substantial portions of the Software.
12 |
13 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
14 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
15 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
16 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
17 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
18 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
19 | SOFTWARE.
20 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | name = sectorforth
2 |
3 | all: $(name).bin $(name).img
4 |
5 | %.bin: %.asm
6 | nasm -f bin -o $@ -l $(^:.asm=.lst) $^
7 |
8 | %.img: %.bin
9 | dd if=$^ of=boot.img bs=512
10 | dd if=/dev/zero of=zero.img bs=512 count=2879
11 | cat boot.img zero.img > $@
12 | rm -f boot.img zero.img
13 |
14 | .PHONY: debug
15 | gdb: $(name).bin
16 | qemu-system-i386 -hda $^ -monitor stdio -s -S
17 |
18 | .PHONY: run
19 | run: $(name).bin
20 | qemu-system-i386 -hda $^
21 |
22 | .PHONY: clean
23 | clean:
24 | rm -rf *.{bin,lst,img}
25 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # sectorforth
2 |
3 | sectorforth is a 16-bit x86 Forth that fits in a 512-byte boot sector.
4 |
5 | Inspiration to write sectorforth came from a 1996
6 | [Usenet thread](https://groups.google.com/g/comp.lang.forth/c/NS2icrCj1jQ)
7 | (in particular, Bernd Paysan's first post on the thread).
8 |
9 | ## Batteries not included
10 |
11 | sectorforth contains only the eight primitives outlined in the Usenet
12 | post above, five variables for manipulating internal state, and two I/O
13 | primitives.
14 |
15 | With that minimal set of building blocks, words for branching, compiling,
16 | manipulating the return stack, etc. can all be written in Forth itself
17 | (check out the examples!).
18 |
19 | The colon compiler (`:`) is available, so new words can be defined easily
20 | (that means `;` is also there, of course).
21 |
22 | Contrary to many Forth implementations, sectorforth does not attempt to
23 | convert unknown words to numbers, since numbers can be produced using the
24 | available primitives. The two included I/O primitives are sufficient to
25 | write a more powerful interpreter that can parse numbers.
26 |
27 | ### Primitives
28 |
29 | | Primitive | Stack effects | Description |
30 | | --------- | ------------- | --------------------------------------------- |
31 | | `@` | ( addr -- x ) | Fetch memory contents at addr |
32 | | `!` | ( x addr -- ) | Store x at addr |
33 | | `sp@` | ( -- sp ) | Get pointer to top of data stack |
34 | | `rp@` | ( -- rp ) | Get pointer to top of return stack |
35 | | `0=` | ( x -- flag ) | -1 if top of stack is 0, 0 otherwise |
36 | | `+` | ( x y -- z ) | Sum the two numbers at the top of the stack |
37 | | `nand` | ( x y -- z ) | NAND the two numbers at the top of the stack |
38 | | `exit` | ( r:addr -- ) | Pop return stack and resume execution at addr |
39 | | `key` | ( -- x ) | Read key stroke as ASCII character |
40 | | `emit` | ( x -- ) | Print low byte of x as an ASCII character |
41 |
42 | ### Variables
43 |
44 | | Variable | Description |
45 | | -------- | ------------------------------------------------------------- |
46 | | `state` | 0: execute words; 1: compile word addresses to the dictionary |
47 | | `tib` | Terminal input buffer, where input is parsed from |
48 | | `>in` | Current parsing offset into terminal input buffer |
49 | | `here` | Pointer to next free position in the dictionary |
50 | | `latest` | Pointer to most recent dictionary entry |
51 |
52 | ## Compiling
53 |
54 | sectorforth was developed using NASM 2.15.01. Earlier versions of NASM
55 | are probably capable of compiling it, but that hasn't been tested.
56 |
57 | To compile sectorforth, just run `make`:
58 |
59 | ```
60 | $ make
61 | ```
62 |
63 | That will produce a compiled binary (`sectorforth.bin`) and a floppy disk
64 | image (`sectorforth.img`) containing the binary in its boot sector.
65 |
66 | ## Running
67 |
68 | The makefile contains two targets for running sectorforth in QEMU:
69 |
70 | - `debug` starts QEMU in debug mode, with execution paused. That allows
71 | you to set up a remote target in GDB (`target remote localhost:1234`) and
72 | set any breakpoints you want before sectorforth starts running.
73 | - `run` simply runs sectorforth in QEMU.
74 |
75 | ## Usage
76 |
77 | Up to 4KB of input can be entered per line. After pressing return, the
78 | interpreter parses one word at a time an interprets it (i.e. executes it
79 | or compiles it, according to the current value of the `state` variable).
80 |
81 | sectorforth does not print the ` ok` prompt familiar to Forth users.
82 | However, if a word is not found in the dictionary, the error message `!!`
83 | is printed in red, letting you know an error happened.
84 |
85 | When a word is not found in the dictionary, the interpreter's state is
86 | reset: the data and return stacks, as well as the terminal input buffer
87 | are cleared, and the interpreter is placed in interpretation mode. Other
88 | errors (e.g. compiling an invalid address in a word definition and
89 | attempting to execute it) are not handled gracefully, and will crash the
90 | interpreter.
91 |
92 | ## Code structure
93 |
94 | Comments throughout the code assume familiarity with Forth and how it is
95 | commonly implemented.
96 |
97 | If you're not familiar with Forth, read Leo Brodie's
98 | [Starting Forth](https://www.forth.com/starting-forth).
99 |
100 | If you're not familiar with how Forth is implemented on x86, read the
101 | assembly code for Richard W.M. Jones'
102 | [jonesforth](http://git.annexia.org/?p=jonesforth.git;a=blob;f=jonesforth.S).
103 |
104 | sectorforth draws a lot of inspiration from jonesforth, but the latter
105 | does a much better job at explaining the basics in its comments.
106 |
107 | For an excellent introduction to threaded code techniques, and to how to
108 | implement Forth in different architectures, read Brad Rodriguez's
109 | [Moving Forth](http://www.bradrodriguez.com/papers/moving1.htm).
110 |
--------------------------------------------------------------------------------
/examples/01-helloworld.f:
--------------------------------------------------------------------------------
1 | \ "hello, world" example for sectorforth, a 512-byte, bootable x86 Forth.
2 | \ Copyright (c) 2020 Cesar Blum
3 | \ Distributed under the MIT license. See LICENSE for details.
4 |
5 | : dup ( x -- x x ) sp@ @ ;
6 |
7 | \ make some numbers
8 | : -1 ( x -- x -1 ) dup dup nand dup dup nand nand ;
9 | : 0 -1 dup nand ;
10 | : 1 -1 dup + dup nand ;
11 | : 2 1 1 + ;
12 | : 4 2 2 + ;
13 | : 6 2 4 + ;
14 |
15 | \ logic and arithmetic operators
16 | : invert ( x -- !x ) dup nand ;
17 | : and ( x y -- x&y ) nand invert ;
18 | : negate ( x -- -x ) invert 1 + ;
19 | : - ( x y -- x-y ) negate + ;
20 |
21 | \ equality checks
22 | : = ( x y -- flag ) - 0= ;
23 | : <> ( x y -- flag ) = invert ;
24 |
25 | \ stack manipulation words
26 | : drop ( x y -- x ) dup - + ;
27 | : over ( x y -- x y x ) sp@ 2 + @ ;
28 | : swap ( x y -- y x ) over over sp@ 6 + ! sp@ 2 + ! ;
29 | : nip ( x y -- y ) swap drop ;
30 | : 2dup ( x y -- x y x y ) over over ;
31 | : 2drop ( x y -- ) drop drop ;
32 |
33 | \ more logic
34 | : or ( x y -- x|y ) invert swap invert and invert ;
35 |
36 | \ compile things
37 | : , ( x -- ) here @ ! here @ 2 + here ! ;
38 |
39 | \ left shift 1 bit
40 | : 2* ( x -- 2x ) dup + ;
41 |
42 | \ constant to check/set immediate flag
43 | : 80h ( -- 80h ) 1 2* 2* 2* 2* 2* 2* 2* ;
44 |
45 | \ make words immediate
46 | : immediate latest @ 2 + dup @ 80h or swap ! ;
47 |
48 | \ control interpreter state
49 | : [ 0 state ! ; immediate
50 | : ] 1 state ! ;
51 |
52 | \ unconditional branch
53 | : branch ( r:addr -- r:addr+offset ) rp@ @ dup @ + rp@ ! ;
54 |
55 | \ conditional branch when top of stack is 0
56 | : ?branch ( r:addr -- r:addr | r:addr+offset)
57 | 0= rp@ @ @ 2 - and rp@ @ + 2 + rp@ ! ;
58 |
59 | \ lit pushes the value on the next cell to the stack at runtime
60 | \ e.g. lit [ 42 , ] pushes 42 to the stack
61 | : lit ( -- x ) rp@ @ dup 2 + rp@ ! @ ;
62 |
63 | \ ['] is identical to lit, the choice of either depends on context
64 | \ don't write as : ['] lit ; as that will break lit's assumptions about
65 | \ the return stack
66 | : ['] ( -- addr ) rp@ @ dup 2 + rp@ ! @ ;
67 |
68 | \ push/pop return stack
69 | : >rexit ( addr r:addr0 -- r:addr )
70 | rp@ ! ; \ override return address with original return
71 | \ address from >r
72 | : >r ( x -- r:x)
73 | rp@ @ \ get current return address
74 | swap rp@ ! \ replace top of return stack with value
75 | >rexit ; \ push new address to return stack
76 | : r> ( r:x -- x )
77 | rp@ 2 + @ \ get value stored in return stack with >r
78 | rp@ @ rp@ 2 + ! \ replace value with address to return from r>
79 | lit [ here @ 6 + , ] \ get address to this word's exit call
80 | rp@ ! ; \ make code return to this word's exit call,
81 | \ effectively calling exit twice to pop return
82 | \ stack entry created by >r
83 |
84 | \ rotate stack
85 | : rot ( x y z -- y z x ) >r swap r> swap ;
86 |
87 | \ if/then/else
88 | : if
89 | ['] ?branch , \ compile ?branch to skip if's body when false
90 | here @ \ get address where offset will be written
91 | 0 , \ compile dummy offset
92 | ; immediate
93 | : then
94 | dup \ duplicate offset address
95 | here @ swap - \ calculate offset from if/else
96 | swap ! \ store calculated offset for ?branch/branch
97 | ; immediate
98 | : else
99 | ['] branch , \ compile branch to skip else's body when true
100 | here @ \ get address where offset will be written
101 | 0 , \ compile dummy offset
102 | swap \ bring if's ?branch offset address to top of stack
103 | dup here @ swap - \ calculate offset from if
104 | swap ! \ store calculated offset for ?branch
105 | ; immediate
106 |
107 | \ begin...while...repeat and begin...until loops
108 | : begin
109 | here @ \ get location to branch back to
110 | ; immediate
111 | : while
112 | ['] ?branch , \ compile ?branch to terminate loop when false
113 | here @ \ get address where offset will be written
114 | 0 , \ compile dummy offset
115 | ; immediate
116 | : repeat
117 | swap \ offset will be negative
118 | ['] branch , here @ - , \ compile branch back to begin
119 | dup here @ swap - swap ! \ compile offset from while
120 | ; immediate
121 | : until
122 | ['] ?branch , here @ - , \ compile ?branch back to begin
123 | ; immediate
124 |
125 | \ do...loop loops
126 | : do ( end index -- )
127 | here @ \ get location to branch back to
128 | ['] >r , ['] >r , \ at runtime, push inputs to return stack
129 | ; immediate
130 | : loop
131 | ['] r> , ['] r> , \ move current index and end to data stack
132 | ['] lit , 1 , ['] + , \ increment index
133 | ['] 2dup , ['] = , \ index equals end?
134 | ['] ?branch , here @ - , \ when false, branch back to do
135 | ['] 2drop , \ discard index and end when loop terminates
136 | ; immediate
137 |
138 | \ fetch/store bytes
139 | : 0fh lit [ 4 4 4 4 + + + 1 - , ] ;
140 | : ffh lit [ 0fh 2* 2* 2* 2* 0fh or , ] ;
141 | : c@ ( -- c ) @ ffh and ;
142 | : c! ( c addr -- )
143 | dup @ \ fetch memory contents at address
144 | ffh invert and \ zero out low byte
145 | rot ffh and \ zero out high byte of value being stored
146 | or swap ! \ overwrite low byte of existing contents
147 | ;
148 |
149 | \ compile bytes
150 | : c, ( x -- ) here @ c! here @ 1 + here ! ;
151 |
152 | \ read literal string from word body
153 | : litstring ( -- addr len )
154 | rp@ @ dup 2 + rp@ ! @ \ push length to stack
155 | rp@ @ \ push string address to stack
156 | swap
157 | 2dup + rp@ ! ; \ move return address past string
158 |
159 | \ print string
160 | : type ( addr len -- ) 0 do dup c@ emit 1 + loop drop ;
161 |
162 | \ read char from terminal input buffer, advance >in
163 | : in> ( "c" -- c ) tib >in @ + c@ >in dup @ 1 + swap ! ;
164 |
165 | \ constant for space char
166 | : bl ( -- spc ) lit [ 1 2* 2* 2* 2* 2* , ] ;
167 |
168 | \ parse input with specified delimiter
169 | : parse ( delim "input" -- addr len )
170 | in> drop \ skip space after parse
171 | tib >in @ + \ put address of parsed input on stack
172 | swap 0 begin \ ( addr delim len )
173 | over in> \ ( addr delim len delim char )
174 | <> while
175 | 1 + \ ( addr delim len+1 )
176 | repeat swap \ ( addr len delim )
177 | bl = if
178 | >in dup @ 1 - swap ! \ move >in back 1 char if delimiter is bl,
179 | \ otherwise the interpreter is left in a
180 | \ bad state
181 | then ;
182 |
183 | \ parse input with specified delimiter, skipping leading delimiters
184 | : word ( delim "input" -- addr len )
185 | in> drop \ skip space after word
186 | begin dup in> <> until \ skip leading delimiters
187 | >in @ 2 - >in ! \ "put back" last char read from tib,
188 | \ and backtrack >in leading char that will
189 | \ be skipped by parse
190 | parse ;
191 |
192 | \ parse word, compile first char as literal
193 | : [char] ( "input" -- c )
194 | ['] lit , bl word drop c@ , ; immediate
195 |
196 | : ." ( "input" -- )
197 | [char] " parse \ parse input up to "
198 | state @ if
199 | ['] litstring , \ compile litstring
200 | dup , \ compile length
201 | 0 do dup c@ c, 1 + loop drop \ compile string
202 | ['] type , \ display string at runtime
203 | else
204 | type \ display string
205 | then ; immediate
206 |
207 | ." hello, world"
208 | : hello ." hello, world" ;
209 | hello
210 |
--------------------------------------------------------------------------------
/examples/02-quine.f:
--------------------------------------------------------------------------------
1 | \ Quine example for sectorforth, a 512-byte, bootable x86 Forth.
2 | \ Copyright (c) 2020 Cesar Blum
3 | \ Distributed under the MIT license. See LICENSE for details.
4 | \ Depends on definitions built in the "hello, world" example.
5 |
6 | : 0<> 0= invert ;
7 |
8 | \ get address to input buffer and number of characters in it
9 | : source ( -- addr n )
10 | tib dup
11 | begin dup c@ 0<> while 1 + repeat
12 | tib - ;
13 |
14 | \ prints itself
15 | source type
16 |
--------------------------------------------------------------------------------
/examples/03-variables.f:
--------------------------------------------------------------------------------
1 | \ Variables example for sectorforth, a 512-byte, bootable x86 Forth.
2 | \ Copyright (c) 2020 Cesar Blum
3 | \ Distributed under the MIT license. See LICENSE for details.
4 | \ Depends on definitions built in the "hello, world" example.
5 |
6 | \ constant to check/set hidden flag
7 | : 40h lit [ 1 2* 2* 2* 2* 2* 2* , ] ;
8 |
9 | \ make words visible
10 | : reveal latest @ 2 + dup @ 40h invert and swap ! ;
11 |
12 | \ creates a word that pushes the address to its body at runtime
13 | : create
14 | : \ parse word and create dictionary entry
15 | ['] lit , \ compile lit
16 | here @ 4 + , \ compile address past new word's exit call
17 | ['] exit , \ compile exit
18 | reveal \ make created word visible
19 | 0 state ! \ switch back to interpretation state
20 |
21 | \ cells are 2 bytes wide
22 | : cells ( -- x ) lit [ 2 , ] ;
23 |
24 | \ reserve bytes in dictionary
25 | : allot ( x -- ) here @ + here ! ;
26 |
27 | : variable create 1 cells allot ;
28 |
29 | variable var
30 | 2 var !
31 | var @ emit \ should print smiley face
32 |
--------------------------------------------------------------------------------
/examples/04-debugstack.f:
--------------------------------------------------------------------------------
1 | \ Stack debugging example for sectorforth, a 512-byte, bootable x86 Forth.
2 | \ Copyright (c) 2020 Cesar Blum
3 | \ Distributed under the MIT license. See LICENSE for details.
4 | \ Depends on definitions built up to the variables examples.
5 |
6 | \ make a few more basic operators
7 | : ?dup dup ?branch [ 4 , ] dup ;
8 | : -rot ( x y z -- z x y ) rot rot ;
9 | : xor ( x y -- x^y) 2dup and invert -rot or and ;
10 | : 8000h lit [ 0 c, 80h c, ] ; \ little endian
11 | : >= ( x y -- flag ) - 8000h and 0= ;
12 | : < ( x y -- flag ) >= invert ;
13 | : <= ( x y -- flag ) 2dup < -rot = or ;
14 | : 0< ( x -- flag ) 0 < ;
15 |
16 | \ divison and modulo
17 | : /mod ( x y -- x%y x/y )
18 | over 0< -rot \ remainder negative if dividend is negative
19 | 2dup xor 0< -rot \ quotient negative if operand signs differ
20 | dup 0< if negate then \ make divisor positive if negative
21 | swap dup 0< if negate then \ make dividend positive if negative
22 | 0 >r begin \ hold quotient in return stack
23 | over 2dup >= \ while divisor greater than dividend
24 | while
25 | - \ subtract divisor from dividend
26 | r> 1 + >r \ increment quotient
27 | repeat
28 | drop nip \ leave sign flags and remainder on stack
29 | rot if negate then \ set remainder sign
30 | r> rot \ get quotient from return stack
31 | if negate then ; \ set quotient sign
32 | : / /mod nip ;
33 | : mod /mod drop ;
34 |
35 | \ constants for decimal and hexadecimal 10 (i.e. 10 and 16)
36 | : 10 lit [ 4 4 2 + + , ] ;
37 | : 10h lit [ 4 4 4 4 + + + , ] ;
38 |
39 | variable base
40 | 10 base !
41 |
42 | \ switch to common bases
43 | : hex 10h base ! ;
44 | : decimal 10 base ! ;
45 |
46 | \ convert number to ASCII digit
47 | : digit ( x -- c )
48 | dup 10 < if [char] 0 + else 10 - [char] A + then ;
49 |
50 | \ print space
51 | : space bl emit ;
52 |
53 | \ print number at the top of the stack in current base
54 | : . ( x -- )
55 | -1 swap \ put sentinel on stack
56 | dup 0< if negate -1 else 0 then \ make positive if negative
57 | >r \ save sign on return stack
58 | begin base @ /mod ?dup 0= until \ convert to base 10 digits
59 | r> if [char] - emit then \ print sign
60 | begin digit emit dup -1 = until drop \ print digits
61 | space ; \ print space
62 |
63 | \ base of data stack
64 | : sp0 lit [ sp@ , ] ;
65 |
66 | \ print backspace
67 | : backspace lit [ 4 4 + , ] emit ;
68 |
69 | \ print stack
70 | : .s
71 | sp@ 0 swap begin
72 | dup sp0 <
73 | while
74 | 2 +
75 | swap 1 + swap
76 | repeat swap
77 | [char] < emit dup . backspace [char] > emit space
78 | ?dup if
79 | 0 do 2 - dup @ . loop
80 | then drop ;
81 |
--------------------------------------------------------------------------------
/examples/05-fizzbuzz.f:
--------------------------------------------------------------------------------
1 | \ FizzBuzz example for sectorforth, a 512-byte, bootable x86 Forth.
2 | \ Copyright (c) 2020 Cesar Blum
3 | \ Distributed under the MIT license. See LICENSE for details.
4 | \ Depends on definitions built up to the stack debugging example.
5 |
6 | \ get do...loop index
7 | : i ( -- index ) rp@ 4 + @ ;
8 |
9 | \ make more numbers
10 | : 3 1 2 + ;
11 | : 5 2 3 + ;
12 |
13 | \ newline
14 | : cr lit [ 4 6 3 + + , ] lit [ 4 6 + , ] emit emit ;
15 |
16 | : fizzbuzz ( x -- )
17 | cr 1 + 1 do
18 | i 3 mod 0= dup if ." Fizz" then
19 | i 5 mod 0= dup if ." Buzz" then
20 | or invert if i . then
21 | cr
22 | loop ;
23 |
--------------------------------------------------------------------------------
/examples/README.md:
--------------------------------------------------------------------------------
1 | # A note on return stack manipulation
2 |
3 | In these examples, some defintions like `branch` and `lit` do a fair
4 | bit of return stack manipulation that may not be immediately intuitive
5 | to grasp.
6 |
7 | The key to understanding how those definitions work is in how Forth's
8 | [threaded code](https://en.wikipedia.org/wiki/Threaded_code) is executed.
9 |
10 | A word's body is comprised of a sequence of addresses to other words it
11 | calls. One of the processor's registers (`SI`, in the case of sectorforth)
12 | works as Forth's "instruction pointer", which is distinct from the
13 | processor's instruction pointer.
14 |
15 | Consider the following word definition:
16 |
17 | ```forth
18 | : w4 w1 w2 w3 ;
19 | ```
20 |
21 | Its body is laid out in memory like this:
22 |
23 | ```
24 | address addr1 addr2 addr3
25 | *---------------*---------------*---------------*
26 | contents | address of w1 | address of w2 | address of w3 |
27 | *---------------*---------------*---------------*
28 | size 2 bytes 2 bytes 2 bytes
29 | ```
30 |
31 | When `w4` is about to be executed, `SI` points to its first cell:
32 |
33 | ```
34 | address addr1 addr2 addr3
35 | *---------------*---------------*---------------*
36 | contents | address of w1 | address of w2 | address of w3 |
37 | *---------------*---------------*---------------*
38 | size 2 bytes 2 bytes 2 bytes
39 | ^
40 | |
41 | *--- SI
42 | ```
43 |
44 | When `w4` starts executing and calls `w1`, two things happen:
45 |
46 | - `SI` is advanced to the next cell (i.e. `SI = SI + 2`)
47 | - `SI` is pushed to the return stack
48 |
49 | Which means that if `w1` were to fetch the contents of the return stack
50 | (`rp@ @`), it would get `addr2` as a result.
51 |
52 | Now, when a word finishes executing, it calls `exit`, which pops the
53 | return stack, and sets `SI` to the popped address so that execution
54 | resumes there. In the example above, the execution of `w4` would
55 | resume right past the point where it called `w1`, calling `w2` next.
56 |
57 | What if `w1` were to do the following though:
58 |
59 | ```forth
60 | ... rp@ @ 2 + rp@ ! ...
61 | ```
62 |
63 | `rp@ @ 2 +` would fetch the top of the return stack, yielding `addr2`,
64 | then it would add 2 to it, resulting in `addr3`. `rp@ !` would then
65 | replace the value at the top of the return stack with `addr3`.
66 |
67 | In that situation, when `w1` calls `exit`, the top of the return stack
68 | is popped, yielding `addr3` this time, and execution resumes there,
69 | skipping the call to `w2` in the body of `w4` and going straight to `w3`.
70 |
71 | That's how `branch`, `lit`, and other definitions that manipulate the
72 | return stack work. `branch` reads an offset from the top of the return
73 | stack (`rp@ @ @` reads the contents of the address at the top of the
74 | return stack) and adds that offset to the address at the top of the return
75 | stack itself (`rp@ @ + rp@ !`), so execution skips a number of words
76 | corresponding to the offset (it actually skips bytes, so offsets always
77 | have to be multiples of 2 to skip words). Like `branch`, `lit` reads a
78 | value from the address at the top of the return stack, but always adds 2
79 | to that same address so execution skips the literal (since attemping to
80 | execute the literal value itself would not make sense).
81 |
82 | The most involved definitions in terms of manipulating the return stack
83 | are `>r` and `r>`, which push and pop arbitrary values to and from the
84 | return stack itself:
85 |
86 | ```forth
87 | : >rexit ( addr r:addr0 -- r:addr )
88 | rp@ ! ; \ override return address with original return
89 | \ address from >r
90 | : >r ( x -- r:x)
91 | rp@ @ \ get current return address
92 | swap rp@ ! \ replace top of return stack with value
93 | >rexit ; \ push new address to return stack
94 | : r> ( r:x -- x )
95 | rp@ 2 + @ \ get value stored in return stack with >r
96 | rp@ @ rp@ 2 + ! \ replace value with address to return from r>
97 | lit [ here @ 6 + , ] \ get address to this word's exit call
98 | rp@ ! ; \ make code return to this word's exit call,
99 | \ effectively calling exit twice to pop return
100 | \ stack entry created by >r
101 | ```
102 |
103 | `>r` uses an auxiliary word, `>rexit`, to push a new
104 | address to the return stack (remember, an address is pushed every time a
105 | word is called, so calling `>rexit` will do just that), then replaces it
106 | with the return address that was pushed when `>r` was called. *That*
107 | original address can thus be replaced with whatever value was on the data
108 | stack when `>r` was called. When `>r` exits, the value left at the top of
109 | the return stack is the argument to `>r`.
110 |
111 | `r>` is a bit more complicated. In addition to reading a value placed on
112 | the return stack by `>r` earlier, `r>` needs to pop that off. Evidently,
113 | it cannot do so via an auxiliary word like `>r` does, since that would
114 | only _push_ yet another address on the return stack. Instead, it obtains
115 | the address to its `exit` call (located where `;` is), and replaces the
116 | value pushed by `>r` with it. When `r>` calls `exit` the first time,
117 | execution goes back _to that same exit call_ one more time, popping off
118 | the return stack space created by `>r`; the second call to `exit` then
119 | pops the address to return to wherever `r>` was called.
120 |
--------------------------------------------------------------------------------
/sectorforth.asm:
--------------------------------------------------------------------------------
1 | ; sectorforth - a 512-byte, bootable x86 Forth.
2 | ; Copyright (c) 2020 Cesar Blum
3 | ; Distributed under the MIT license. See LICENSE for details.
4 | ;
5 | ; sectorforth is a 16-bit x86 Forth that fits entirely within a
6 | ; boot sector (512 bytes).
7 | ;
8 | ; It's a direct threaded Forth, with SI acting as the Forth
9 | ; instruction pointer. Words are executed using LODSW to advance
10 | ; SI and load the next word's address into AX, which is then
11 | ; jumped to.
12 | ;
13 | ; The SP register is used as the data stack pointer, and the BP
14 | ; register acts as the return stack pointer.
15 | ;
16 | ; The minimum CPU required to run sectorforth is the 386, to use
17 | ; the SETNZ instruction.
18 | bits 16
19 | cpu 386
20 |
21 | ; Set CS to a known value by performing a far jump. Memory up to
22 | ; 0x0500 is used by the BIOS. Setting the segment to 0x0500 gives
23 | ; sectorforth an entire free segment to work with.
24 | jmp 0x0050:start
25 |
26 | ; On x86, the boot sector is loaded at 0x7c00 on boot. In segment
27 | ; 0x0500, that's 0x7700 (0x0050 << 4 + 0x7700 == 0x7c00).
28 | org 0x7700
29 |
30 | ; Define constants for the memory map. Everything is organized
31 | ; within a single 64 KB segment. TIB is placed at 0x0000 to
32 | ; simplify input parsing (the Forth variable >IN ends up being
33 | ; also a pointer into TIB, so there's no need to add >IN to TIB
34 | ; to get a pointer to the parse area). TIB is 4 KB long.
35 | TIB equ 0x0000 ; terminal input buffer (TIB)
36 | STATE equ 0x1000 ; current state (0=interpret, 1=compile)
37 | TOIN equ 0x1002 ; current read offset into TIB (>IN)
38 | RP0 equ 0x76fe ; bottom of return stack
39 | SP0 equ 0xfffe ; bottom of data stack
40 |
41 | ; Each dictionary entry is laid out in memory as such:
42 | ;
43 | ; *--------------*--------------*--------------*--------------*
44 | ; | Link pointer | Flags+Length | Name... | Code... |
45 | ; *--------------*--------------*--------------*--------------*
46 | ; 2 bytes 1 byte Length bytes Variable
47 | ;
48 | ; Flags IMMEDIATE and HIDDEN are used in assembly code. Room is
49 | ; left for an additional, user-defined flag, so word names are
50 | ; limited to 32 characters.
51 | F_IMMEDIATE equ 0x80
52 | F_HIDDEN equ 0x40
53 | LENMASK equ 0x1f
54 |
55 | ; Each dictionary entry needs a link to the previous entry. The
56 | ; last entry links to zero, marking the end of the dictionary.
57 | ; As dictionary entries are defined, link will be redefined to
58 | ; point to the previous entry.
59 | %define link 0
60 |
61 | ; defword lays out a dictionary entry where it is expanded.
62 | %macro defword 2-3 0 ; name, label, flags
63 | word_%2:
64 | dw link ; link to previous word
65 | %define link word_%2
66 | %strlen %%len %1
67 | db %3+%%len ; flags+length
68 | db %1 ; name
69 | %2: ; code starts here
70 | %endmacro
71 |
72 | ; NEXT advances execution to the next word. The actual code is
73 | ; placed further ahead for strategic reasons. The macro has to be
74 | ; defined here, since it's used in the words defined ahead.
75 | %define NEXT jmp next
76 |
77 | ; sectorforth has only eight primitive words, with which
78 | ; everything else can be built in Forth:
79 | ;
80 | ; @ ( addr -- x ) Fetch memory at addr
81 | ; ! ( x addr -- ) Store x at addr
82 | ; sp@ ( -- addr ) Get current data stack pointer
83 | ; rp@ ( -- addr ) Get current return stack pointer
84 | ; 0= ( x -- f ) -1 if top of stack is 0, 0 otherwise
85 | ; + ( x1 x2 -- n ) Add the two values at the top of the stack
86 | ; nand ( x1 x2 -- n ) NAND the two values at the top of the stack
87 | ; exit ( r:addr -- ) Resume execution at address at the top of
88 | ; the return stack
89 | defword "@",FETCH
90 | pop bx
91 | push word [bx]
92 | NEXT
93 |
94 | defword "!",STORE
95 | pop bx
96 | pop word [bx]
97 | NEXT
98 |
99 | defword "sp@",SPFETCH
100 | push sp
101 | NEXT
102 |
103 | defword "rp@",RPFETCH
104 | push bp
105 | NEXT
106 |
107 | defword "0=",ZEROEQUALS
108 | pop ax
109 | test ax,ax
110 | setnz al ; AL=0 if ZF=1, else AL=1
111 | dec ax ; AL=ff if AL=0, else AL=0
112 | cbw ; AH=AL
113 | push ax
114 | NEXT
115 |
116 | defword "+",PLUS
117 | pop bx
118 | pop ax
119 | add ax,bx
120 | push ax
121 | NEXT
122 |
123 | defword "nand",NAND
124 | pop bx
125 | pop ax
126 | and ax,bx
127 | not ax
128 | push ax
129 | NEXT
130 |
131 | defword "exit",EXIT
132 | xchg sp,bp ; swap SP and BP, SP controls return stack
133 | pop si ; pop address to next word
134 | xchg sp,bp ; restore SP and BP
135 | NEXT
136 |
137 | ; Besides primitives, a few variables are exposed to Forth code:
138 | ; TIB, STATE, >IN, HERE, and LATEST. With sectorforth's >IN being
139 | ; both an offset and a pointer into TIB (as TIB starts at 0x0000),
140 | ; TIB could be left out. But it is exposed so that sectorforth
141 | ; code that accesses the parse area can be written in an idiomatic
142 | ; fashion (e.g. TIB >IN @ +).
143 | defword "tib",TIBVAR
144 | push word TIB
145 | NEXT
146 |
147 | defword "state",STATEVAR
148 | push word STATE
149 | NEXT
150 |
151 | defword ">in",TOINVAR
152 | push word TOIN
153 | NEXT
154 |
155 | ; Strategically define next here so most jumps to it are short,
156 | ; saving extra bytes that would be taken by near jumps.
157 | next:
158 | lodsw ; load next word's address into AX
159 | jmp ax ; jump directly to it
160 |
161 | ; Words and data space for the HERE and LATEST variables.
162 | defword "here",HEREVAR
163 | push word HERE
164 | NEXT
165 | HERE: dw start_HERE
166 |
167 | defword "latest",LATESTVAR
168 | push word LATEST
169 | NEXT
170 | LATEST: dw word_SEMICOLON ; initialized to last word in dictionary
171 |
172 | ; Define a couple of I/O primitives to make things interactive.
173 | ; They can also be used to build a richer interpreter loop.
174 | ;
175 | ; KEY waits for a key press and pushes its scan code (AH) and
176 | ; ASCII character (AL) to the stack, both in a single cell.
177 | defword "key",KEY
178 | mov ah,0
179 | int 0x16
180 | push ax
181 | NEXT
182 |
183 | ; EMIT writes to the screen the ASCII character corresponding to
184 | ; the lowest 8 bits of the value at the top of the stack.
185 | defword "emit",EMIT
186 | pop ax
187 | call writechar
188 | NEXT
189 |
190 | ; The colon compiler reads a name from the terminal input buffer,
191 | ; creates a dictionary entry for it, writes machine code to jump
192 | ; to DOCOL, updates LATEST and HERE, and switches to compilation
193 | ; state.
194 | defword ":",COLON
195 | call token ; parse word from input
196 | push si
197 | mov si,di ; set parsed word as string copy source
198 | mov di,[HERE] ; set current value of HERE as destination
199 | mov ax,[LATEST] ; get pointer to latest defined word
200 | mov [LATEST],di ; update LATEST to new word being defined
201 | stosw ; link pointer
202 | mov al,cl
203 | or al,F_HIDDEN ; hide new word while it's being defined
204 | stosb ; word length
205 | rep movsb ; word name
206 | mov ax,0x26ff
207 | stosw ; compile near jump, absolute indirect...
208 | mov ax,DOCOL.addr
209 | stosw ; ...to DOCOL
210 | mov [HERE],di ; update HERE to next free position
211 | mov byte [STATE],1 ; switch to compilation state
212 | pop si
213 | NEXT
214 |
215 | ; DOCOL sets up and starts execution of a user-defined words.
216 | ; Those differ from words defined in machine code by being
217 | ; sequences of addresses to other words, so a bit of code is
218 | ; needed to save the current value of SI (this Forth's instruction
219 | ; pointer), and point it to the sequence of addresses that makes
220 | ; up a word's body.
221 | ;
222 | ; DOCOL advances AX 4 bytes, and then moves that value to SI. When
223 | ; DOCOL is jumped to, AX points to the code field of the word
224 | ; about to be executed. The 4 bytes being skipped are the actual
225 | ; jump instruction to DOCOL itself, inserted by the colon compiler
226 | ; when it creates a new entry in the dictionary.
227 | DOCOL:
228 | xchg sp,bp ; swap SP and BP, SP controls return stack
229 | push si ; push current "instruction pointer"
230 | xchg sp,bp ; restore SP and BP
231 | add ax,4 ; skip word's code field
232 | mov si,ax ; point "instruction pointer" to word body
233 | NEXT ; start executing the word
234 |
235 | ; The jump instruction inserted by the compiler is an indirect
236 | ; jump, so it needs to read the location to jump to from another
237 | ; memory location.
238 | .addr: dw DOCOL
239 |
240 | ; Semicolon is the only immediate primitive. It writes the address
241 | ; of EXIT to the end of a new word definition, makes the word
242 | ; visible in the dictionary, and switches back to interpretation
243 | ; state.
244 | defword ";",SEMICOLON,F_IMMEDIATE
245 | mov bx,[LATEST]
246 | and byte [bx+2],~F_HIDDEN ; reveal new word
247 | mov byte [STATE],0 ; switch to interpretation state
248 | mov ax,EXIT ; prepare to compile EXIT
249 | compile:
250 | mov di,[HERE]
251 | stosw ; compile contents of AX to HERE
252 | mov [HERE],di ; advance HERE to next cell
253 | NEXT
254 |
255 | ; Execution starts here.
256 | start:
257 | cld ; clear direction flag
258 |
259 | ; Set up segment registers to point to the same segment as CS.
260 | push cs
261 | push cs
262 | push cs
263 | pop ds
264 | pop es
265 | pop ss
266 |
267 | ; Skip error signaling on initialization
268 | jmp init
269 |
270 | ; Display a red '!!' to let the user know an error happened and the
271 | ; interpreter is being reset
272 | error:
273 | mov ax,0x0921 ; write '!'
274 | mov bx,0x0004 ; black background, red text
275 | mov cx,2 ; twice
276 | int 0x10
277 |
278 | ; Initialize stack pointers, state, and terminal input buffer.
279 | init:
280 | mov bp,RP0 ; BP is the return stack pointer
281 | mov sp,SP0 ; SP is the data stack pointer
282 |
283 | ; Fill TIB with zeros, and set STATE and >IN to 0
284 | mov al,0
285 | mov cx,STATE+4
286 | mov di,TIB
287 | rep stosb
288 |
289 | ; Enter the interpreter loop.
290 | ;
291 | ; Words are read one at time and searched for in the dictionary.
292 | ; If a word is found in the dictionary, it is either interpreted
293 | ; (i.e. executed) or compiled, depending on the current state and
294 | ; the word's IMMEDIATE flag.
295 | ;
296 | ; When a word is not found, the state of the interpreter is reset:
297 | ; the data and return stacks are cleared as well as the terminal
298 | ; input buffer, and the interpreter goes into interpretation mode.
299 | interpreter:
300 | call token ; parse word from input
301 | mov bx,[LATEST] ; start searching for it in the dictionary
302 | .1: test bx,bx ; zero?
303 | jz error ; not found, reset interpreter state
304 | mov si,bx
305 | lodsw ; skip link
306 | lodsb ; read flags+length
307 | mov dl,al ; save those for later use
308 | test al,F_HIDDEN ; entry hidden?
309 | jnz .2 ; if so, skip it
310 | and al,LENMASK ; mask out flags
311 | cmp al,cl ; same length?
312 | jne .2 ; if not, skip entry
313 | push cx
314 | push di
315 | repe cmpsb ; compare strings
316 | pop di
317 | pop cx
318 | je .3 ; if equal, search is over
319 | .2: mov bx,[bx] ; skip to next entry
320 | jmp .1 ; try again
321 | .3: mov ax,si ; after comparison, SI points to code field
322 | mov si,.loop ; set SI so NEXT loops back to interpreter
323 | ; Decide whether to interpret or compile the word. The IMMEDIATE
324 | ; flag is located in the most significant bit of the flags+length
325 | ; byte. STATE can only be 0 or 1. When ORing those two, these are
326 | ; the possibilities:
327 | ;
328 | ; IMMEDIATE STATE OR ACTION
329 | ; 0000000 0000000 00000000 Interpret
330 | ; 0000000 0000001 00000001 Compile
331 | ; 1000000 0000000 10000000 Interpret
332 | ; 1000000 0000001 10000001 Interpret
333 | ;
334 | ; A word is only compiled when the result of that OR is 1.
335 | ; Decrementing that result sets the zero flag for a conditional
336 | ; jump.
337 | and dl,F_IMMEDIATE ; isolate IMMEDIATE flag
338 | or dl,[STATE] ; OR with state
339 | dec dl ; decrement
340 | jz compile ; if result is zero, compile
341 | jmp ax ; otherwise, interpret (execute) the word
342 | .loop: dw interpreter
343 |
344 | ; Parse a word from the terminal input buffer and return its
345 | ; address and length in DI and CX, respectively.
346 | ;
347 | ; If after skipping spaces a 0 is found, more input is read from
348 | ; the keyboard into the terminal input buffer until return is
349 | ; pressed, at which point execution jumps back to the beginning of
350 | ; token so it can attempt to parse a word again.
351 | ;
352 | ; Before reading input from the keyboard, a CRLF is emitted so
353 | ; the user can enter input on a fresh, blank line on the screen.
354 | token:
355 | mov di,[TOIN] ; starting at the current position in TIB
356 | mov cx,-1 ; search "indefinitely"
357 | mov al,32 ; for a character that's not a space
358 | repe scasb
359 | dec di ; result is one byte past found character
360 | cmp byte [di],0 ; found a 0?
361 | je .readline ; if so, read more input
362 | mov cx,-1 ; search "indefinitely" again
363 | repne scasb ; this time, for a space
364 | dec di ; adjust DI again
365 | mov [TOIN],di ; update current position in TIB
366 | not cx ; after ones' complement, CX=length+1
367 | dec cx ; adjust CX to correct length
368 | sub di,cx ; point to start of parsed word
369 | ret
370 | .readline:
371 | mov al,13
372 | call writechar ; CR
373 | mov al,10
374 | call writechar ; LF
375 | mov di,TIB ; read into TIB
376 | .1: mov ah,0 ; wait until a key is pressed
377 | int 0x16
378 | cmp al,13 ; return pressed?
379 | je .3 ; if so, finish reading
380 | cmp al,8 ; backspace pressed?
381 | je .2 ; if so, erase character
382 | call writechar ; otherwise, write character to screen
383 | stosb ; store character in TIB
384 | jmp .1 ; keep reading
385 | .2: cmp di,TIB ; start of TIB?
386 | je .1 ; if so, there's nothing to erase
387 | dec di ; erase character in TIB
388 | call writechar ; move cursor back one character
389 | mov ax,0x0a20 ; erase without moving cursor
390 | mov cx,1
391 | int 0x10 ; (BH already set to 0 by writechar)
392 | jmp .1 ; keep reading
393 | .3: mov ax,0x0020
394 | stosw ; put final delimiter and 0 in TIB
395 | call writechar ; write a space between user input and
396 | ; execution output
397 | mov word [TOIN],0 ; point >IN to start of TIB
398 | jmp token ; try parsing a word again
399 |
400 | ; writechar writes a character to the screen. It uses INT 10/AH=0e
401 | ; to perform teletype output, writing the character, updating the
402 | ; cursor, and scrolling the screen, all in one go. Writing
403 | ; backspace using the BIOS only moves the cursor backwards within
404 | ; a line, but does not move it back to the previous line.
405 | ; writechar addresses that.
406 | writechar:
407 | push ax ; INT 10h/AH=03h clobbers AX in some BIOSes
408 | mov bh,0 ; video page 0 for all BIOS calls
409 | mov ah,3 ; get cursor position (DH=row, DL=column)
410 | int 0x10
411 | pop ax ; restore AX
412 | mov ah,0x0e ; teletype output
413 | mov bl,0x7 ; black background, light grey text
414 | int 0x10
415 | cmp al,8 ; backspace?
416 | jne .1 ; if not, nothing else to do
417 | test dl,dl ; was cursor in first column?
418 | jnz .1 ; if not, nothing else to do
419 | mov ah,2 ; move cursor
420 | mov dl,79 ; to last column
421 | dec dh ; of previous row
422 | int 0x10
423 | .1: ret
424 |
425 | times 510-($-$$) db 0
426 | db 0x55, 0xaa
427 |
428 | ; New dictionary entries will be written starting here.
429 | start_HERE:
430 |
--------------------------------------------------------------------------------