├── .gitignore
├── Makefile
├── README.md
├── cavoasm.py
├── cavosim.c
├── count.s
├── differences.s
├── iloop.s
└── method-of-differences.c


/.gitignore:
--------------------------------------------------------------------------------
1 | *~
2 | count.image
3 | differences.image
4 | iloop.image
5 | method-of-differences
6 | cavosim
7 | 


--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
 1 | CFLAGS=-g
 2 | 
 3 | all: count.image iloop.image differences.image test
 4 | clean:
 5 | 	$(RM) count.image iloop.image differences.image method-of-differences cavosim
 6 | 
 7 | %.image: %.s cavoasm.py
 8 | 	./cavoasm.py < $< > $@
 9 | 
10 | test: method-of-differences cavosim differences.image
11 | 	-./method-of-differences | head -11 # it outputs the initial a5
12 | 	-./cavosim differences.image | grep 'mem\[8]' | head # it doesn't
13 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | Calculus Vaporis CPU Design
  2 | ===========================
  3 | 
  4 | This is a sketch of a design for a very small zero-operand 12-bit CPU
  5 | in about 1000 NAND gates, or a smaller number of more powerful
  6 | components (e.g. 70 lines of C).  It’s my first CPU design, so it may
  7 | be deeply flawed.
  8 | 
  9 | Initially I called it the “Dumbass CPU”, but I thought that didn’t
 10 | seem like a name Charles Babbage would have used.  So I called it
 11 | “Calculus Vaporis”, which means “counting-stone of steam” in Latin, I
 12 | hope.
 13 | 
 14 | This directory contains a simulator written in C, a few simple
 15 | programs in a simple assembly language, and an assembler written in
 16 | Python.  Run `make` to try them out.
 17 | 
 18 | Logical Organization
 19 | --------------------
 20 | 
 21 | The machine has four 12-bit registers: P, A, X, and I.
 22 | 
 23 | P is the program counter.  It generally holds the address of the next
 24 | instruction to fetch.  It could actually be only 11 bits.
 25 | 
 26 | A is the top-of-stack register.
 27 | 
 28 | X is the second-on-stack register.
 29 | 
 30 | I is the instruction-decoding register.  It holds the instruction word
 31 | currently being decoded.
 32 | 
 33 | The machine additionally has an 11-bit address bus B, a 12-bit data
 34 | bus D, and a bus operation O, which are treated as pseudo-registers
 35 | for the purpose of the RTL description. The narrow (11-bit) address
 36 | bus is weird but simplifies the usage of immediate constants.
 37 | 
 38 | Instruction Set
 39 | ---------------
 40 | 
 41 | The machine cycle has four microcycles: fetch 1, fetch 2, execute 1,
 42 | execute 2.  Fetch 1 and 2 are the same for all instructions.  All
 43 | instructions but @ do nothing during execute 2.  The RTL given for
 44 | each instruction below describes what happens during execute 1.
 45 | 
 46 | Following Python, I am using `R[-1]` to refer to the highest bit of
 47 | register `R`, `R[:-1]` to refer to all but the highest bit, `R[-2]` to
 48 | refer to the next-highest bit, and so on.
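
As a rough illustration of this notation, here is a minimal Python sketch for a 12-bit register held in a plain integer.  The helper names (`high_bit`, `next_high_bit`, `all_but_high`) are made up for this example and appear nowhere else in this directory.

```python
NBITS = 12  # register width used throughout this design

def high_bit(r):
    """R[-1]: the highest bit of a 12-bit register value."""
    return (r >> (NBITS - 1)) & 1

def next_high_bit(r):
    """R[-2]: the next-highest bit."""
    return (r >> (NBITS - 2)) & 1

def all_but_high(r):
    """R[:-1]: the low 11 bits, i.e. everything but the highest bit."""
    return r & ((1 << (NBITS - 1)) - 1)

r = 0b101000000011
print(high_bit(r), next_high_bit(r), bin(all_but_high(r)))
```

Note that, unlike a Python list slice, bit 0 here is the lowest-order bit, so `R[:-1]` is simply the register value with its top bit masked off.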
 49 | 
 50 | There are seven instructions: `$`, `.`, `-`, `|`, `@`, `!`, and `nop`.
 51 | They are stored one per machine word, which is grossly wasteful of
 52 | memory space but makes for a simpler instruction decode cycle.
 53 | Avoiding the gross waste of memory space would probably require adding
 54 | at least one more register to the processor, because as it is defined
 55 | now, roughly half the instructions in any code are `$`, which needs a
 56 | whole machine word most of the time anyway.
 57 | 
 58 | There are even simpler instruction sets, such as the
 59 | subtract-indirect-and-branch-indirect-if-negative OISC and the MOV
 60 | machine.  They are simpler to explain but would save little here:
 61 | the existing decoding logic is already small, about 20 of the 1000
 62 | NAND gates, and they call for a more complicated machine
 63 | cycle.
 64 | 
 65 | - `$` is the “load immediate” instruction; it’s represented by having a
 66 |     single high bit set in a 12-bit word.  When written in code
 67 |     sequences, it is followed by the value of the other 11 bits.  It
 68 |     does:
 69 | 
 70 |     >     X ← A, A[:-1] ← I[:-1], A[-1] ← I[-2]
 71 | 
 72 |     Note that this is the only way to change the value of X, by copying
 73 |     A’s old value into it.
 74 | 
 75 | - `.` is the “conditional call/jump” instruction.  It does:
 76 | 
 77 |     >     if X[-1] == 0: 
 78 |     >         P ← A[:-1], A[:-1] ← P, A[-1] ← 0;
 79 |     >     else:
 80 |     >         A ← X;  # not very important, kind of arbitrary, maybe drop it
 81 | 
 82 | - `-` is the “subtract” instruction.  It does:
 83 | 
 84 |     >     A ← X - A;
 85 | 
 86 | - `|` is the “NAND” instruction (named after the Sheffer stroke).  In C
 87 |     notation for bitwise operators, it does:
 88 | 
 89 |     >     A ← ~(A & X);
 90 | 
 91 | - `@` is the “fetch” instruction.  It is the only instruction that needs
 92 |     both execute microcycles.  During execute 1, it does:
 93 | 
 94 |     >     B ← A[:-1], O ← “read”;
 95 | 
 96 |     Then, during execute 2:
 97 | 
 98 |     >     A ← D;
 99 | 
100 | - `!` is the “store” instruction.  It does:
101 | 
102 |     >     B ← A[:-1], D ← X, O ← “write”, A ← X;
103 | 
104 | - `nop` does nothing.
105 | 
106 | Code Snippets
107 | -------------
108 | 
109 | These show that the instruction set is usable, just barely.
110 | 
111 | Call the subroutine at address `x`, passing the return address in A:
112 | 
113 |     $0 $x .
114 | 
115 | Store the return address passed in A at a fixed address `ra` (for
116 | non-reentrant subroutines):
117 | 
118 |     $ra !
119 | 
120 | Return to the address thus passed, trashing both A and X:
121 | 
122 |     $0 $ra @ .
123 | 
124 | Decrement the memory location `sp`, used as, for example, a stack
125 | pointer:
126 | 
127 |     $sp @ $1 - $sp !
128 | 
129 | Store a return address at the location pointed to by `sp` after
130 | decrementing `sp`, using an address `tmp` for temporary storage:
131 | 
132 |     $tmp !  $sp @ $1 - $sp !  $tmp @ $sp !
133 | 
134 | Fetch that return address from the stack, increment `sp`, and return
135 | to the address:
136 | 
137 |     $sp @ @ $tmp !  $sp @ $-1 - $sp !  $0 $tmp @ .
138 | 
139 | Negate the value stored at a memory location `var`:
140 | 
141 |     $0 $var @ - $var !
142 | 
143 | Add the values stored at memory locations `a` and `b` with the aid of
144 | a third temporary location `tmp`, leaving the sum in the A register:
145 | 
146 |     $0 $a @ - $tmp ! $b @ $tmp @ -
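
The last snippet works because subtracting a negated value is the same as adding, modulo 2¹².  A minimal Python check of that arithmetic follows; `sub` and `add_via_sub` are names invented for this illustration, and memory is replaced by ordinary variables.

```python
MASK = 0xFFF  # 12-bit words

def sub(x, a):
    """The machine's only arithmetic operation: A <- X - A, modulo 2**12."""
    return (x - a) & MASK

def add_via_sub(a, b):
    tmp = sub(0, a)      # $0 $a @ -  then  $tmp !   (tmp holds -a)
    return sub(b, tmp)   # $b @ $tmp @ -             (b - (-a) = a + b)

print(add_via_sub(5, 7), add_via_sub(4095, 1))
```

The same trick is behind the `$-1 -` used above to increment `sp`: subtracting -1 adds 1.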
147 | 
148 | The rest of the RTL
149 | -------------------
150 | 
151 | The instruction definitions above define what happens at the RTL level
152 | during the instruction execution microcycles.  The RTL for the other
153 | two microcycles of the machine cycle follows:
154 | 
155 | Fetch 1:
156 | 
157 |     > B ← P, P ← P + 1, O ← “read”;
158 | 
159 | Fetch 2:
160 | 
161 |     > I ← D;
162 | 
163 | The presumed memory interface semantics are:
164 | 
165 | When O ← “read”, on the next microcycle:
166 | 
167 |     > D ← M[B];
168 | 
169 | When O ← “write”:
170 | 
171 |     >  M[B] ← D.
172 | 
173 | I don’t know enough about memory to know how realistic that is.
174 | 
175 | Translating the RTL design into gates
176 | -------------------------------------
177 | 
178 | (This part contains a number of errors.  Hopefully it’s accurate
179 | enough that the number of gates can be meaningfully estimated.)
180 | 
181 | We need a multiplexer attached to the input of most of the registers
182 | to implement the RTL described earlier.  Here are the places each
183 | thing can come from, and when:
184 | 
185 | <table>
186 |  <tr> <th> register <th> is set from   <th> when
187 |  <tr> <td> A <td> I[:-1] sign-extended <td> ($, execute 1)
188 |  <tr> <td>   <td> P                    <td> (., execute 1, if X[-1] == 0)
189 |  <tr> <td>   <td> X - A                <td> (-, execute 1)
190 |  <tr> <td>   <td> ~(X & A)             <td> (|, execute 1)
191 |  <tr> <td>   <td>       D              <td> (@, execute 2)
192 |  <tr> <td>   <td>       X              <td> (!, execute 1; or ., execute 1, when X[-1] == 1)
193 |  <tr> <td> X <td> A                    <td> ($, execute 1)
194 |  <tr> <td> P <td> P + 1                <td> (fetch 1)
195 |  <tr> <td>   <td> A                    <td> (., execute 1, if X[-1] == 0)
196 |  <tr> <td> I <td> D                    <td> (fetch 2)
197 |  <tr> <td> B <td> P                    <td> (fetch 1)
198 |  <tr> <td>   <td> A[:-1]               <td> (@ or !, execute 1)
199 |  <tr> <td> D <td> X                    <td> (!, execute 1)
200 |  <tr> <td> O <td> “read”               <td> (fetch 1, or @, execute 1)
201 |  <tr> <td>   <td> “write”              <td> (!, execute 1)
202 | </table>
203 | 
204 | At all other times, registers continue with their current values.
205 | 
206 | The overall design, then, looks something like this.  Some of the
207 | “wires” are N bits wide:
208 | 
209 |     machine():
210 |         register(P_output, P_input, P_write_enable)
211 |         register(A_output, A_input, A_write_enable)
212 |         register(X_output, X_input, X_write_enable)
213 |         register(I_output, I_input, I_write_enable)
214 |         instruction_decoder(I_output[-4:], fetch_enable, instruction_select)
215 |         # everything up to execute_1 is the inputs; everything after
216 |         # that is an output
217 |         execute_1_controller(instruction_select, fetch_enable, X_output[-1],
218 |                              execute_1, A_select, jump, memory_write, 
219 |                              send_A_to_B, X_write_enable, D_write_enable)
220 | 
221 |         # All of these are outputs                             
222 |         microcycle_counter(execute_1, execute_2, fetch_1, fetch_2)
223 |         fetch_2 = I_write_enable  # that is, they’re two names for the same wire
224 |         # the last argument is the output
225 |         AND(fetch_enable, execute_2, get_A_from_D)
226 |         OR_6input(get_A_from_D, ..., A_write_enable)
227 |         A_input_mux(A_select, get_A_from_D, 
228 |                     I_sign_extended, P_output, X_minus_A, X_nand_A, D, X_output, 
229 |                     A_input)
230 | 
231 |         increment_P = fetch_1
232 |         P_input_controller(jump, increment_P, A_output, B_output, P_input)
233 |         OR(fetch_1, ???, memory_read)
234 | 
235 |         I_sign_extended = I[-2] || I[:-1]
236 | 
237 | Most of these pieces are simple multiplexers or registers of one sort
238 | or another.  The `microcycle_counter` is just a 4-bit ring counter;
239 | the `instruction_decoder` is just a 6-way decoder of its 4-bit input.
240 | However, the `execute_1_controller` requires a little more
241 | clarification.
242 | 
243 |     execute_1_controller(instruction_select, instruction_is_fetch, X_highbit,
244 |                          execute_1, A_select, jump, memory_write, 
245 |                          send_A_to_B, X_write_enable, D_write_enable):
246 |         # The A_select output is 5 bits; it’s used to determine where
247 |         # the A register gets loaded from if it gets loaded during the
248 |         # execute_1 microcycle.  We don’t care what value it has the
249 |         # rest of the time.
250 |         A_select = (get_A_from_I_sign_extended, get_A_from_P,
251 |                     get_A_from_X_minus_A, get_A_from_X_nand_A,
252 |                     get_A_from_X)
253 |         # The instruction_select input is also 5 bits, representing
254 |         # the five instructions that can affect A, other than fetch.
255 |         instruction_select = (instruction_is_immediate, instruction_is_jump, 
256 |                               instruction_is_subtract, instruction_is_nand,
257 |                               instruction_is_store)
258 | 
259 |         # Here’s how those outputs are computed.                                        
260 |         get_A_from_I_sign_extended = instruction_is_immediate
261 |         NOT(X_highbit, X_not_highbit)
262 |         AND(instruction_is_jump, X_not_highbit, get_A_from_P)
263 |         get_A_from_X_minus_A = instruction_is_subtract
264 |         get_A_from_X_nand_A = instruction_is_nand
265 |         AND(instruction_is_jump, X_highbit, failed_jump)
266 |         OR(failed_jump, instruction_is_store, get_A_from_X)
267 | 
268 |         jump = get_A_from_P
269 |         AND(execute_1, instruction_is_store, memory_write)
270 | 
271 |         OR(instruction_is_store, instruction_is_fetch, memory_access)
272 |         AND(execute_1, memory_access, send_A_to_B)
273 | 
274 |         AND(execute_1, instruction_is_immediate, X_write_enable)
275 |         
276 |         AND(execute_1, instruction_is_store, D_write_enable)
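
To see what the controller produces for each instruction, the wiring above can be transcribed into ordinary boolean expressions.  This Python version is a sketch, not a gate-level design: it takes the instruction as an assembler mnemonic string instead of the one-hot `instruction_select` and `instruction_is_fetch` wires (an assumption made for readability), and it copies the logic as written, including the outputs that are not gated by `execute_1`.

```python
def execute_1_controller(instruction, x_highbit, execute_1):
    """Return the controller outputs as a dict of booleans."""
    is_immediate = instruction == '$'
    is_jump = instruction == '.'
    is_subtract = instruction == '-'
    is_nand = instruction == '|'
    is_fetch = instruction == '@'
    is_store = instruction == '!'

    x_highbit = bool(x_highbit)
    execute_1 = bool(execute_1)
    failed_jump = is_jump and x_highbit
    memory_access = is_store or is_fetch

    return {
        'get_A_from_I_sign_extended': is_immediate,
        'get_A_from_P': is_jump and not x_highbit,
        'get_A_from_X_minus_A': is_subtract,
        'get_A_from_X_nand_A': is_nand,
        'get_A_from_X': failed_jump or is_store,
        'jump': is_jump and not x_highbit,
        'memory_write': execute_1 and is_store,
        'send_A_to_B': execute_1 and memory_access,
        'X_write_enable': execute_1 and is_immediate,
        'D_write_enable': execute_1 and is_store,
    }

for name, value in execute_1_controller('!', False, True).items():
    print(name, value)
```

Calling it with `execute_1` false shows which outputs stay asserted outside the execute 1 microcycle.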
277 | 
278 | The ALU simply has to compute `X_minus_A` and `X_nand_A`, which
279 | constantly feed into the `A_input_mux`.  `X_minus_A` requires 12 full
280 | subtractors (analogous to full adders) and `X_nand_A` requires 12 NAND
281 | gates.
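
For reference, here is one way the chain of 12 full subtractors could behave, written as a Python sketch.  The boolean equations are the textbook full-subtractor ones, not a design taken from this document, and the names `full_subtractor` and `subtract12` are invented for the example.

```python
NBITS = 12

def full_subtractor(x, a, borrow_in):
    """One bit of X - A: returns (difference bit, borrow out)."""
    diff = x ^ a ^ borrow_in
    borrow_out = ((1 - x) & a) | ((1 - (x ^ a)) & borrow_in)
    return diff, borrow_out

def subtract12(x, a):
    """Ripple the borrow through 12 full subtractors, lowest bit first."""
    borrow = 0
    result = 0
    for k in range(NBITS):
        d, borrow = full_subtractor((x >> k) & 1, (a >> k) & 1, borrow)
        result |= d << k
    return result  # the final borrow is dropped, giving X - A modulo 2**12

print(subtract12(5, 7), subtract12(127, 0xFFF))
```

Dropping the last borrow is what makes the result wrap around, so that subtracting 0xFFF behaves like subtracting -1.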
282 | 
283 | Probable errors:
284 | 
285 | - I wasn’t clear who was responsible for setting the write enables on
286 |   the various registers.
287 | - Some of the `execute_1_controller` outputs probably need to be zero
288 |   when `execute_1` is zero.
289 | - I don’t know anything about memory interfaces and so the memory
290 |   controller is omitted entirely.
291 | 
292 | Size Estimate
293 | -------------
294 | 
295 | My initial estimates for number of 2-input NAND gates were:
296 | 
297 | <table><tr><td>                    <th> bit-serial <th> 12-bit parallel
298 |        <tr><th> microcycle counter <td> 28 NANDs   <td> 28
299 |        <tr><th> rest of control    <td> 127        <td> 578
300 |        <tr><th> registers          <td> 384        <td> 260 (no shifting needed)
301 |        <tr><th> subtractor         <td> 25         <td> 204
302 |        <tr><th> NAND               <td> 1          <td> 12
303 |        <tr><th> bit counter        <td> 70         <td> 0
304 |        <tr><th> total              <td> 635        <td> 1082
305 | </table>
306 | 
307 | I was estimating a 5-gate D latch per bit in the parallel case, or an
308 | 8-gate master-slave D flip-flop per bit in the serial case.
309 | My microcycle counter was going to be a pair of D flip-flops, four AND
310 | gates to compute the output bits, and an OR on two of those outputs to
311 | compute the new high bit.
312 | 
313 | I figured on a ripple-carry subtractor that would cost about 17 NAND
314 | gates per output bit, although I didn’t actually design one.
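
The arithmetic behind the table is easy to re-add.  The Python below is just a sanity check on the figures quoted above; the only assumption it adds is that the register estimate counts all four registers at 12 bits each.

```python
# Gate counts copied from the table above.
bit_serial = {
    'microcycle counter': 28, 'rest of control': 127, 'registers': 384,
    'subtractor': 25, 'NAND': 1, 'bit counter': 70,
}
parallel = {
    'microcycle counter': 28, 'rest of control': 578, 'registers': 260,
    'subtractor': 204, 'NAND': 12, 'bit counter': 0,
}

serial_total = sum(bit_serial.values())
parallel_total = sum(parallel.values())
print(serial_total, parallel_total)

# Per-bit figures from the text.
subtractor_parallel = 17 * 12        # 17 NANDs per output bit, 12 bits
registers_serial = 4 * 12 * 8        # four 12-bit registers, 8-gate flip-flops
print(subtractor_parallel, registers_serial)
```

Both column sums agree with the totals in the table, and the per-bit figures reproduce the 204 and 384 entries.  The parallel register entry of 260 is taken from the table as given; 5 gates × 48 bits would come to 240, and the text does not say what accounts for the rest.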
315 | 
316 | Because of the relative paucity of N-bit-wide data paths, going
317 | bit-serial doesn’t actually save many gates, but it would slow the
318 | machine down by a substantial factor.
319 | 
320 | Possible Improvements
321 | ---------------------
322 | 
323 | Dropping the get-A-from-X path on skipped jumps would simplify the
324 | processor, probably without making it any harder to use.
325 | 
326 | A third register (or more) on the stack wouldn’t affect the
327 | instruction set at all, but would simplify some code.  For example,
328 | the code for “add the values stored at memory locations `a` and `b`,
329 | leaving the sum in the A register” would simplify from:
330 | 
331 |     $0 $a @ - $tmp ! $b @ $tmp @ -
332 | 
333 | to:
334 | 
335 |     $a @ $0 $b @ - -
336 | 
337 | Using one-hot encodings of the instructions would require using seven
338 | bits of the I register instead of four, but would almost eliminate the
339 | instruction decoder.  (You'd still need to ensure that the instruction
340 | wasn’t $.)
341 | 
342 | Alternatively, you could pack three, four, or five instructions into
343 | each 12-bit word of memory, as Chuck Moore’s c18 core does, instead of
344 | one.  The `$` instruction, if encodable in lower-order positions,
345 | could simply sign-extend more bits, reducing the number of encodable
346 | immediate constants in such a case, but not to zero.
347 | 
348 | You could try a different bit width. 2048 instructions of code is
349 | close to a minimum to run, say, a BASIC interpreter.
350 | 
351 | If N × N → N bit LUTs and registers are available as primitives, the
352 | total complexity of the device could drop by an order of magnitude.
353 | For example, with 5 × 5 → 5 bit LUTs, you could construct a 4-bit
354 | subtractor with borrow-in and borrow-out as a single LUT, and chain
355 | three of them together to get a 12-bit subtractor.  Even if you have
356 | only 4 → 1 bit LUTs, like on a normal FPGA, you can implement a full
357 | subtractor in two LUTs attached to the same inputs, instead of 17 (or
358 | however many) discrete NAND gates.
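
The LUT idea can be tried out in software.  In this Python sketch a dictionary stands in for the lookup table: each entry maps a 4-bit slice of X, a 4-bit slice of A, and a borrow-in to a 4-bit difference and a borrow-out, and three such slices are chained.  How those nine input bits would be packed into a real LUT primitive is left open here, and `build_slice_lut` and `subtract12_lut` are hypothetical names.

```python
def build_slice_lut():
    """Tabulate a 4-bit subtractor slice with borrow-in and borrow-out."""
    lut = {}
    for x in range(16):
        for a in range(16):
            for borrow_in in (0, 1):
                d = x - a - borrow_in
                lut[(x, a, borrow_in)] = (d & 0xF, 1 if d < 0 else 0)
    return lut

SLICE = build_slice_lut()

def subtract12_lut(x, a):
    """Chain three 4-bit slices, lowest nibble first, to get 12 bits."""
    borrow = 0
    result = 0
    for k in range(3):
        nibble_x = (x >> (4 * k)) & 0xF
        nibble_a = (a >> (4 * k)) & 0xF
        d, borrow = SLICE[(nibble_x, nibble_a, borrow)]
        result |= d << (4 * k)
    return result

print(len(SLICE), subtract12_lut(0x100, 0x001))
```

The table has one entry per combination of the nine input bits, and the whole subtractor is three lookups.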
359 | 
360 | If the stack were a little deeper, a “dup”, “swap”, or “over”
361 | instruction might help a lot with certain code sequences by
362 | reducing the number of immediate constants.
363 | 
364 | <link rel="stylesheet" href="http://canonical.org/~kragen/style.css" />
365 | 


--------------------------------------------------------------------------------
/cavoasm.py:
--------------------------------------------------------------------------------
 1 | #!/usr/bin/env python3
 2 | """Assembler for the Calculus Vaporis CPU."""
 3 | import sys, re
 4 | 
 5 | def words(fileobj):
 6 |     for line in fileobj:
 7 |         line = re.sub(';.*', '', line)
 8 |         for word in line.split():
 9 |             yield word
10 | 
11 | instructions = { '.': 0, '-': 1, '|': 2, '@': 3, '!': 4, 'nop': 5 }
12 | 
13 | nbits = 12
14 | 
15 | def bit(n):
16 |     return 1 << n
17 | 
18 | immediate = bit(nbits-1)
19 | 
20 | def bits(n):
21 |     return bit(n) -1
22 | 
23 | def is_integer(word):
24 |     try:
25 |         int(word)
26 |         return True
27 |     except ValueError:
28 |         return False
29 | 
30 | class Relocation:
31 |     def __init__(self, encoding, destination):
32 |         self.encoding = encoding
33 |         self.destination = destination
34 |     def resolve(self, program, resolution):
35 |         program.memory[self.destination] = self.encoding(resolution)
36 | 
37 | def encode_immediate(value):
38 |     return (value & bits(nbits-1)) | immediate
39 | 
40 | def encode_normal(value):
41 |     return value
42 | 
43 | class Program:
44 |     def __init__(self):
45 |         self.memory = []
46 |         self.labels = {}
47 |         self.backpatches = {}
48 |     def assemble_words(self, words):
49 |         for word in words:
50 |             self.assemble_word(word)
51 |     def assemble_word(self, word):
52 |         if word in instructions:
53 |             self.assemble(instructions[word] << (nbits - 4))
54 |         elif word.startswith('$'):
55 |             self.assemble_reference(encode_immediate, word[1:])
56 |         elif word.endswith(':'):
57 |             self.assemble_label(word[:-1])
58 |         else:
59 |             self.assemble_reference(encode_normal, word)
60 |     def assemble_reference(self, encoding, text):
61 |         if is_integer(text):
62 |             return self.assemble(encoding(int(text)))
63 |         elif text in self.labels:
64 |             return self.assemble(encoding(self.labels[text]))
65 |         else:
66 |             rel = Relocation(encoding, len(self.memory))
67 |             self.backpatches.setdefault(text, []).append(rel)
68 |             return self.assemble(0)
69 |     def assemble(self, number):
70 |         assert isinstance(number, int)
71 |         self.memory.append(number)
72 |     def assemble_label(self, label):
73 |         self.resolve(label, len(self.memory))
74 |     def resolve(self, label, value):
75 |         if label in self.backpatches:
76 |             for item in self.backpatches[label]:
77 |                 item.resolve(self, value)
78 |             del self.backpatches[label]
79 |         self.labels[label] = value
80 |     def warn_undefined_labels(self):
81 |         for label in self.backpatches:
82 |             sys.stderr.write('WARNING: label %r undefined\n' % label)
83 |     def dump(self, outfile):
84 |         self.warn_undefined_labels()
85 |         for number in self.memory:
86 |             outfile.write('%d\n' % number)
87 | 
88 | if __name__ == '__main__':
89 |     p = Program()
90 |     p.assemble_words(words(sys.stdin))
91 |     p.dump(sys.stdout)
92 | 


--------------------------------------------------------------------------------
/cavosim.c:
--------------------------------------------------------------------------------
 1 | #include <arpa/inet.h>
 2 | #include <stdlib.h>
 3 | #include <stdio.h>
 4 | 
 5 | #define BIT(n) (1 << (n))
 6 | enum { nbits = 12, immediate_mask = BIT(nbits-1) };
 7 | #define ONES(number_of_ones) (BIT(number_of_ones) - 1)
 8 | #define SIGN_EXTEND(x) ((((x) & BIT(nbits-2)) << 1) | ((x) & ONES(nbits-1)))
 9 | #define INSTRUCTION_MASK (ONES(nbits-1) & ~ONES(nbits-4))
10 | enum instructions { jump = 0, subtract, nand, fetch, store, nop };
11 | 
12 | typedef int cavo_word;       /* only use the bottom 12 bits */
13 | enum { memory_size = BIT(nbits-1) };
14 | cavo_word p, a, x, i, tmp, mem[memory_size]; /* CPU registers and memory */
15 | 
16 | void panic() {
17 |   perror("panicking");
18 |   abort();
19 | }
20 | 
21 | void do_store(cavo_word addr, cavo_word value) {
22 |   printf("mem[%d] ← %d\n", (int)addr, (int)value);
23 |   mem[addr] = value;
24 | }
25 | 
26 | 
27 | void go() {
28 |   for (;;) {
29 | 
30 |     /* fetch */
31 |     printf("[%d] ", p);
32 |     fflush(stdout);
33 |     i = mem[p++];
34 |     p %= memory_size;
35 | 
36 |     /* execute */
37 | 
38 |     if (i & immediate_mask) { /* $ */
39 |       x = a;
40 |       a = SIGN_EXTEND(i);
41 | 
42 |     } else {
43 |       switch ((i & INSTRUCTION_MASK) >> (nbits-4)) {
44 | 
45 |       case jump:                /* . */
46 |         if (BIT(nbits-1) & x) a = x;
47 |         else {
48 |           tmp = a;
49 |           a = p;
50 |           p = tmp % memory_size;
51 |         }
52 |         break;
53 | 
54 |       case subtract:            /* - */
55 |         a = (x - a) & ONES(nbits);
56 |         break;
57 | 
58 |       case nand:                /* | */
59 |         a = ~(a & x) & ONES(nbits);
60 |         break;
61 | 
62 |       case fetch:               /* @ */
63 |         a = mem[a & ONES(nbits-1)];
64 |         break;
65 | 
66 |       case store:               /* ! */
67 |         do_store(a & ONES(nbits-1), x);
68 |         a = x;                  /* the README's RTL for ! includes A ← X */
69 |         break;
70 |       case nop:
71 |         break;
72 | 
73 |       default:
74 |         panic();
75 |       }
76 |     }
77 |   }
78 | }
79 | 
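80 | int main(int argc, char **argv) {
81 |   FILE *f;
82 |   int i;
83 |   if (argc != 2) {              /* argv[1] must name an assembled image */
84 |     fprintf(stderr, "usage: %s program.image\n", argv[0]);
85 |     return 2;
86 |   }
87 |   f = fopen(argv[1], "r");
88 |   if (!f) panic();
89 |   for (i = 0; i < memory_size; i++) {
90 |     if (1 != fscanf(f, "%d", &mem[i])) break;
91 |     mem[i] &= ONES(nbits);      /* keep only the bottom 12 bits, e.g. for -1 */
92 |   }
93 |   fclose(f);
94 |   p = 0;
95 |   go();
96 |   return 0;
97 | }
98 | 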


--------------------------------------------------------------------------------
/count.s:
--------------------------------------------------------------------------------
1 | 	; next simplest interesting program: counting loop
2 | me:
3 |         $counter @ $-1 - $counter ! $0 $me .
4 | counter: 127
5 | 


--------------------------------------------------------------------------------
/differences.s:
--------------------------------------------------------------------------------
 1 | ; tabulate a polynomial by the method of differences
 2 | 
 3 |         $0 $main .
 4 | a0:     1                                                                    
 5 | a1:     -1                                                                   
 6 | a2:     1                                                                    
 7 | a3:     -1                                                                   
 8 | a4:     1                                                                    
 9 | a5:     -1                                                                   
10 | tmp:    0                                                                    
11 | main:                                                                       
12 |         $0 $a5 @ - $tmp ! $a4 @ $tmp @ - $a5 !                               
13 |         $0 $a4 @ - $tmp ! $a3 @ $tmp @ - $a4 !                               
14 |         $0 $a3 @ - $tmp ! $a2 @ $tmp @ - $a3 !                               
15 |         $0 $a2 @ - $tmp ! $a1 @ $tmp @ - $a2 !                               
16 |         $0 $a1 @ - $tmp ! $a0 @ $tmp @ - $a1 !
17 |         ; $a5 @ $arg1 !                                                        
18 |         ; $0 $output .          # we don't have an output routine yet
19 |         ; Instead, ./cavosim differences.image| grep 'mem\[8]' | head 
20 |         $0 $main .                                                           
21 | 


--------------------------------------------------------------------------------
/iloop.s:
--------------------------------------------------------------------------------
1 | 	; simplest interesting program: infinite loop.
2 | me:
3 |         $0 $me .
4 | 


--------------------------------------------------------------------------------
/method-of-differences.c:
--------------------------------------------------------------------------------
 1 | #include <stdio.h>
 2 | 
 3 | /* tabulate -x³ + 2x² - x + 4 */
 4 | /* int a0 = 0, a1 = 0, a2 = -6, a3 = -2, a4 = 0, a5 = 4; */
 5 | 
 6 | /* tabulate a more boring polynomial */
 7 | int a0 = 1, a1 = -1, a2 = 1, a3 = -1, a4 = 1, a5 = -1;
 8 | 
 9 | int main(void) {
10 |   for (;;) {
11 |     printf("%d\n", a5);
12 |     a5 += a4;
13 |     a4 += a3;
14 |     a3 += a2;
15 |     a2 += a1;
16 |     a1 += a0;
17 |   }
18 | }
19 | 


--------------------------------------------------------------------------------