12 |
--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
1 | Licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0
2 |
3 | http://creativecommons.org/licenses/by-nc-sa/3.0/
4 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | Build your own Lisp
2 | ===================
3 |
4 | http://buildyourownlisp.com
5 |
6 | About
7 | -----
8 |
9 | This is the HTML and website code for the book of the above title.
10 |
11 | Corrections / Edits / Contributions Welcome
12 |
13 | `contact@theorangeduck.com`
14 |
15 | Book contents licensed under Creative Commons Attribution-NonCommercial-ShareAlike 3.0
16 |
17 | http://creativecommons.org/licenses/by-nc-sa/3.0/
18 |
19 | Source code licensed under BSD3
20 |
21 | https://opensource.org/license/bsd-3-clause/
22 |
23 |
24 | Running
25 | -------
26 |
27 | You can't just browse the raw HTML files of the site. The links won't work, and it won't have a proper header or footer. If you want to run this website locally, you should install Flask and run the website as follows.
28 |
29 | ```
30 | pip install Flask cachelib
31 | python lispy.py
32 | ```
33 |
34 | You can specify the port via `$PORT`.
35 |
36 | ```
37 | env PORT=5000 python lispy.py
38 | ```
39 |
40 | This will serve the site locally at `http://127.0.0.1:5000/`. You can browse it from there.
41 |
--------------------------------------------------------------------------------
/chapter13_conditionals.html:
--------------------------------------------------------------------------------
1 |
Conditionals • Chapter 13
2 |
3 |
Doing it yourself
4 |
5 |
We've come quite far now. Your knowledge of C should be good enough for you to stand on your own feet a little more. If you're feeling confident, this chapter is a perfect opportunity to stretch your wings out and attempt something on your own. It is a fairly short chapter and essentially consists of adding a couple of new builtin functions to deal with comparison and ordering.
6 |
7 |
8 |
9 |
Pug • if pug is asleep then pug is cute.
10 |
11 |
12 |
If you're feeling positive, go ahead and try to implement comparison and ordering into your language now. Define some new builtin functions for greater than, less than, equal to, and all the other comparison operators we use in C. Try to define an if function that tests for some condition and then evaluates either some code, or some other code, depending on the result. Once you've finished, come back and compare your work to mine. Observe the differences and decide which parts you prefer.
13 |
14 |
If you still feel uncertain don't worry. Follow along and I'll explain my approach.
15 |
16 |
17 |
Ordering
18 |
19 |
For simplicity's sake I'm going to re-use our number data type to represent the result of comparisons. I'll make a rule similar to C, to say that any number that isn't 0 evaluates to true in an if statement, while 0 always evaluates to false.
20 |
21 |
Therefore our ordering functions are a little like a simplified version of our arithmetic functions. They'll only work on numbers, and we only want them to work on two arguments.
22 |
23 |
If these conditions are met the maths is simple. We want to return a number lval of either 0 or 1, depending on the result of the comparison between the two input lvals. We can use C's comparison operators to do this. Like our arithmetic functions we'll make use of a single function to do all of the comparisons.
24 |
25 |
First we check the error conditions, then we compare the numbers in each of the arguments to get some result. Finally we return this result as a number value.
26 |
27 |
lval* builtin_ord(lenv* e, lval* a, char* op) {
  /* Exactly two arguments, both of them numbers */
  LASSERT_NUM(op, a, 2);
  LASSERT_TYPE(op, a, 0, LVAL_NUM);
  LASSERT_TYPE(op, a, 1, LVAL_NUM);

  /* Default to false so r is never read uninitialised */
  int r = 0;
  if (strcmp(op, ">") == 0) {
    r = (a->cell[0]->num > a->cell[1]->num);
  }
  if (strcmp(op, "<") == 0) {
    r = (a->cell[0]->num < a->cell[1]->num);
  }
  if (strcmp(op, ">=") == 0) {
    r = (a->cell[0]->num >= a->cell[1]->num);
  }
  if (strcmp(op, "<=") == 0) {
    r = (a->cell[0]->num <= a->cell[1]->num);
  }
  lval_del(a);
  return lval_num(r);
}

/* Each builtin forwards to builtin_ord, which is defined above */
lval* builtin_gt(lenv* e, lval* a) {
  return builtin_ord(e, a, ">");
}

lval* builtin_lt(lenv* e, lval* a) {
  return builtin_ord(e, a, "<");
}

lval* builtin_ge(lenv* e, lval* a) {
  return builtin_ord(e, a, ">=");
}

lval* builtin_le(lenv* e, lval* a) {
  return builtin_ord(e, a, "<=");
}
68 |
69 |
70 |
71 |
Equality
72 |
73 |
Equality is going to be different to ordering because we want it to work on more than number types. It will be useful to see if an input is equal to an empty list, or to see if two functions passed in are the same. Therefore we need to define a function which can test for equality between two different types of lval.
74 |
75 |
This function essentially checks that all the fields which make up the data for a particular lval type are equal. If all the fields are equal, the whole thing is considered equal. Otherwise if there are any differences the whole thing is considered unequal.
76 |
77 |
int lval_eq(lval* x, lval* y) {

  /* Different Types are always unequal */
  if (x->type != y->type) { return 0; }

  /* Compare Based upon type */
  switch (x->type) {
    /* Compare Number Value */
    case LVAL_NUM: return (x->num == y->num);

    /* Compare String Values */
    case LVAL_ERR: return (strcmp(x->err, y->err) == 0);
    case LVAL_SYM: return (strcmp(x->sym, y->sym) == 0);

    /* If builtin compare, otherwise compare formals and body */
    case LVAL_FUN:
      if (x->builtin || y->builtin) {
        return x->builtin == y->builtin;
      } else {
        return lval_eq(x->formals, y->formals)
          && lval_eq(x->body, y->body);
      }

    /* If list compare every individual element */
    case LVAL_QEXPR:
    case LVAL_SEXPR:
      if (x->count != y->count) { return 0; }
      for (int i = 0; i < x->count; i++) {
        /* If any element not equal then whole list not equal */
        if (!lval_eq(x->cell[i], y->cell[i])) { return 0; }
      }
      /* Otherwise lists must be equal */
      return 1;
  }
  return 0;
}
114 |
115 |
Using this function the new builtin function for equality comparison is very simple to add. We simply ensure two arguments are input, and then test whether they are equal. We store the result of the comparison into a new lval and return it.
116 |
117 |
lval* builtin_cmp(lenv* e, lval* a, char* op) {
  LASSERT_NUM(op, a, 2);
  /* Default to false so r is never read uninitialised */
  int r = 0;
  if (strcmp(op, "==") == 0) {
    r = lval_eq(a->cell[0], a->cell[1]);
  }
  if (strcmp(op, "!=") == 0) {
    r = !lval_eq(a->cell[0], a->cell[1]);
  }
  lval_del(a);
  return lval_num(r);
}

lval* builtin_eq(lenv* e, lval* a) {
  return builtin_cmp(e, a, "==");
}

lval* builtin_ne(lenv* e, lval* a) {
  return builtin_cmp(e, a, "!=");
}
137 |
138 |
139 |
If Function
140 |
141 |
To make our comparison operators useful we'll need an if function. This function is a little like the ternary operation in C. Upon some condition being true it evaluates to one thing, and if the condition is false, it evaluates to another.
142 |
143 |
We can again make use of Q-Expressions to encode a computation. First we get the user to pass in the result of a comparison, then we get the user to pass in two Q-Expressions representing the code to be evaluated upon a condition being either true or false.
144 |
145 |
lval* builtin_if(lenv* e, lval* a) {
  LASSERT_NUM("if", a, 3);
  LASSERT_TYPE("if", a, 0, LVAL_NUM);
  LASSERT_TYPE("if", a, 1, LVAL_QEXPR);
  LASSERT_TYPE("if", a, 2, LVAL_QEXPR);

  /* Mark Both Expressions as evaluable */
  lval* x;
  a->cell[1]->type = LVAL_SEXPR;
  a->cell[2]->type = LVAL_SEXPR;

  if (a->cell[0]->num) {
    /* If condition is true evaluate first expression */
    x = lval_eval(e, lval_pop(a, 1));
  } else {
    /* Otherwise evaluate second expression */
    x = lval_eval(e, lval_pop(a, 2));
  }

  /* Delete argument list and return */
  lval_del(a);
  return x;
}
168 |
169 |
All that remains is for us to register all of these new builtins and we are again ready to go.
By introducing conditionals we've actually made our language a lot more powerful. This is because they effectively let us implement recursive functions.
207 |
208 |
Recursive functions are those which call themselves. We've used these already in C to perform reading in and evaluation of expressions. The reason we require conditionals for these is because they let us test for the situation where we wish to terminate the recursion.
209 |
210 |
For example we can use conditionals to implement a function len which tells us the number of items in a list. If we encounter the empty list we just return 0. Otherwise we return the length of the tail of the input list, plus 1. Think about why this works. It repeatedly uses the len function until it reaches the empty list. At this point it returns 0 and adds all the other partial results together.
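
The same recursive shape can be sketched in C. This is only an illustration, not the book's code: the `node` type and the name `list_len` are made up for the example, and the empty list is assumed to be represented by `NULL`.

```c
#include <stddef.h>

/* Hypothetical singly linked list, used only for this illustration. */
typedef struct node {
  long value;
  struct node* next;
} node;

/* Length of a list: 0 for the empty list, otherwise 1 + length of the tail. */
int list_len(const node* l) {
  if (l == NULL) { return 0; }    /* base case: the empty list */
  return 1 + list_len(l->next);   /* count the head, recurse on the tail */
}
```

For a three element list the calls unwind as 1 + (1 + (1 + 0)), which is the adding up of partial results described above.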
Just as in C, there is a pleasant symmetry to this sort of recursive function. First we do something for the empty list (the base case). Then if we get something bigger, we take off a chunk such as the head of the list, and do something to it, before combining it with the rest of the thing to which the function has been already applied.
219 |
220 |
Here is another function for reversing a list. As before it checks for the empty list, but this time it returns the empty list back. This makes sense. The reverse of the empty list is just the empty list. But if it gets something bigger than the empty list, it reverses the tail, and sticks this in front of the head.
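
A rough C counterpart of this idea is sketched below. It assumes a hypothetical linked list type `node` and a made-up function name `list_reverse`, and unlike our Lisp it rearranges the existing nodes in place rather than building a new list.

```c
#include <stddef.h>

/* Hypothetical singly linked list, used only for this illustration. */
typedef struct node {
  long value;
  struct node* next;
} node;

/* Reverse a list in place and return the new first node. */
node* list_reverse(node* l) {
  /* Base case: the reverse of the empty list is the empty list */
  if (l == NULL) { return NULL; }

  /* Reverse the tail first */
  node* rest = list_reverse(l->next);

  /* A one element list is already reversed */
  if (rest == NULL) { return l; }

  /* l->next is now the last node of the reversed tail,
     so hang the old head after it */
  l->next->next = l;
  l->next = NULL;
  return rest;
}
```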
Although we've done a lot with our Lisp, it is still some way off from a fully complete, production-strength programming language. If you tried to use it for any sufficiently large project there are a number of issues you would eventually run into and improvements you'd have to make. Solving these problems would be what would bring it more into the scope of a fully fledged programming language.
7 |
8 |
Here are some of these issues you would likely encounter, potential solutions to these problems, and some other fun ideas for other improvements. Some may take a few hundred lines of code, others a few thousand. The choice of what to tackle is up to you. If you've become fond of your language you may enjoy doing some of these projects.
9 |
10 |
11 |
Native Types
12 |
13 |
Currently our language only wraps the native C long and char* types. This is pretty limiting if you want to do any kind of useful computation. Our operations on these data types are also pretty limited. Ideally our language should wrap all of the native C types and allow for methods of manipulating them. One of the most important additions would be the ability to manipulate decimal numbers. For this you could wrap the double type and relevant operations. With more than one number type we need to make sure the arithmetic operators such as + and - work on them all, and on combinations of them.
14 |
15 |
Adding support for native types should be interesting for people wishing to do computation with decimal and floating-point numbers in their language.
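
One possible starting point is sketched below. It is not part of our Lisp: the `number` type and the `num_long`, `num_double` and `num_add` names are hypothetical, and the rule that mixing a long with a double gives a double is an assumption borrowed from how C itself promotes numbers.

```c
/* Hypothetical tagged number, standing in for an extended lval. */
typedef enum { NUM_LONG, NUM_DOUBLE } num_type;

typedef struct {
  num_type type;
  union { long l; double d; } as;
} number;

number num_long(long x) {
  number n;
  n.type = NUM_LONG;
  n.as.l = x;
  return n;
}

number num_double(double x) {
  number n;
  n.type = NUM_DOUBLE;
  n.as.d = x;
  return n;
}

/* Read either kind of number as a double. */
static double num_to_double(number n) {
  return (n.type == NUM_LONG) ? (double)n.as.l : n.as.d;
}

/* Addition stays integral for two longs, otherwise promotes to double. */
number num_add(number a, number b) {
  if (a.type == NUM_LONG && b.type == NUM_LONG) {
    return num_long(a.as.l + b.as.l);
  }
  return num_double(num_to_double(a) + num_to_double(b));
}
```

The other arithmetic operators could follow the same pattern, which is where most of the work in this project would lie.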
16 |
17 |
18 |
User Defined Types
19 |
20 |
As well as adding support for native types it would be good to give users the ability to add their own new types, just like how we use structs in C. The syntax or method you use to do this would be up to you. This is a really essential part of making our language usable for any reasonably sized project.
21 |
22 |
This task may be interesting to anyone who has a specific idea of how they would like to develop the language, and what they want a final design to look like.
23 |
24 |
25 |
26 |
27 |
Important List • Play! BE HAPPY and go home.
28 |
29 |
30 |
List Literal
31 |
32 |
Some Lisps use square brackets [] to give a literal notation for lists of evaluated values. This is syntactic sugar for writing something like list 100 (+ 10 20) 300. Instead it lets you write [100 (+ 10 20) 300]. In some situations this is clearly nicer, but it does use up the [] characters which could possibly be used for more interesting purposes.
33 |
34 |
This should be a simple addition for people looking to try out adding extra syntax.
35 |
36 |
37 |
Operating System Interaction
38 |
39 |
One essential part of bootstrapping a language is to give it proper abilities for opening, reading, and writing files. This means wrapping all the C functions such as fread, fwrite, fgetc, etc. in Lisp equivalents. This is a fairly straightforward task, but does require writing quite a large number of wrapper functions. This is why we've not done it for our language so far.
40 |
41 |
On a similar note it would be great to give our language access to whatever operating system calls are appropriate. We should give it the ability to change directory, list files in a directory and that sort of thing. This is an easy task but again requires a lot of wrapping of C functions. It is essential for any real practical use of this language as a scripting language.
42 |
43 |
People who wish to make use of their language for doing simple scripting tasks and string manipulation may be interested in implementing this project.
44 |
45 |
46 |
Macros
47 |
48 |
Many other Lisps allow you to write things like (def x 100) to define x as the value 100. In our Lisp this wouldn't work because it would attempt to evaluate the x to whatever value was stored as x in the environment. In other Lisps these functions are called macros, and when encountered they stop the evaluation of their arguments, and manipulate them un-evaluated. They let you write things that look like normal function calls, but actually do complex and interesting things.
49 |
50 |
These are fun things to have in a language. They make it so you can add a little bit of magic to some of the workings. In many cases this can make syntax nicer or allow a user to not repeat themselves.
51 |
52 |
I like how our language handles things like def and if without resorting to macros, but if you dislike how it works currently, and want it to be more similar to conventional Lisps, this might be something you are interested in implementing.
53 |
54 |
55 |
Variable Hashtable
56 |
57 |
At the moment when we look up variable names in our language we just do a linear search over all of the variables in the environment. This gets more and more inefficient the more variables we have defined.
58 |
59 |
A more efficient way to do this is to implement a Hash Table. This technique converts the variable name to an integer and uses this to index into an array of a known size to find the value associated with this symbol. This is a really important data structure in programming and will crop up everywhere because of its fantastic performance under heavy loads.
60 |
61 |
Anyone who is interested in learning more about data structures and algorithms would be smart to take a shot at implementing this data structure or one of its variations.
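
To make the idea concrete, here is a minimal sketch of one variation, open addressing with linear probing. Everything in it is an assumption made for illustration: the `table` type, the fixed `TABLE_SIZE`, storing plain long values rather than lval pointers, and borrowing key strings instead of copying them. It never resizes and has no way to delete an entry.

```c
#include <stddef.h>
#include <string.h>

#define TABLE_SIZE 64  /* assumed fixed capacity, no resizing in this sketch */

typedef struct {
  const char* key;  /* NULL marks an empty slot; the caller keeps keys alive */
  long value;
} slot;

typedef struct {
  slot slots[TABLE_SIZE];
} table;

/* Convert a name to an integer (the well known djb2 string hash). */
static unsigned long hash_str(const char* s) {
  unsigned long h = 5381;
  while (*s) { h = h * 33 + (unsigned char)*s++; }
  return h;
}

void table_init(table* t) {
  for (int i = 0; i < TABLE_SIZE; i++) { t->slots[i].key = NULL; }
}

/* Insert or overwrite. Returns 1 on success, 0 if the table is full. */
int table_put(table* t, const char* key, long value) {
  unsigned long start = hash_str(key) % TABLE_SIZE;
  for (unsigned long i = 0; i < TABLE_SIZE; i++) {
    slot* s = &t->slots[(start + i) % TABLE_SIZE];
    if (s->key == NULL || strcmp(s->key, key) == 0) {
      s->key = key;
      s->value = value;
      return 1;
    }
  }
  return 0;
}

/* Look up a name. Returns 1 and writes *out if found, 0 otherwise. */
int table_get(const table* t, const char* key, long* out) {
  unsigned long start = hash_str(key) % TABLE_SIZE;
  for (unsigned long i = 0; i < TABLE_SIZE; i++) {
    const slot* s = &t->slots[(start + i) % TABLE_SIZE];
    if (s->key == NULL) { return 0; }  /* empty slot ends the probe */
    if (strcmp(s->key, key) == 0) { *out = s->value; return 1; }
  }
  return 0;
}
```

When two names hash to the same index, the probe simply walks forward to the next slot, so a lookup usually inspects only a handful of entries instead of the whole environment.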
62 |
63 |
64 |
Pool Allocation
65 |
66 |
Our Lisp is very simple, but it is not fast. Its performance is relative to some scripting languages such as Python and Ruby. Most of the performance overhead in our program comes from the fact that doing almost anything requires us to construct and destruct lval. We therefore call malloc very often. This is a slow function as it requires the operating system to do some management for us. When doing calculations there is lots of copying, allocation and deallocation of lval types.
67 |
68 |
If we wish to reduce this overhead we need to lower the number of calls to malloc. One method of doing this is to call malloc once at the beginning of the program, allocating a large pool of memory. We should then replace all our malloc calls with calls to some function that slices and dices up this memory for use in the program. This means that we are emulating some of the behaviour of the operating system, but in a faster local way. This idea is called memory pool allocation and is a common technique used in game development, and other performance sensitive applications.
69 |
70 |
This can be tricky to implement correctly, but conceptually does not need to be complex. If you want a quick method for getting large gains in performance, looking into this might interest you.
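
The simplest form of this idea might look like the sketch below. It is a so-called bump allocator, which only ever hands out the next free slice and releases everything at once, so it is a simplification of a full pool that can recycle individual blocks. The `pool` type and all of the function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Slices are rounded up to this size so that every slice stays aligned. */
typedef union { long l; double d; void* p; long double ld; } pool_align_t;
#define POOL_ALIGN sizeof(pool_align_t)

typedef struct {
  unsigned char* base;  /* the one big block obtained from malloc */
  size_t size;          /* total bytes in the block */
  size_t used;          /* bytes handed out so far */
} pool;

/* Call malloc once, up front. Returns 1 on success, 0 on failure. */
int pool_init(pool* p, size_t size) {
  p->base = malloc(size);
  p->size = (p->base != NULL) ? size : 0;
  p->used = 0;
  return p->base != NULL;
}

/* Slice n bytes off the pool, or return NULL if there is no room. */
void* pool_alloc(pool* p, size_t n) {
  size_t rounded = (n + POOL_ALIGN - 1) / POOL_ALIGN * POOL_ALIGN;
  if (rounded < n || rounded > p->size - p->used) { return NULL; }
  void* out = p->base + p->used;
  p->used += rounded;
  return out;
}

/* Forget every allocation at once, keeping the memory for reuse. */
void pool_reset(pool* p) { p->used = 0; }

/* Hand the big block back to the system. */
void pool_free(pool* p) {
  free(p->base);
  p->base = NULL;
  p->size = 0;
  p->used = 0;
}
```

Each call to `pool_alloc` is a little arithmetic and a pointer addition, which is where the speed comes from. The price is that individual values cannot be freed on their own in this version.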
71 |
72 |
73 |
Garbage Collection
74 |
75 |
Almost all other implementations of Lisps assign variables differently to ours. They do not store a copy of a value in the environment, but actually a pointer, or reference, to it directly. Because pointers are used, rather than copies, just like in C, there is much less overhead required when using large data structures.
76 |
77 |
78 |
79 |
Garbage Collection • Pick up that can.
80 |
81 |
82 |
If we store pointers to values, rather than copies, we need to ensure that the data pointed to is not deleted before some other value tries to make use of it. We want to delete it when there are no longer any references to it. One method to do this, called Mark and Sweep, is to monitor those values that are in the environment, as well as every value that has been allocated. When a variable is put into the environment it, and everything it references, is marked. Then, when we wish to free memory, we can iterate over every allocation, and delete any that are not marked.
83 |
84 |
This is called Garbage Collection and is an integral part of many programming languages. As with pool allocation, implementing a Garbage Collector does not need to be complicated, but it does need to be done carefully, in particular if you wish to make it efficient. Implementing this would be essential to making this language practical for working with large amounts of data. A particularly good tutorial on implementing a garbage collector in C can be found here.
85 |
86 |
This should interest anyone who is concerned with the language's performance and wishes to change the semantics of how variables are stored and modified in the language.
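
A very small sketch of Mark and Sweep is shown below. It makes several assumptions that are not from our Lisp: each object holds at most two references, an array of root pointers stands in for the environment, and marking happens at collection time rather than when a variable is defined. The `obj` and `heap` types and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct obj {
  int marked;              /* set while the object is known to be reachable */
  struct obj* left;        /* references to other objects, may be NULL */
  struct obj* right;
  struct obj* next_alloc;  /* links every allocation ever made */
} obj;

typedef struct {
  obj* all;   /* head of the list of all allocations */
  int live;   /* number of objects currently allocated */
} heap;

/* Allocate an object and record it in the heap's allocation list. */
obj* heap_new(heap* h, obj* left, obj* right) {
  obj* o = malloc(sizeof(obj));
  if (o == NULL) { return NULL; }
  o->marked = 0;
  o->left = left;
  o->right = right;
  o->next_alloc = h->all;
  h->all = o;
  h->live++;
  return o;
}

/* Mark an object and everything it references. */
static void mark(obj* o) {
  if (o == NULL || o->marked) { return; }  /* the check also stops cycles */
  o->marked = 1;
  mark(o->left);
  mark(o->right);
}

/* Mark from the roots, then sweep. Returns how many objects were freed. */
int heap_collect(heap* h, obj** roots, int root_count) {
  for (int i = 0; i < root_count; i++) { mark(roots[i]); }

  int freed = 0;
  obj** link = &h->all;
  while (*link != NULL) {
    obj* o = *link;
    if (o->marked) {
      o->marked = 0;            /* clear the mark for the next collection */
      link = &o->next_alloc;
    } else {
      *link = o->next_alloc;    /* unlink and delete the unreachable object */
      free(o);
      freed++;
      h->live--;
    }
  }
  return freed;
}
```

Anything reachable from a root survives the sweep, and everything else is deleted, even if the unreachable objects still point at each other.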
87 |
88 |
89 |
Tail Call Optimisation
90 |
91 |
Our programming language uses recursion to do its looping. This is conceptually a really clever way to do it, but practically it is quite poor. Recursive functions call themselves to collect all of the partial results of a computation, and only then combine all the results together. This is a wasteful way of doing computation when partial results can be accumulated as some total over a loop. This is particularly problematic for loops that are intended to run for many, or infinite, iterations.
92 |
93 |
Some recursive functions can be automatically converted to corresponding while loops, which accumulate totals step by step, rather than all together. This automatic conversion is called tail call optimisation and is an essential optimisation for programs that do a lot of looping using recursion.
94 |
95 |
People who are interested in compiler optimisations and the correspondences between different forms of computation might find this project interesting.
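
The correspondence can be seen in plain C. The three functions below are illustrative only and their names are made up. They compute the same sum in three ways: ordinary recursion, recursion written so that the recursive call is the very last thing done, and the while loop that a tail call optimisation would in effect produce from the second form. The C standard does not promise this conversion, so a given compiler may or may not perform it.

```c
/* Ordinary recursion: each call must wait for the next one to
   return before it can do its addition. */
long sum_recursive(long n) {
  if (n <= 0) { return 0; }
  return n + sum_recursive(n - 1);
}

/* Tail recursion: the running total is passed along, so nothing
   is left to do after the recursive call returns. */
long sum_tail(long n, long total) {
  if (n <= 0) { return total; }
  return sum_tail(n - 1, total + n);
}

/* The equivalent loop: the arguments become variables that are
   updated in place on each iteration. */
long sum_loop(long n) {
  long total = 0;
  while (n > 0) {
    total += n;
    n--;
  }
  return total;
}
```

Only the tail recursive form converts this directly, because there the two arguments map straight onto the two loop variables.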
96 |
97 |
98 |
Lexical Scoping
99 |
100 |
When our language tries to look up a variable that is undefined it throws an error. It would be better if it could tell us which variables are undefined before evaluating the program. This would let us avoid typos and other annoying bugs. Finding these issues before the program is run is called lexical scoping, and uses the rules for variable definition to try and infer which variables are defined and which aren't at each point in the program, without doing any evaluation.
101 |
102 |
This could be a difficult task to get exactly right, but should be interesting to anyone who wants to make their programming language safer to use, and less bug-prone.
103 |
104 |
105 |
106 |
107 |
Static Electricity • A hair-raising alternative.
108 |
109 |
110 |
Static Typing
111 |
112 |
Every value in our program has a type associated with it. This we know before any evaluation has taken place. Our builtin functions also only take certain types as input. We should be able to use this information to infer the types of new user defined functions and values. We can also use this information to check that functions are being called with the correct types before we run the program. This will reduce any errors stemming from calling functions with incorrect types before evaluation. This checking is called static typing.
113 |
114 |
Type systems are a really interesting and fundamental part of computer science. They are currently the best method we know of detecting errors before running a program. Anyone interested in programming language safety and type systems should find this project really interesting.
115 |
116 |
117 |
Conclusion
118 |
119 |
Many thanks for reading this book. I hope you've found something of interest in its pages. If you did enjoy it please tell your friends about it. If you are going to continue developing your language then best of luck and I hope you learn many more things about C, programming languages, and computer science.
120 |
121 |
Most of all I hope you've had fun building your own Lisp. Until next time!
In this book you'll learn the C programming language and at the same time learn how to build your very own programming language, a minimal Lisp, in under 1000 lines of code! We'll be using a library to do some of the initial work, so I'm cheating a bit on the line count, but the rest of the code will be completely original, and you really will create a powerful little Lisp by the end.
7 |
8 |
This book is inspired by other tutorials which go through the steps of building a programming language from scratch. I wrote this book to show that this kind of fun and creative project is a great way to learn a language, and not limited to abstract high-level languages, or experienced programmers.
9 |
10 |
Many people are keen to learn C, but have nowhere to start. Now there is no excuse. If you follow this book I can promise that, in the worst case, you'll get a cool new programming language to play with, and hopefully you'll become an experienced C programmer too!
11 |
12 |
13 |
Who this is for
14 |
15 |
This book is for anyone wanting to learn C, or who has once wondered how to build their own programming language. This book is not suitable as a first programming language book, but anyone with some minimal programming experience, in any language, should find something new and interesting inside.
16 |
17 |
18 |
19 |
Ada Lovelace • Your typical brogrammer.
20 |
21 |
22 |
I've tried to make this book as friendly as possible to beginners. I welcome beginners the most because they have so much to discover! But beginners may also find this book challenging. We will be covering many new concepts, and essentially learning two new programming languages at once.
23 |
24 |
If you look for help you may find people are not patient with you. You may find that, rather than help, they take the time to express how much they know about the subject. Experienced programmers might tell you that you are wrong. The subtext to their tone might be that you should stop now, rather than inflict your bad code on the world.
25 |
26 |
After a couple of engagements like this you may decide that you are not a programmer, or don't really like programming, or that you just don't get it. You may have thought that you once enjoyed the idea of building your own programming language, but now you have realised that it is too abstract and you don't care any more. You are now concerned with your other passions, and any insight that may have been playful, joyful or interesting will now have become an obstacle.
27 |
28 |
For this I can only apologise. Programmers can be hostile, macho, arrogant, insecure, and aggressive. There is no excuse for this behaviour. Know that I am on your side. No one gets it at first. Everyone struggles and doubts their abilities. Please don't give up or let the joy be sucked out of the creative experience. Be proud of what you create no matter what it is. People like me don't want you to stop programming. We want to hear your voice, and what you have to say.
29 |
30 |
31 |
Why learn C
32 |
33 |
C is one of the most popular and influential programming languages in the world. It is the language of choice for development on Linux, and has been used extensively in the creation of OS X and to some extent Microsoft Windows. It is used on micro-computers too. Your fridge and car probably run on it. In modern software development, the use of C may be escapable, but its legacy is not. Anyone wanting to make a career out of software development would be smart to learn C.
34 |
35 |
36 |
37 |
A fridge • Your typical C user
38 |
39 |
40 |
But C is not about software development and careers. C is about freedom. It rose to fame on the back of technologies of collaboration and freedom - Unix, Linux, and The Libre Software Movement. It personifies the idea of personal liberty within computing. It wills you to take control of the technology affecting your life.
41 |
42 |
In this day and age, when technology is more powerful than ever, this could not be more important.
43 |
44 |
The ideology of freedom is reflected in the nature of C itself. There is little C hides from you, including its warts and flaws. There is little C stops you from doing, including breaking your programs in horrible ways. When programming in C you do not stand on a path, but a plane of decision, and C dares you to decide what to do.
45 |
46 |
C is also the language of fun and learning. Before the mainstream media got hold of it we had a word for this. Hacking. The philosophy that glorifies what is fun and clever. Nothing to do with the illegal unauthorised access of other people's computers. Hacking is the philosophy of exploration, personal expression, pushing boundaries, and breaking the rules. It stands against hierarchy and bureaucracy. It celebrates the individual. Hacking baits you with fun, learning, and glory. Hacking is the promise that with a computer and access to the internet, you have the agency to change the world.
47 |
48 |
To want to master C is to care about what is powerful, clever, and free. To become a programmer with all the vast powers of technology at his or her fingertips and the responsibility to do something to benefit the world.
49 |
50 |
How to learn C
51 |
52 |
There is no way around the fact that C is a difficult language. It has many concepts that are unfamiliar, and it makes no attempts to help a new user. In this book I am not going to cover in detail things like the syntax of the language, or how to write loops and conditional statements.
53 |
54 |
I will, on the other hand, show you how to build a real world program in C. This approach is always more difficult for the reader, but hopefully will teach you many implicit things a traditional approach cannot. I can't guarantee that this book will make you a confident user of C. What I can promise is that those 1000 lines of code are going to be packed with content - and you will learn something worthwhile.
55 |
56 |
This book consists of 16 short chapters. How you complete these is up to you. It may well be possible to blast through this book over a weekend, or to take it more slowly and do a chapter or two each evening over a week. It shouldn't take very long to complete, and will hopefully leave you with a taste for developing your language further.
57 |
58 |
59 |
Why build a Lisp
60 |
61 |
The language we are going to be building in this book is a Lisp. This is a family of programming languages characterised by the fact that all their computation is represented by lists. This may sound scarier than it is. Lisps are actually very easy, distinctive, and powerful languages.
62 |
63 |
64 |
65 |
Mike Tyson • Your typical Lisp user
66 |
67 |
68 |
Building a Lisp is a great project for so many reasons. It puts you in the shoes of language designers, and gives you an appreciation for the whole process of programming, from language all the way down to machine. It teaches you about functional programming, and novel ways to view computation. The final product you are rewarded with provides a template for future thoughts and developments, giving you a starting ground for trying new things. It simply isn't possible to comprehend the creativity and cleverness that goes into programming and computer science until you explore languages themselves.
69 |
70 |
The type of Lisp we'll be building is one I've invented for the purposes of this book. I've designed it for minimalism, simplicity and clarity, and I've become quite fond of it along the way. I hope you come to like it too. Conceptually, syntactically, and in implementation, this Lisp has a number of differences to other major brands of Lisp. So much so that I'm sure I will be getting e-mails from Lisp programmers telling me it isn't a Lisp because it doesn't do/have/look-like this or that.
71 |
72 |
I've not made this Lisp different to confuse beginners. I've made it different because different is good.
73 |
74 |
If you are looking to learn about the semantics and behaviours of conventional Lisps, and how to program them, this book may not be for you. What this book offers instead is new and unique concepts, self expression, creativity, and fun. Whatever your motivation, heed this disclaimer now. Not everything I say will be objectively correct or true! You will have to decide that for yourselves.
75 |
76 |
77 |
Your own Lisp
78 |
79 |
The best way to follow this book is to, as the title says, write your own Lisp. If you are feeling confident enough I want you to add your own features, modifications and changes. Your Lisp should suit you and your own philosophy. Throughout the book I'll be giving description and insight, but with it I'll be providing a lot of code. This will make it easy to follow along by copying and pasting each section into your program without really understanding. Please do not do this!
80 |
81 |
Type out each piece of sample code yourself. This is called The Hard Way. Not because it is hard technically, but because it requires discipline. By doing things The Hard Way you will come to understand the reasoning behind what you are typing. Ideally things will click as you follow it along character by character. When reading you may have an intuition as to why it looks right, or what may be going on, but this will not always translate to a real understanding unless you do the writing yourself!
82 |
83 |
In a perfect world you would use my code as a reference - an instruction booklet and guide to building the programming language you always dreamed of. In reality this isn't practical or viable. But the base philosophy remains. If you want to change something, do it.
Before we can start programming in C we'll need to install a couple of things, and set up our environment so that we have everything we need. Because C is such a universal language this should hopefully be fairly simple. Essentially we need to install two main things. A text editor and a compiler.
12 |
13 |
14 |
Text Editor
15 |
16 |
A text editor is a program that allows you to edit text files in a way suitable for programming.
17 |
18 |
On Linux the text editor I recommend is gedit. Whatever other basic text editor comes installed with your distribution will also work well. If you are a Vim or Emacs user these are fine to use. Please don't use an IDE. It isn't required for such a small project and won't help in understanding what is going on.
19 |
20 |
On Mac a simple text editor that can be used is TextWrangler. If you have a different preference this is fine, but please don't use XCode for text editing. This is a small project and using an IDE won't help you understand what is going on.
21 |
22 |
On Windows my text editor of choice is Notepad++. If you have another preference this is fine. Please don't use Visual Studio as it does not have proper support for C programming. If you attempt to use it you will run into many problems.
23 |
24 |
25 |
Compiler
26 |
27 |
The compiler is a program that transforms the C source code into a program your computer can run. The installation process for these is different depending on what operating system you are running.
28 |
29 |
Compiling and running C programs is also going to require really basic usage of the command line. This I will not cover, so I am going to assume you have at least some familiarity with using the command line. If you are worried about this then search for online tutorials on using it, relevant to your operating system.
30 |
31 |
On Linux you can install a compiler by downloading some packages. If you are running Ubuntu or Debian you can install everything you need with the following command sudo apt-get install build-essential. If you are running Fedora or a similar Linux variant you can use this command su -c "yum groupinstall development-tools".
32 |
33 |
On Mac you can install a compiler by downloading and installing the latest version of XCode from Apple. If you are unsure of how to do this you can search online for "installing xcode" and follow any advice shown. You will then need to install the Command Line Tools. On Mac OS X 10.9 this can be done by running the command xcode-select --install from the command line. On versions of Mac OS X prior to 10.9 this can be done by going to XCode Preferences, Downloads, and selecting Command Line Tools for Installation.
34 |
35 |
On Windows you can install a compiler by downloading and installing MinGW. Once installed you need to add the compiler and other programs to your system PATH variable. To do this follow these instructions appending the value ;C:\MinGW\bin to the variable called PATH. You can create this variable if it doesn't exist. You may need to restart cmd.exe for the changes to take effect. This will allow you to run a compiler from the command line cmd.exe. It will also install other programs which make cmd.exe act like a Unix command line.
36 |
37 |
38 |
Testing the Compiler
39 |
40 |
To test if your C compiler is installed correctly type the following into the command line.
41 |
42 |
cc --version
43 |
44 |
If you get some information about the compiler version echoed back then it should be installed correctly. You are ready to go! If you get any sort of error message about an unrecognised or not found command, then it is not ready. You may need to restart the command line or your computer for changes to take effect.
45 |
46 |
47 |
Different compiler commands.
48 |
49 |
On some systems (such as Windows) the compiler command might have a different name such as gcc. Try this if the system cannot find the cc command.
50 |
51 |
52 |
Hello World
53 |
54 |
Now that your environment is set up, start by opening your text editor and inputting the following program. Create a directory where you are going to put your work for this book, and save this file as hello_world.c. This is your first C program!
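Going by the step-by-step explanation of the program in this chapter, it is along the following lines: a single include of stdio.h, and a main function that calls puts and then returns 0. Treat this as a minimal sketch that matches that description.

```c
#include <stdio.h>

int main(int argc, char** argv) {
  /* Output the greeting, followed by a newline */
  puts("Hello, world!");
  /* Returning 0 signals that there were no errors */
  return 0;
}
```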
This may initially make very little sense. I'll try to explain it step by step.
71 |
72 |
In the first line we include what is called a header. This statement allows us to use the functions from stdio.h, the standard input and output library which comes included with C. One of the functions from this library is the puts function you see later on in the program.
73 |
74 |
Next we declare a function called main. This function is declared to output an int, and take as input an int called argc and a char** called argv. All C programs must contain this function. All programs start running from this function.
75 |
76 |
Inside main the puts function is called with the argument "Hello, world!". This outputs the message Hello, world! to the command line. The function puts is short for put string. The second statement inside the function is return 0;. This tells the main function to finish and return 0. When a C program returns 0 this indicates there have been no errors running the program.
77 |
78 |
79 |
Compilation
80 |
81 |
Before we can run this program we need to compile it. This will produce the actual executable we can run on our computer. Open up the command line and browse to the directory that hello_world.c is saved in. You can then compile your program using the following command.
82 |
83 |
cc -std=c99 -Wall hello_world.c -o hello_world
84 |
85 |
This compiles the code in hello_world.c, reporting any warnings, and outputs the program to a new file called hello_world. We use the -std=c99 flag to tell the compiler which version or standard of C we are programming with. This lets the compiler ensure our code is standardised, so that people with different operating systems or compilers will be able to use our code.
86 |
87 |
If successful you should see the output file in the current directory. This can be run by typing ./hello_world (or just hello_world on Windows). If everything is correct you should see a friendly Hello, world! message appear.
88 |
89 |
Congratulations! You've just compiled and run your first C program.
90 |
91 |
92 |
Errors
93 |
94 |
If there are some problems with your C program the compilation process may fail. These issues can range from simple syntax errors, to other complicated problems that are harder to understand.
95 |
96 |
Sometimes the error message from the compiler will make sense, but if you are having trouble understanding it try searching online for it. You should see if you can find a concise explanation of what it means, and work out how to correct it. Remember this: there are many people before you who have struggled with exactly the same problems.
97 |
98 |
99 |
100 |
Rage • A poor debugging technique
101 |
102 |
103 |
Sometimes there will be many compiler errors stemming from one source. Always go through compiler errors from first to last.
104 |
105 |
Sometimes the compiler will compile a program, but when you run it it will crash. Debugging C programs in this situation is hard. It can be an art far beyond the scope of this book.
106 |
107 |
If you are a beginner, the first port of call for debugging a crashing C program would be to print out lots of information as the program is running. Using this method you should try to isolate exactly what part of the code is incorrect and what, if anything, is going wrong. It is a debugging technique which is active. This is the important thing. As long as you are doing something, and not just staring at the code, the process is less painful and the temptation to give up is lessened.
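As a rough illustration of this print-based approach, the sketch below traces a small loop by writing its intermediate values to the error stream. The function sum_to is hypothetical and exists only to give the printing something to report on.

```c
#include <stdio.h>

/* Hypothetical function: adds the numbers 1 to n, tracing each step */
int sum_to(int n) {
  int total = 0;
  for (int i = 1; i <= n; i++) {
    total = total + i;
    /* Print the state so a wrong value can be traced to one iteration */
    fprintf(stderr, "i = %d, total = %d\n", i, total);
  }
  return total;
}
```

Calling sum_to(4) prints four lines of trace and returns 10. If the final value were wrong, the trace would show the first iteration at which total stopped matching what you expected.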
108 |
109 |
For people feeling more confident a program called gdb can be used to debug your C programs. This can be difficult and complicated to use, but it is also very powerful and can give you extremely valuable information on what went wrong and where. Information on how to use gdb can be found online.
110 |
111 |
On Mac the most recent versions of OS X don't come with gdb. Instead you can use lldb which does largely the same job.
112 |
113 |
On Linux or Mac valgrind can be used to aid the debugging of memory leaks and other more nasty errors. Valgrind is a tool that can save you hours, or even days, of debugging. It does not take much to get proficient at it, so investigating it is highly recommended. Information on how to use it can be found online.
114 |
115 |
116 |
Documentation
117 |
118 |
Through this book you may come across a function in some example code that you don't recognise. You might wonder what it does. In this case you will want to look at the online documentation of the standard library. This will explain all the functions included in the standard library, what they do, and how to use them.
119 |
120 |
121 |
Reference
122 |
123 |
124 |
What is this section for?
125 |
126 |
In this section I'll link to the code I've written for this particular chapter of the book. When finishing with a chapter your code should probably look similar to mine. This code can be used for reference if the explanation has been unclear.
127 |
128 |
If you encounter a bug please do not copy and paste my code into your project. Try to track down the bug yourself and use my code as a reference to highlight what may be wrong, or where the error may lie.
129 |
130 |
131 |
132 |
133 |
Bonus Marks
134 |
135 |
136 |
What is this section for?
137 |
138 |
In this section I'll list some things to try for fun, and learning.
139 |
140 |
It is good if you can attempt to do some of these challenges. Some will be difficult and some will be much easier. For this reason don't worry if you can't figure them all out. Some might not even be possible!
141 |
142 |
Many will require some research on the internet. This is an integral part of learning a new language so should not be avoided. The ability to teach yourself things is one of the most valuable skills in programming.
143 |
144 |
145 |
146 |
147 |
› Change the Hello World! greeting given by your program to something different.
148 |
› What happens when no main function is given?
149 |
› Use the online documentation to lookup the puts function.
150 |
› Look up how to use gdb and run it with your program.
In this chapter I've prepared a quick overview of the basic features of C. There are very few features in C, and the syntax is relatively simple. But this doesn't mean it is easy. All the depth hides below the surface. Because of this we're going to cover the features and syntax fairly quickly now, and see them in greater depth as we continue.
12 |
13 |
The goal of this chapter is to get everyone on the same page. People totally new to C should therefore take some time over it, while those with some existing experience may find it easier to skim and return to later as required.
14 |
15 |
16 |
Programs
17 |
18 |
A program in C consists of only function definitions and structure definitions.
19 |
20 |
Therefore a source file is simply a list of functions and types. These functions can call each other or themselves, and can use any data types that have been declared or are built into the language.
21 |
22 |
It is possible to call functions in other libraries, or to use their data types. This is how layers of complexity are accumulated in C programming.
23 |
24 |
As we saw in the previous chapter, the execution of a C program always starts in the function called main. From here it calls more and more functions, to perform all the actions it requires.
25 |
26 |
27 |
Variables
28 |
29 |
Functions in C consist of manipulating variables. These are items of data which we give a name to.
30 |
31 |
Every variable in C has an explicit type. These types are declared by ourselves or built into the language. We can declare a new variable by writing the name of its type, followed by its name, and optionally setting it to some value using =. This declaration is a statement, and we terminate all statements in C with a semicolon ;.
32 |
33 |
To create a new int called count we could write the following...
34 |
35 |
int count;
36 |
37 |
Or to declare it and set the value...
38 |
39 |
int count = 10;
40 |
41 |
Here are some descriptions and examples of some of the built in types.
42 |
43 |
44 |
void
Empty Type
45 |
char
Single Character/Byte
char last_initial = 'H';
46 |
int
Integer
int age = 23;
47 |
long
Integer that can hold larger values
long age_of_universe = 13798000000;
48 |
float
Decimal Number
float liters_per_pint = 0.568f;
49 |
double
Decimal Number with more precision
double speed_of_swallow = 0.01072896;
50 |
51 |
52 |
53 |
Function Declarations
54 |
55 |
A function is a computation that manipulates variables, and optionally changes the state of the program. It takes as input some variables and returns some single variable as output.
56 |
57 |
To declare a function we write the type of the variable it returns, the name of the function, and then in parentheses a list of the variables it takes as input, separated by commas. The contents of the function are put inside curly brackets {}, and list all of the statements the function executes, terminated by semicolons ;. A return statement is used to let the function finish and output a variable.
58 |
59 |
For example a function that takes two int variables called x and y and adds them together could look like this.
60 |
61 |
int add_together(int x, int y) {
  int result = x + y;
  return result;
}
65 |
66 |
We call functions by writing their name and putting the arguments to the function in parentheses, separated by commas. For example to call the above function and store the result in a variable added we would write the following.
67 |
68 |
int added = add_together(10, 18);
69 |
70 |
71 |
Structure Declarations
72 |
73 |
Structures are used to declare new types. Structures are several variables bundled together into a single package.
74 |
75 |
We can use structures to represent more complex data types. For example to represent a point in 2D space we could create a structure called point that packs together two float (decimal) values called x and y. To declare structures we can use the struct keyword in conjunction with the typedef keyword. Our declaration would look like this.
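A minimal sketch of such a declaration, assuming only what is described here: a typedef of a struct named point with two float fields x and y. The function point_length_squared is a hypothetical addition, included to show fields being read with a dot.

```c
/* A point in 2D space: two float values bundled together */
typedef struct {
  float x;
  float y;
} point;

/* Hypothetical helper: the squared distance of a point from the origin */
float point_length_squared(point p) {
  return p.x * p.x + p.y * p.y;
}
```

A point whose x is 3 and whose y is 4 gives a squared length of 25.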
We should place this definition above any functions that wish to use it. This type is no different to the built in types, and we can use it in all the same ways. To access an individual field we use a dot ., followed by the name of the field, such as x.
A pointer is a variation on a normal type where the type name is suffixed with an asterisk. For example we could declare a pointer to an integer by writing int*. We already saw a pointer type char** argv. This is a pointer to pointers to characters, and is used as input to the main function.
102 |
103 |
Pointers are used for a whole number of different things such as for strings or lists. These are a difficult part of C and will be explained in much greater detail in later chapters. We won't make use of them for a while, so for now it is good to simply know they exist, and how to spot them. Don't let them scare you off!
104 |
105 |
106 |
Strings
107 |
108 |
In C strings are represented by the pointer type char*. Under the hood they are stored as a list of characters, where the final character is a special character called the null terminator. Strings are a complicated and important part of C, which we'll learn to use effectively in the next few chapters.
109 |
110 |
Strings can also be declared literally by putting text between quotation marks. We used this in the previous chapter with our string "Hello, world!". For now, remember that if you see char*, you can read it as a string.
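To make the null terminator a little more concrete, here is a minimal sketch that walks a string one character at a time until it reaches the terminator. The function count_chars is hypothetical; the standard library already provides strlen for this job.

```c
#include <stddef.h>

/* Hypothetical helper: counts characters up to, but not including, */
/* the null terminator '\0' that marks the end of the string.       */
size_t count_chars(const char* s) {
  size_t n = 0;
  while (s[n] != '\0') {
    n++;
  }
  return n;
}
```

For the string "Hello, world!" this returns 13. The terminator itself is stored in memory after the final character but is not counted.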
111 |
112 |
113 |
Conditionals
114 |
115 |
Conditional statements let the program perform some code only if certain conditions are met.
116 |
117 |
To perform code under some condition we use the if statement. This is written as if followed by some condition in parentheses, followed by the code to execute in curly brackets. An if statement can be followed by an optional else statement, followed by other statements in curly brackets. The code in these brackets will be performed in the case the conditional is false.
118 |
119 |
We can test for multiple conditions using the logical operators || for or, and && for and.
120 |
121 |
Inside a conditional statement's parentheses any value that is not 0 will evaluate to true. This is important to remember as many conditions use this to check things implicitly.
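A small sketch of this implicit check, using a hypothetical function called describe that is not part of the book's code. Any non-zero argument, including a negative one, takes the first branch.

```c
/* Hypothetical helper: reports how a value behaves as a condition */
const char* describe(int x) {
  if (x) {
    /* Reached for any value other than 0 */
    return "true";
  } else {
    /* Reached only when x is exactly 0 */
    return "false";
  }
}
```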
122 |
123 |
If we wished to check if an int called x was greater than 10 and less than 100, we would write the following.
124 |
125 |
if (x > 10 && x < 100) {
  puts("x is greater than 10 and less than 100!");
} else {
  puts("x is less than 11 or greater than 99!");
}
130 |
131 |
132 |
Loops
133 |
134 |
Loops allow for some code to be repeated until some condition becomes false, or some counter elapses.
135 |
136 |
There are two main loops in C. The first is a while loop. This loop repeatedly executes a block of code until some condition becomes false. It is written as while followed by some condition in parentheses, followed by the code to execute in curly brackets. For example a loop that counts downward from 10 to 1 could be written as follows.
137 |
138 |
int i = 10;
while (i > 0) {
  puts("Loop Iteration");
  i = i - 1;
}
143 |
144 |
The second kind of loop is a for loop. Rather than a condition, this loop requires three expressions separated by semicolons ;. These are an initialiser, a condition and an incrementer. The initialiser is performed before the loop starts. The condition is checked before each iteration of the loop. If it is false, the loop is exited. The incrementer is performed at the end of each iteration of the loop. These loops are often used for counting as they are more compact than the while loop.
145 |
146 |
For example to write a loop that counts up from 0 to 9 we might write the following. In this case the ++ operator increments the variable i.
147 |
148 |
for (int i = 0; i < 10; i++) {
  puts("Loop Iteration");
}
151 |
152 |
153 |
Bonus Marks
154 |
155 |
156 |
157 |
› Use a for loop to print out Hello World! five times.
158 |
› Use a while loop to print out Hello World! five times.
159 |
› Declare a function that outputs Hello World! n number of times. Call this from main.
160 |
› What built in types are there other than the ones listed?
161 |
› What other conditional operators are there other than greater than >, and less than <?
162 |
› What other mathematical operators are there other than add +, and subtract -?
163 |
› What is the += operator, and how does it work?
164 |
› What is the do loop, and how does it work?
165 |
› What is the switch statement and how does it work?
166 |
› What is the break keyword and what does it do?
167 |
› What is the continue keyword and what does it do?
A Polish Nobleman • A typical Polish Notation user
7 |
8 |
9 |
10 |
Polish Notation
11 |
12 |
To try out mpc we're going to implement a simple grammar that resembles a mathematical subset of our Lisp. It's called Polish Notation and is a notation for arithmetic where the operator comes before the operands.
13 |
14 |
For example...
15 |
16 |
17 |
1 + 2 + 6
is
+ 1 2 6
18 |
6 + (2 * 9)
is
+ 6 (* 2 9)
19 |
(10 * 2) / (4 + 2)
is
/ (* 10 2) (+ 4 2)
20 |
21 |
22 |
23 |
We need to work out a grammar which describes this notation. We can begin by describing it textually and then later formalise our thoughts.
24 |
25 |
To start, we observe that in Polish Notation the operator always comes first in an expression, followed by either numbers or other expressions in parentheses. This means we can say "a program is an operator followed by one or more expressions," where "an expression is either a number, or, in parentheses, an operator followed by one or more expressions".
26 |
27 |
More formally...
28 |
29 |
30 |
Program
the start of input, an Operator, one or more Expression, and the end of input.
31 |
Expression
either a Number or '(', an Operator, one or more Expression, and a ')'.
32 |
Operator
'+', '-', '*', or '/'.
33 |
Number
an optional -, and one or more characters between 0 and 9
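To see that these four rules are enough to decide whether some input is valid, here is a sketch of a hand-written recogniser with one function per rule. This is not how mpc works internally and is not the approach the book goes on to use. All function names are hypothetical, and skipping spaces between tokens is an assumption of the sketch.

```c
#include <ctype.h>
#include <stddef.h>

/* Skip spaces and tabs, returning the first non-blank position */
static const char* skip_ws(const char* s) {
  while (*s == ' ' || *s == '\t') { s++; }
  return s;
}

/* Each match function returns the position after the match, or NULL */

/* Operator: '+', '-', '*', or '/' */
static const char* match_operator(const char* s) {
  s = skip_ws(s);
  return (*s == '+' || *s == '-' || *s == '*' || *s == '/') ? s + 1 : NULL;
}

/* Number: an optional '-', then one or more digits */
static const char* match_number(const char* s) {
  s = skip_ws(s);
  if (*s == '-') { s++; }
  if (!isdigit((unsigned char)*s)) { return NULL; }
  while (isdigit((unsigned char)*s)) { s++; }
  return s;
}

static const char* match_expr(const char* s);

/* One or more Expression */
static const char* match_exprs(const char* s) {
  s = match_expr(s);
  if (s == NULL) { return NULL; }
  for (;;) {
    const char* next = match_expr(s);
    if (next == NULL) { return s; }
    s = next;
  }
}

/* Expression: a Number, or '(' Operator Expression+ ')' */
static const char* match_expr(const char* s) {
  const char* n = match_number(s);
  if (n != NULL) { return n; }
  s = skip_ws(s);
  if (*s != '(') { return NULL; }
  s = match_operator(s + 1);
  if (s == NULL) { return NULL; }
  s = match_exprs(s);
  if (s == NULL) { return NULL; }
  s = skip_ws(s);
  return (*s == ')') ? s + 1 : NULL;
}

/* Program: start of input, Operator, Expression+, end of input */
int is_polish_program(const char* s) {
  s = match_operator(s);
  if (s == NULL) { return 0; }
  s = match_exprs(s);
  if (s == NULL) { return 0; }
  s = skip_ws(s);
  return *s == '\0';
}
```

With this sketch is_polish_program("+ 5 (* 2 2)") returns 1, while "hello" and "/ 1dog" both return 0. Writing recognisers by hand quickly becomes tedious as a grammar grows, which is the motivation for describing the grammar to a library instead.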
34 |
35 |
36 |
37 |
Regular Expressions
38 |
39 |
We should be able to encode most of the above rules using things we know already, but Number and Program might pose some trouble. They contain a couple of constructs we've not learnt how to express yet. We don't know how to express the start or the end of input, optional characters, or ranges of characters.
40 |
41 |
These can be expressed, but they require something called a Regular Expression. Regular expressions are a way of writing grammars for small sections of text such as words or numbers. Grammars written using regular expressions can't consist of multiple rules, but they do give precise and concise control over what is matched and what isn't. Here are some basic rules for writing regular expressions.
42 |
43 |
44 |
.
Any character is required.
45 |
a
The character a is required.
46 |
[abcdef]
Any character in the set abcdef is required.
47 |
[a-f]
Any character in the range a to f is required.
48 |
a?
The character a is optional.
49 |
a*
Zero or more of the character a are required.
50 |
a+
One or more of the character a are required.
51 |
^
The start of input is required.
52 |
$
The end of input is required.
53 |
54 |
55 |
These are all the regular expression rules we need for now. Whole books have been written on learning regular expressions. For the curious much more information can be found online or from these sources. We will be using them in later chapters, so some basic knowledge will be required, but you won't need to master them for now.
56 |
57 |
In an mpc grammar we write regular expressions by putting them between forward slashes /. Using the above guide our Number rule can be expressed as a regular expression using the string /-?[0-9]+/.
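If you would like to experiment with this pattern outside of mpc, the sketch below tries it with the regcomp, regexec and regfree functions from the POSIX header regex.h. That header is an assumption about your system: it is normally present on Linux and Mac, but not with MinGW on Windows. The function matches_number is hypothetical, and the ^ and $ are added so that the whole string has to match.

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the whole string is a Number */
int matches_number(const char* s) {
  regex_t re;
  /* ^ and $ anchor the pattern to the start and end of the input */
  if (regcomp(&re, "^-?[0-9]+$", REG_EXTENDED | REG_NOSUB) != 0) {
    return 0;
  }
  /* regexec returns 0 when the string matches the pattern */
  int matched = (regexec(&re, s, 0, NULL, 0) == 0);
  regfree(&re);
  return matched;
}
```

Here "-42" and "7" match, while "4x" and a lone "-" do not.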
58 |
59 |
60 |
Installing mpc
61 |
62 |
Before we work on writing this grammar we first need to include the mpc headers, and then link to the mpc library, just as we did for editline on Linux and Mac. Starting with your code from chapter 4, you can rename the file to parsing.c and download mpc.h and mpc.c from the mpc repo. Put these in the same directory as your source file.
63 |
64 |
To include mpc put #include "mpc.h" at the top of the file. To link to mpc put mpc.c directly into the compile command. On Linux you will also have to link to the maths library by adding the flag -lm.
65 |
66 |
On Linux and Mac
67 |
68 |
cc -std=c99 -Wall parsing.c mpc.c -ledit -lm -o parsing
69 |
70 |
On Windows
71 |
72 |
cc -std=c99 -Wall parsing.c mpc.c -o parsing
73 |
74 |
75 |
Hold on, don't you mean #include <mpc.h>?
76 |
77 |
There are actually two ways to include files in C. One is using angular brackets <> as we've seen so far, and the other is with quotation marks "".
78 |
79 |
The only difference between the two is that using angular brackets searches the system locations for headers first, while quotation marks searches the current directory first. Because of this system headers such as <stdio.h> are typically put in angular brackets, while local headers such as "mpc.h" are typically put in quotation marks.
80 |
81 |
82 |
83 |
Polish Notation Grammar
84 |
85 |
Formalising the above rules further, and using some regular expressions, we can write a final grammar for the language of Polish Notation as follows. Read the code below and verify that it matches what we had written textually, and our ideas of Polish Notation.
We need to add this to the interactive prompt we started on in chapter 4. Put this code right at the beginning of the main function before we print the Version and Exit information. At the end of our program we also need to delete the parsers when we are done with them. Right before main returns we should place the following clean-up code.
I'm getting an error undefined reference to `mpc_lang'
111 |
112 |
That should be mpca_lang, with an a at the end!
113 |
114 |
115 |
Parsing User Input
116 |
117 |
Our new code creates an mpc parser for our Polish Notation language, but we still need to actually use it on the user input supplied each time from the prompt. We need to edit our while loop so that rather than just echoing user input back, it actually attempts to parse the input using our parser. We can do this by replacing the call to printf with the following mpc code, that makes use of our program parser Lispy.
118 |
119 |
/* Attempt to Parse the user Input */
mpc_result_t r;
if (mpc_parse("<stdin>", input, Lispy, &r)) {
  /* On Success Print the AST */
  mpc_ast_print(r.output);
  mpc_ast_delete(r.output);
} else {
  /* Otherwise Print the Error */
  mpc_err_print(r.error);
  mpc_err_delete(r.error);
}
130 |
131 |
This code calls the mpc_parse function with our parser Lispy, and the input string input. It copies the result of the parse into r and returns 1 on success and 0 on failure. We use the address of operator & on r when we pass it to the function. This operator will be explained in more detail in later chapters.
132 |
133 |
On success an internal structure is copied into r, in the field output. We can print out this structure using mpc_ast_print and delete it using mpc_ast_delete.
134 |
135 |
Otherwise there has been an error, which is copied into r in the field error. We can print it out using mpc_err_print and delete it using mpc_err_delete.
136 |
137 |
Compile these updates, and take this program for a spin. Try out different inputs and see how the system reacts. Correct behaviour should look like the following.
138 |
139 |
Lispy Version 0.0.0.0.2
Press Ctrl+c to Exit

lispy> + 5 (* 2 2)
>
  regex
  operator|char:1:1 '+'
  expr|number|regex:1:3 '5'
  expr|>
    char:1:5 '('
    operator|char:1:6 '*'
    expr|number|regex:1:8 '2'
    expr|number|regex:1:10 '2'
    char:1:11 ')'
  regex
lispy> hello
<stdin>:1:1: error: expected whitespace, '+', '-', '*' or '/' at 'h'
lispy> / 1dog
<stdin>:1:4: error: expected one of '0123456789', whitespace, '-', one or more of one of '0123456789', '(' or end of input at 'd'
lispy>
159 |
160 |
161 |
I'm getting an error <stdin>:1:1: error: Parser Undefined!.
162 |
163 |
This error is due to the syntax for your grammar supplied to mpca_lang being incorrect. See if you can work out what part of the grammar is incorrect. You can use the reference code for this chapter to help you find this, and verify how the grammar should look.
164 |
165 |
166 |
167 |
Reference
168 |
169 |
170 |
171 |
Bonus Marks
172 |
173 |
174 |
175 |
› Write a regular expression matching strings of all a or b such as aababa or bbaa.
176 |
› Write a regular expression matching strings of consecutive a and b such as ababab or aba.
177 |
› Write a regular expression matching pit, pot and respite but notpeat, spit, or part.
178 |
› Change the grammar to add a new operator such as %.
179 |
› Change the grammar to recognise operators written in textual format add, sub, mul, div.
180 |
› Change the grammar to recognize decimal numbers such as 0.01, 5.21, or 10.2.
181 |
› Change the grammar to make the operators written conventionally, between two expressions.
182 |
› Use the grammar from the previous chapter to parse Doge. You must add start and end of input.
Some of you may have noticed a problem with the previous chapter's program. Try entering this into the prompt and see what happens.
7 |
8 |
Lispy Version 0.0.0.0.3
Press Ctrl+c to Exit

lispy> / 10 0
12 |
13 |
Ouch. The program crashed upon trying to divide by zero. It's okay if a program crashes during development, but our final program would hopefully never crash, and should always explain to the user what went wrong.
14 |
15 |
16 |
17 |
Walter White • Heisenberg
18 |
19 |
20 |
At the moment our program can produce syntax errors but it still has no functionality for reporting errors in the evaluation of expressions. We need to build in some kind of error handling functionality to do this. It can be awkward in C, but if we start off on the right track, it will pay off later on when our system gets more complicated.
21 |
22 |
C programs crashing is a fact of life. If anything goes wrong the operating system kicks them out. Programs can crash for many different reasons, and in many different ways. You will see at least one Heisenbug.
23 |
24 |
But there is no magic in how C programs work. If you face a really troublesome bug don't give up or sit and stare at the screen till your eyes bleed. Take this chance to properly learn how to use gdb and valgrind. These will be more weapons in your tool-kit, and after the initial investment, save you a lot of time and pain.
25 |
26 |
Lisp Value
27 |
28 |
There are several ways to deal with errors in C, but in this context my preferred method is to make errors a possible result of evaluating an expression. Then we can say that, in Lispy, an expression will evaluate to either a number, or an error. For example + 1 2 will evaluate to a number, but / 10 0 will evaluate to an error.
29 |
30 |
For this we need a data structure that can act as either one thing or another. For simplicity's sake we are just going to use a struct with fields specific to each thing that can be represented, and a special field type to tell us exactly which fields are meaningful to access.
31 |
32 |
This we are going to call an lval, which stands for Lisp Value.
33 |
34 |
/* Declare New lval Struct */
typedef struct {
  int type;
  long num;
  int err;
} lval;
40 |
41 |
42 |
Enumerations
43 |
44 |
You'll notice the type of the fields type, and err, is int. This means they are represented by a single integer number.
45 |
46 |
The reason we pick int is because we will assign meaning to each integer value, to encode what we require. For example we can make a rule "If type is 0 then the structure is a Number.", or "If type is 1 then the structure is an Error." This is a simple and effective way of doing things.
47 |
48 |
But if we litter our code with stray 0 and 1 then it is going to become increasingly unclear as to what is happening. Instead we can use named constants that have been assigned these integer values. This gives the reader an indication as to why one might be comparing a number to 0 or 1 and what is meant in this context.
49 |
50 |
In C this is supported using an enum.
51 |
52 |
/* Create Enumeration of Possible lval Types */
enum { LVAL_NUM, LVAL_ERR };
54 |
55 |
An enum is a declaration of named constants which under the hood are automatically assigned integer values, counting up from 0. The code above shows how we would declare some enumerated values for the type field.
56 |
57 |
We also want to declare an enumeration for the error field. We have three error cases in our particular program. There is division by zero, an unknown operator, or being passed a number that is too large to be represented internally using a long. These can be enumerated as follows.
58 |
59 |
/* Create Enumeration of Possible Error Types */
enum { LERR_DIV_ZERO, LERR_BAD_OP, LERR_BAD_NUM };
61 |
62 |
63 |
Lisp Value Functions
64 |
65 |
Our lval type is almost ready to go. Unlike the previous long type we have no current method for creating new instances of it. To do this we can declare two functions that construct an lval of either an error type or a number type.
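A minimal sketch of two such constructors, assuming the names lval_num and lval_err that the evaluation code in this chapter calls. The enumerations and the struct are repeated from earlier in the chapter only so that the sketch stands alone. Setting the unused field to 0 is an extra precaution of this sketch, not something the text requires.

```c
/* Repeated from earlier in the chapter so the sketch is self-contained */
enum { LVAL_NUM, LVAL_ERR };
enum { LERR_DIV_ZERO, LERR_BAD_OP, LERR_BAD_NUM };

typedef struct {
  int type;
  long num;
  int err;
} lval;

/* Create a new number type lval */
lval lval_num(long x) {
  lval v;
  v.type = LVAL_NUM;
  v.num = x;
  v.err = 0;  /* unused for numbers; zeroed as a precaution */
  return v;
}

/* Create a new error type lval */
lval lval_err(int x) {
  lval v;
  v.type = LVAL_ERR;
  v.num = 0;  /* unused for errors; zeroed as a precaution */
  v.err = x;
  return v;
}
```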
These functions first create an lval called v, and assign the fields before returning it.
84 |
85 |
Because our lval type can now be one of two things we can no longer just use printf to output it. We will want to behave differently depending upon the type of the lval that is given. There is a concise way to do this in C using the switch statement. This takes some value as input and compares it to other known values, known as cases. When the values are equal it executes the code that follows up until the next break statement.
86 |
87 |
Using this we can build a function that can print an lval of any type like this.
88 |
89 |
/* Print an "lval" */
void lval_print(lval v) {
  switch (v.type) {
    /* In the case the type is a number print it */
    /* Then 'break' out of the switch. */
    case LVAL_NUM: printf("%li", v.num); break;

    /* In the case the type is an error */
    case LVAL_ERR:
      /* Check what type of error it is and print it */
      if (v.err == LERR_DIV_ZERO) {
        printf("Error: Division By Zero!");
      }
      if (v.err == LERR_BAD_OP) {
        printf("Error: Invalid Operator!");
      }
      if (v.err == LERR_BAD_NUM) {
        printf("Error: Invalid Number!");
      }
    break;
  }
}

/* Print an "lval" followed by a newline */
void lval_println(lval v) { lval_print(v); putchar('\n'); }
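The chain of if statements in lval_print could also be written as a second switch. The sketch below shows one way this might look. The function lerr_message and the default message are hypothetical additions of mine. Each case returns directly, so no break is needed.

```c
/* Assumed enumeration for the error field. */
enum { LERR_DIV_ZERO, LERR_BAD_OP, LERR_BAD_NUM };

/* Hypothetical helper: look up the message for an error code. */
const char* lerr_message(int err) {
  switch (err) {
    case LERR_DIV_ZERO: return "Error: Division By Zero!";
    case LERR_BAD_OP:   return "Error: Invalid Operator!";
    case LERR_BAD_NUM:  return "Error: Invalid Number!";
    /* Any value that matches no case ends up here */
    default:            return "Error: Unknown Error!";
  }
}
```

With a helper like this, the LVAL_ERR case could print the result using printf("%s", lerr_message(v.err)).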
114 |
115 |
116 |
Evaluating Errors
117 |
118 |
Now that we know how to work with the lval type, we need to change our evaluation functions to use it instead of long.
119 |
120 |
As well as changing the type signatures we need to change the functions such that they work correctly upon encountering either an error as input, or a number as input.
121 |
122 |
In our eval_op function, if we encounter an error we should return it right away, and only do computation if both the arguments are numbers. We should modify our code to return an error rather than attempt to divide by zero. This will fix the crash described at the beginning of this chapter.
123 |
124 |
lval eval_op(lval x, char* op, lval y) {

  /* If either value is an error return it */
  if (x.type == LVAL_ERR) { return x; }
  if (y.type == LVAL_ERR) { return y; }

  /* Otherwise do maths on the number values */
  if (strcmp(op, "+") == 0) { return lval_num(x.num + y.num); }
  if (strcmp(op, "-") == 0) { return lval_num(x.num - y.num); }
  if (strcmp(op, "*") == 0) { return lval_num(x.num * y.num); }
  if (strcmp(op, "/") == 0) {
    /* If second operand is zero return error */
    return y.num == 0
      ? lval_err(LERR_DIV_ZERO)
      : lval_num(x.num / y.num);
  }

  return lval_err(LERR_BAD_OP);
}
143 |
144 |
145 |
What is that ? doing there?
146 |
147 |
You'll notice that, to check whether the second argument to division is zero, we use a question mark ?, followed by a colon :. This is called the ternary operator, and it allows you to write conditional expressions on one line.
148 |
149 |
It works something like this: <condition> ? <then> : <else>. In other words, if the condition is true the whole expression evaluates to what follows the ?, otherwise it evaluates to what follows the :.
150 |
151 |
Some people dislike this operator because they believe it makes code unclear. If you are unfamiliar with the ternary operator, you may initially find it awkward to use; but once you get to know it there are rarely problems.
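If the ternary operator still looks strange, it can help to see it next to the equivalent if statement. The two functions below are hypothetical examples of mine that behave identically.

```c
/* Hypothetical helper written with the ternary operator. */
long max_ternary(long a, long b) {
  return a > b ? a : b;
}

/* The same logic written with an if statement. */
long max_if(long a, long b) {
  if (a > b) {
    return a;
  } else {
    return b;
  }
}
```

The ternary form is an expression, so it can be used directly as a return value or a function argument, which is exactly how it is used in eval_op.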
152 |
153 |
154 |
We need to give a similar treatment to our eval function. In this case because we've defined eval_op to robustly handle errors we just need to add the error conditions to our number conversion function.
155 |
156 |
In this case we use the strtol function to convert from string to long. This allows us to check a special variable errno to ensure the conversion goes correctly. This is a more robust way to convert numbers than our previous method using atoi.
157 |
158 |
lval eval(mpc_ast_t* t) {

  if (strstr(t->tag, "number")) {
    /* Check if there is some error in conversion */
    errno = 0;
    long x = strtol(t->contents, NULL, 10);
    return errno != ERANGE ? lval_num(x) : lval_err(LERR_BAD_NUM);
  }

  /* The operator is always the second child */
  char* op = t->children[1]->contents;
  /* Evaluate the third child to get the first operand */
  lval x = eval(t->children[2]);

  /* Combine the remaining expressions into x one at a time */
  int i = 3;
  while (strstr(t->children[i]->tag, "expr")) {
    x = eval_op(x, op, eval(t->children[i]));
    i++;
  }

  return x;
}
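The errno check can be tried out on its own, away from the parser. The sketch below wraps it in a hypothetical helper called parse_long, which is my own name and not part of the program. Note that errno is set to zero first, because strtol only sets it when something goes wrong and never clears it.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical helper: convert a base 10 string to a long.
   Returns 1 on success and stores the result in *out.
   Returns 0 if the value does not fit in a long. */
int parse_long(const char* s, long* out) {
  errno = 0;                      /* clear any earlier error */
  long x = strtol(s, NULL, 10);
  if (errno == ERANGE) { return 0; }
  *out = x;
  return 1;
}
```

Passing a string of digits far longer than any long can hold, such as thirty nines in a row, should make this return 0, which is the case eval reports as LERR_BAD_NUM.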
178 |
179 |
The final small step is to change how we print the result found by our evaluation to use our newly defined printing function which can print any type of lval.
180 |
181 |
lval result = eval(r.output);
lval_println(result);
mpc_ast_delete(r.output);
184 |
185 |
And we are done! Try running this new program and make sure there are no crashes when dividing by zero.
Some of you who have gotten this far in the book may feel uncomfortable with how it is progressing. You may feel you've managed to follow instructions well enough, but don't have a clear understanding of all of the underlying mechanisms going on behind the scenes.
201 |
202 |
If this is the case I want to reassure you that you are doing well. If you don't understand the internals it's because I may not have explained everything in sufficient depth. This is okay.
203 |
204 |
To be able to progress and get code to work under these conditions is a great skill in programming, and if you've made it this far it shows you have it.
205 |
206 |
In programming we call this plumbing. Roughly speaking this is following instructions to try to tie together a bunch of libraries or components, without fully understanding how they work internally.
207 |
208 |
It requires faith and intuition. Faith is required to believe that if the stars align, and every incantation is correctly performed for this magical machine, the right thing will really happen. And intuition is required to work out what has gone wrong, and how to fix things when they don't go as planned.
209 |
210 |
Unfortunately these can't be taught directly, so if you've made it this far then you've made it over a difficult hump, and in the following chapters I promise we'll finish up with the plumbing, and actually start programming in a way that feels fresh and wholesome.
211 |
212 |
213 |
Reference
214 |
215 |
216 |
217 |
Bonus Marks
218 |
219 |
220 |
221 |
› Run the previous chapter's code through gdb and crash it. See what happens.
222 |
› How do you give an enum a name?
223 |
› What are union data types and how do they work?
224 |
› What are the advantages of using a union instead of a struct?
225 |
› Can you use a union in the definition of lval?
226 |
› Extend parsing and evaluation to support the remainder operator %.
227 |
› Extend parsing and evaluation to support decimal types using a double field.
Special thanks to my friends and family for their support, in particular to Francesca Shaw for putting up with me spending all my time on this project, and to Caroline Holden for proof reading.
Many thanks to everyone who has made their images and photos available under Creative Commons. I hope by making this book available to read online for free, I have given a small something back to the creativity and good will of the community.
42 |
43 |
All images are licensed under CC BY 2.0 unless otherwise stated.
Hello, my name is Daniel Holden. I'm from the UK, living in Montreal, and working in the games industry doing research into how Machine Learning can help games scale, in particular in the area of Character Animation.
13 |
14 |
You may know me from one of my other projects such as Cello or Corange. As well as hacking on C, I enjoy writing short stories, digital art, and game development.
With an already steep learning curve arrays seemed like a convenient omission to make. Teaching arrays in C is a very easy way to confuse a beginner about pointers, which are a far more important concept to learn. In C, the ways in which arrays and pointers are the same, and yet different, are subtle and numerous. Excluding fixed sized arrays, which have different behaviour altogether, pointers represent a superset of the behaviour of arrays, and so in the context of this book, teaching arrays would have been no more than teaching syntactic sugar.
22 |
23 |
Those interested in arrays are encouraged to find out more. The book Learn C the Hard Way takes the opposite approach to me, and teaches arrays first, with pointers being described as a variation. For those interested in arrays this might prove useful.
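For the curious, here is a rough sketch of one way in which fixed sized arrays and pointers are the same, and one way in which they differ. All of the function names are hypothetical and only there for illustration.

```c
#include <stddef.h>

/* Indexing works the same through an array or a pointer to its first element. */
int third_via_array(void) {
  int xs[4] = {10, 20, 30, 40};
  return xs[2];
}

int third_via_pointer(void) {
  int xs[4] = {10, 20, 30, 40};
  int* p = xs;          /* the array converts to a pointer to xs[0] */
  return p[2];
}

/* But sizeof sees a fixed sized array as the whole block of memory. */
size_t array_bytes(void) {
  int xs[4] = {10, 20, 30, 40};
  return sizeof(xs);    /* 4 * sizeof(int) */
}

size_t pointer_bytes(void) {
  int xs[4] = {10, 20, 30, 40};
  int* p = xs;
  return sizeof(p);     /* sizeof(int*), regardless of the array length */
}
```

Both indexing functions return the same element, yet the two sizeof results generally differ. This is the kind of subtlety I wanted to avoid putting in front of beginners.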
24 |
25 |
26 |
Why do you use left-handed pointer syntax?
27 |
28 |
In this book I write the syntax for pointers in a left-handed way int* x;, rather than the standard right-handed convention int *x;.
29 |
30 |
Ultimately this distinction is one of personal preference, but the vast majority of C code, as well as the C standards, is written using the right-handed style. This is clearly the default, and most correct way to write pointers, and so my choice might seem odd.
31 |
32 |
I picked the left-handed version because I believe it is easier to teach to beginners. Having the asterisk on the left hand side emphasises the type. It is clearer to read, and makes it obvious that the asterisk is not a weird operator or modification to the variable. With the omission of arrays, and multi-variable declarations, this notation is also almost entirely consistent within this book, and when not, it is noted. K&R themselves have admitted the confusion of the right-handed syntax, made worse by historical baggage and rogue compiler implementations of the early years. For a learning resource I believe picking the left-handed version was the best approach.
33 |
34 |
Once comfortable with the method behind C's declaration syntax, I encourage programmers to migrate toward the right-handed version.
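The multi-variable declarations mentioned above are the one place where the left-handed style can really mislead. In the sketch below, which uses hypothetical function names, the asterisk belongs to the name that follows it and not to the type, so the first declaration creates one pointer and one plain int.

```c
#include <stddef.h>

/* Left-handed style: looks like two pointers, but only a is one. */
size_t size_of_second_left(void) {
  int* a = NULL, b = 0;       /* a is an int pointer, b is a plain int */
  (void) a;                   /* silence unused variable warnings */
  return sizeof(b);           /* sizeof(int) */
}

/* Right-handed style: each name carries its own asterisk. */
size_t size_of_second_right(void) {
  int *a = NULL, *b = NULL;   /* both a and b are int pointers */
  (void) a;
  return sizeof(b);           /* sizeof(int*) */
}
```

Declaring one variable per line sidesteps the problem entirely, whichever style you prefer.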
35 |
36 |
37 |
Why are there no Macros in this Lisp?
38 |
39 |
By far the biggest gripe conventional Lisp programmers have with the Lisp in this book is the lack of Macros. Instead of Macros a new concept of Q-Expressions is used to delay evaluation. To conventional Lisp programmers Q-Expressions are confusing because their semantics differ subtly from the quote Macro.
40 |
41 |
I use Q-Expressions instead of Macros for a couple of reasons.
42 |
43 |
First of all I believe them to be easier for beginners than Macros. When evaluation is delayed it is always explicit, shown by the syntax, and not implicit in the function name. It also means that S-Expressions can never be returned by the prompt or seen in the wild. They are always evaluated.
44 |
45 |
Secondly it is more consistent. It no longer requires the concept of Macros, but instead transforms quoted expressions to become the dominant, more powerful concept that does everything needed by either. With Q-Expressions there are only Functions and Expressions, and the language is even more homoiconic than before.
46 |
47 |
Finally, Q-Expressions are distinctly more powerful than Macros. Using Q-Expressions it is possible to pass an argument to a function that evaluates to a Q-Expression, making input arguments capable of being dynamic. In conventional Lisps passing an expression to a Macro will always pause the evaluation, and so the arguments cannot be dynamic, only symbolic.
48 |
49 |
50 |
Where are the answers to the exercises?
51 |
52 |
There are none. In the real world no one is going to pass you an answer booklet, or check your work for you. Ensuring your code does as you intended is an essential skill to learn. Some readers ask for the answers because they are concerned that they might not have done the right thing. But testing the right thing in this case is just testing understanding of the question; a pretty pointless exercise. Additionally there is not always a wrong or right way to approach the Bonus Marks. They are intended as suggestions to help people check their understanding, or to explore other ideas and thoughts.
53 |
54 |
If you want to do a Bonus Mark question but are unsure what it means, feel free to send me an e-mail and I will try to clarify it! Otherwise don't fret over the answer. Giving it a try is the important part, not getting it right!
Download links for the E-Book expire after two months. If you want to download the E-Book again or would like an updated version of the E-Book please get in contact at.
Build Your Own Lisp is available in print for the best reading experience. Painstakingly copy-edited, and printed in beautiful colour. Benefits include (but are not limited to) new book smell, bookshelf bragging rights, supporting impoverished students, and a Langton's ant flick book.
E-Book
20 |
21 |
22 |
23 |
Build Your Own Lisp is available for purchase in all major E-Book formats, DRM free, for just $4.99.
24 |
25 |
32 |
33 |
After your purchase you should receive an e-mail from me with download links to the book in a variety of formats. Currently the book is available in the formats .epub, .mobi, and .pdf. If you want the book, but need it in a different format, please get in contact and I can try to make it available for you!
34 |
35 |
Please check your junk mail folder if you don't receive the e-mail right away.
36 |
37 |
The e-mail should come almost instantly. If you don't receive it right away something has gone wrong. Please get in contact and I will mail you manually. Gmail in particular for some reason puts mail from buildyourownlisp in the junk mail folder so please check it first. If you still haven't received an e-mail, or at any point you require an updated version of the book with any fixes that may be merged into the website, don't hesitate to get in contact. I will try to fix your issue as soon as possible.
2 | Build Your Own Lisp
3 | Learn C and build your own programming language in 1000 lines of code!
If you're looking to learn C, or you've ever wondered how to build your own programming language, this is the book for you.
13 |
14 |
In just a few lines of code, I'll teach you how to use C, and together, we'll start building your very own language.
15 |
16 |
Along the way we'll learn about the weird and wonderful nature of Lisps, how to develop a real-world project, concisely solve problems, and write beautiful code!
17 |
18 |
This book is free to read online, so you can get started right away! But for those who want to show their support, or who want the best reading experience, this book is also available for purchase in print format, or for cheap in all major e-book formats.
19 |
20 |
21 |
22 |
23 |
24 |
25 |
26 |
27 |
28 |
45 |
46 |
47 |
48 |
49 |
"I finally feel complete as a C programmer, having implemented my own Lisp."
50 |
"Every programmer should do something like this, at least once."
51 |
"One of the greatest things I've ever found on the internet..."