├── .gitignore
├── Makefile
├── README.md
├── ast
    ├── BinaryExprAST.cpp
    ├── BinaryExprAST.h
    ├── CallExprAST.cpp
    ├── CallExprAST.h
    ├── ExprAST.h
    ├── FunctionAST.cpp
    ├── FunctionAST.h
    ├── NumberExprAST.cpp
    ├── NumberExprAST.h
    ├── PrototypeAST.cpp
    ├── PrototypeAST.h
    ├── VariableExprAST.cpp
    └── VariableExprAST.h
├── kaleidoscope
    ├── kaleidoscope.cpp
    └── kaleidoscope.h
├── lexer
    ├── lexer.cpp
    ├── lexer.h
    └── token.h
├── logger
    ├── logger.cpp
    └── logger.h
├── main.cpp
└── parser
    ├── parser.cpp
    └── parser.h


/.gitignore:
--------------------------------------------------------------------------------
1 | *.gch
2 | *.out
3 | *.dSYM
4 | *.o
5 |
6 | /llvm
7 | .vscode
8 | main
9 |
--------------------------------------------------------------------------------
/Makefile:
--------------------------------------------------------------------------------
1 | SOURCES = $(shell find ast kaleidoscope lexer logger parser -name '*.cpp')
2 | HEADERS = $(shell find ast kaleidoscope lexer logger parser -name '*.h')
3 | OBJ = ${SOURCES:.cpp=.o}
4 |
5 | CC = llvm-g++ -stdlib=libc++ -std=c++14
6 | CFLAGS = -g -O3 -I llvm/include -I llvm/build/include -I ./
7 | LLVMFLAGS = `llvm-config --cxxflags --ldflags --system-libs --libs all`
8 |
9 | .PHONY: main
10 |
11 | main: main.cpp ${OBJ}
12 | 	${CC} ${CFLAGS} ${LLVMFLAGS} ${OBJ} $< -o $@
13 |
14 | clean:
15 | 	rm -r ${OBJ}
16 |
17 | %.o: %.cpp ${HEADERS}
18 | 	${CC} ${CFLAGS} ${LLVMFLAGS} -c $< -o $@
19 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Kaleidoscope: Implementing a Language with LLVM
2 |
3 | ## How to build it
4 | On macOS (tested on Ventura 13.0).
5 | ~~~
6 | # Install llvm (version 15.0)
7 | brew install llvm@15
8 | make
9 | ./main
10 | # This should bring up a simple REPL.
11 | ~~~
12 |
13 | ## Why?
14 |
15 | Self-education...
16 |
17 | I'm interested in LLVM and want to try simple things with it.
18 | That's why I've started working through the official LLVM tutorial - [Kaleidoscope](http://llvm.org/docs/tutorial).
19 |
20 | ## What's it all about?
21 |
22 | This tutorial runs through the implementation of a simple language, showing how fun and easy it can be.
23 | This tutorial will get you up and started as well as help to build a framework you can extend to other languages.
24 | The code in this tutorial can also be used as a playground to hack on other LLVM specific things.
25 |
26 | The goal of this tutorial is to progressively unveil our language, describing how it is built up over time.
27 | This will let us cover a fairly broad range of language design and LLVM-specific usage issues, showing and explaining the code for it all along the way, without overwhelming you with tons of details up front.
28 |
29 | It is useful to point out ahead of time that this tutorial is really about teaching compiler techniques and LLVM specifically, not about teaching modern and sane software engineering principles.
30 | In practice, this means that we’ll take a number of shortcuts to simplify the exposition.
31 | For example, the code uses global variables all over the place, doesn’t use nice design patterns like visitors, etc... but it is very simple.
32 | If you dig in and use the code as a basis for future projects, fixing these deficiencies shouldn’t be hard.
33 |
34 | ## How does it all work together?
35 |
36 | ### Lexer
37 |
38 | The first thing here is a lexer.
39 | The lexer is responsible for taking a stream of characters and translating it into groups of tokens.
40 |
41 | > A lexer is a software program that performs lexical analysis. Lexical analysis is the process of separating a stream of characters into different words, which in computer science we call 'tokens'.
42 |
43 | Token identifiers are stored in the `lexer/token.h` file and the lexer implementation in the `lexer/lexer.cpp` file.
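The idea of turning characters into numbered tokens can be tried outside the REPL. The following is a minimal sketch, not the repo's `gettok`: `tokenize` is a hypothetical helper that lexes a whole string instead of reading `stdin`, and it discards identifier names and numeric values. The token numbers are the ones from `lexer/token.h`.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Same numbering as lexer/token.h: known tokens are negative,
// any other character is returned as its own (non-negative) code.
enum Token {
  tok_eof = -1,
  tok_def = -2,
  tok_extern = -3,
  tok_identifier = -4,
  tok_number = -5,
};

// Hypothetical helper (not in the repo): lex a whole string at once
// and return the token codes, always ending with tok_eof.
std::vector<int> tokenize(const std::string &src) {
  std::vector<int> toks;
  std::size_t i = 0;
  auto at = [&](std::size_t k) { return static_cast<unsigned char>(src[k]); };

  while (i < src.size()) {
    unsigned char c = at(i);
    if (std::isspace(c)) { // skip whitespace between tokens
      ++i;
    } else if (std::isalpha(c)) { // identifier or keyword
      std::string id;
      while (i < src.size() && std::isalnum(at(i))) id += src[i++];
      if (id == "def") toks.push_back(tok_def);
      else if (id == "extern") toks.push_back(tok_extern);
      else toks.push_back(tok_identifier);
    } else if (std::isdigit(c) || c == '.') { // numeric literal
      while (i < src.size() && (std::isdigit(at(i)) || src[i] == '.')) ++i;
      toks.push_back(tok_number);
    } else if (c == '#') { // comment: skip to the end of the line
      while (i < src.size() && src[i] != '\n' && src[i] != '\r') ++i;
    } else { // anything else is returned as its character code
      toks.push_back(c);
      ++i;
    }
  }
  toks.push_back(tok_eof);
  return toks;
}
```

For the input `def f(x) x+1`, `tokenize` returns `tok_def, tok_identifier, '(', tok_identifier, ')', tok_identifier, '+', tok_number, tok_eof`.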
44 |
45 | Tokens are just an `enum`, in which each token identifier is assigned a number.
46 | This way, we can identify tokens during lexical analysis.
47 |
48 | The actual reading of the stream is implemented in the `lexer/lexer.cpp` file.
49 | The `gettok` function reads characters one by one from `stdin` and groups them into tokens.
50 | So, basically, the `gettok` function reads characters and returns numbers (tokens).
51 |
52 | Later, we can use these tokens in the parser (syntactic analysis).
53 |
54 | ### AST (Abstract Syntax Tree)
55 |
56 | Before diving into the parser, though, we need to implement the AST nodes that we can use during parsing.
57 |
58 | The basic building block of the AST is the `ExprAST` node, which is stored in the `ast/ExprAST.h` file.
59 | All expression nodes extend `ExprAST`; `PrototypeAST` and `FunctionAST` are standalone classes.
60 |
61 | Each AST node must implement one method - `codegen()`.
62 | The `codegen()` method is responsible for generating LLVM IR using the LLVM IRBuilder API, that's all.
63 |
64 | As you can see in the `ast` folder, we have implemented the following AST nodes with appropriate code generation into LLVM IR:
65 |
66 | - Binary Expressions;
67 | - Call Expressions;
68 | - Function Expressions;
69 | - Number Expressions;
70 | - Prototype Expressions;
71 | - Variable Expressions;
72 |
73 | Each of these nodes has a constructor where all mandatory values are initialized.
74 | Based on that information, `codegen()` can build LLVM IR using these values.
75 |
76 | The simplest one is the Number Expression.
77 | `codegen()` for a number expression just creates a floating-point constant through the LLVM API:
78 |
79 | ```c++
80 | llvm::Value *NumberExprAST::codegen() {
81 |   return llvm::ConstantFP::get(TheContext, llvm::APFloat(Val));
82 | }
83 | ```
84 |
85 | Now, we have two parts of a compiler which we can combine.
86 |
87 | ### Parser
88 |
89 | The parser is where the lexer and the AST are combined.
90 | The actual implementation of the parser is stored in the `parser/parser.cpp` file.
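To see what the parser is expected to produce, here is a sketch of the tree for `1 + 2 * 3` built by hand. `Expr`, `Number`, `Binary` and `buildExample` are simplified, hypothetical stand-ins for the classes in `ast`, and `eval()` takes the place of `codegen()` so that the example runs without LLVM.

```cpp
#include <memory>
#include <utility>

// Simplified stand-in for ExprAST: eval() plays the role of codegen().
struct Expr {
  virtual ~Expr() = default;
  virtual double eval() const = 0;
};

// Stand-in for NumberExprAST: a leaf holding a literal value.
struct Number : Expr {
  double Val;
  explicit Number(double Val) : Val(Val) {}
  double eval() const override { return Val; }
};

// Stand-in for BinaryExprAST: owns both operands, like the repo's node.
struct Binary : Expr {
  char Op;
  std::unique_ptr<Expr> LHS, RHS;
  Binary(char Op, std::unique_ptr<Expr> LHS, std::unique_ptr<Expr> RHS)
      : Op(Op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}

  double eval() const override {
    // Children are evaluated first, then combined by the operator.
    double L = LHS->eval(), R = RHS->eval();
    switch (Op) {
    case '+': return L + R;
    case '-': return L - R;
    case '*': return L * R;
    case '<': return L < R ? 1.0 : 0.0;
    default:  return 0.0; // unknown operator; the repo logs an error instead
    }
  }
};

// The tree a parser would build for "1 + 2 * 3": '*' binds tighter,
// so it becomes the right child of '+'.
std::unique_ptr<Expr> buildExample() {
  auto Mul = std::make_unique<Binary>('*', std::make_unique<Number>(2.0),
                                      std::make_unique<Number>(3.0));
  return std::make_unique<Binary>('+', std::make_unique<Number>(1.0),
                                  std::move(Mul));
}
```

Calling `buildExample()->eval()` walks the tree bottom-up and yields 7, in the same way that `codegen()` recurses into `LHS` and `RHS` before emitting an instruction.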
91 |
92 | The parser uses the lexer to get a stream of tokens, which are used for building an AST using our AST implementation.
93 |
94 | So, in general, when the parser sees a known token, e.g. a number token, it tries to create a `NumberExprAST` node.
95 |
96 | When parsing is done, i.e. the last token has been read from the stream, we have an AST representation of our code.
97 | We can use it to generate LLVM IR from our AST using the `codegen()` method of each AST node.
98 |
99 | This process is done in the `main.cpp` file.
100 | The `main.cpp` file is the place where all the parts are combined.
101 |
--------------------------------------------------------------------------------
/ast/BinaryExprAST.cpp:
--------------------------------------------------------------------------------
1 | #include "ast/BinaryExprAST.h"
2 | #include "kaleidoscope/kaleidoscope.h"
3 |
4 | // Generate LLVM code for binary expressions
5 | llvm::Value *BinaryExprAST::codegen() {
6 |   llvm::Value *L = LHS->codegen();
7 |   llvm::Value *R = RHS->codegen();
8 |
9 |   if (!L || !R) {
10 |     return nullptr;
11 |   }
12 |
13 |   switch (Op) {
14 |     case '+':
15 |       return Builder.CreateFAdd(L, R, "addtmp");
16 |     case '-':
17 |       return Builder.CreateFSub(L, R, "subtmp");
18 |     case '*':
19 |       return Builder.CreateFMul(L, R, "multmp");
20 |     case '<':
21 |       L = Builder.CreateFCmpULT(L, R, "cmptmp");
22 |       return Builder.CreateUIToFP(L, llvm::Type::getDoubleTy(TheContext), "booltmp");
23 |     default:
24 |       return LogErrorV("Invalid binary operator");
25 |   }
26 | }
27 |
--------------------------------------------------------------------------------
/ast/BinaryExprAST.h:
--------------------------------------------------------------------------------
1 | #ifndef __BINARY_EXPR_AST_H__
2 | #define __BINARY_EXPR_AST_H__
3 |
4 | #include "llvm/IR/IRBuilder.h"
5 | #include "logger/logger.h"
6 | #include "ast/ExprAST.h"
7 |
8 | // Expression class for a binary operator
9 | class BinaryExprAST : public ExprAST {
10 |   char Op;
11 |
std::unique_ptr<ExprAST> LHS, RHS;
12 |
13 | public:
14 |   BinaryExprAST(char op, std::unique_ptr<ExprAST> LHS, std::unique_ptr<ExprAST> RHS) : Op(op), LHS(std::move(LHS)), RHS(std::move(RHS)) {}
15 |   llvm::Value *codegen() override;
16 | };
17 |
18 | #endif
19 |
--------------------------------------------------------------------------------
/ast/CallExprAST.cpp:
--------------------------------------------------------------------------------
1 | #include "ast/CallExprAST.h"
2 |
3 | // Generate LLVM code for function calls
4 | llvm::Value *CallExprAST::codegen() {
5 |   llvm::Function *CalleeF = TheModule->getFunction(Callee);
6 |
7 |   if (!CalleeF) {
8 |     return LogErrorV("Unknown function referenced");
9 |   }
10 |
11 |   if (CalleeF->arg_size() != Args.size()) {
12 |     return LogErrorV("Incorrect # arguments passed");
13 |   }
14 |
15 |   std::vector<llvm::Value *> ArgsV;
16 |   for (unsigned i = 0, e = Args.size(); i != e; i++) {
17 |     ArgsV.push_back(Args[i]->codegen());
18 |
19 |     if (!ArgsV.back()) {
20 |       return nullptr;
21 |     }
22 |   }
23 |
24 |   return Builder.CreateCall(CalleeF, ArgsV, "calltmp");
25 | }
26 |
--------------------------------------------------------------------------------
/ast/CallExprAST.h:
--------------------------------------------------------------------------------
1 | #ifndef __CALL_EXPR_AST_H__
2 | #define __CALL_EXPR_AST_H__
3 |
4 | #include "ast/ExprAST.h"
5 | #include "llvm/IR/IRBuilder.h"
6 | #include "logger/logger.h"
7 | #include "kaleidoscope/kaleidoscope.h"
8 |
9 | // Expression class for function calls
10 | class CallExprAST : public ExprAST {
11 |   std::string Callee;
12 |   std::vector<std::unique_ptr<ExprAST>> Args;
13 |
14 | public:
15 |   CallExprAST(const std::string &Callee, std::vector<std::unique_ptr<ExprAST>> Args) : Callee(Callee), Args(std::move(Args)) {}
16 |   llvm::Value *codegen() override;
17 | };
18 |
19 | #endif
20 |
--------------------------------------------------------------------------------
/ast/ExprAST.h:
--------------------------------------------------------------------------------
1 | #ifndef __EXPR_AST_H__
2
| #define __EXPR_AST_H__
3 |
4 | #include "llvm/IR/BasicBlock.h"
5 |
6 | class ExprAST {
7 | public:
8 |   virtual ~ExprAST() {}
9 |   virtual llvm::Value *codegen() = 0;
10 | };
11 |
12 | #endif
13 |
--------------------------------------------------------------------------------
/ast/FunctionAST.cpp:
--------------------------------------------------------------------------------
1 | #include "ast/FunctionAST.h"
2 |
3 | // Generates LLVM code for function definitions
4 | llvm::Function *FunctionAST::codegen() {
5 |   llvm::Function *TheFunction = TheModule->getFunction(Proto->getName());
6 |
7 |   if (!TheFunction) {
8 |     TheFunction = Proto->codegen();
9 |   }
10 |
11 |   if (!TheFunction) {
12 |     return nullptr;
13 |   }
14 |
15 |   llvm::BasicBlock *BB = llvm::BasicBlock::Create(TheContext, "entry", TheFunction);
16 |   Builder.SetInsertPoint(BB);
17 |   NamedValues.clear();
18 |   for (auto &Arg : TheFunction->args()) {
19 |     NamedValues[std::string(Arg.getName())] = &Arg;
20 |   }
21 |
22 |   if (llvm::Value *RetVal = Body->codegen()) {
23 |     Builder.CreateRet(RetVal);
24 |     verifyFunction(*TheFunction);
25 |
26 |     return TheFunction;
27 |   }
28 |
29 |   TheFunction->eraseFromParent();
30 |   return nullptr;
31 | }
32 |
--------------------------------------------------------------------------------
/ast/FunctionAST.h:
--------------------------------------------------------------------------------
1 | #ifndef __FUNCTION_AST_H__
2 | #define __FUNCTION_AST_H__
3 |
4 | #include "ast/PrototypeAST.h"
5 |
6 | // Represents a function definition itself
7 | class FunctionAST {
8 |   std::unique_ptr<PrototypeAST> Proto;
9 |   std::unique_ptr<ExprAST> Body;
10 |
11 | public:
12 |   FunctionAST(std::unique_ptr<PrototypeAST> Proto, std::unique_ptr<ExprAST> Body) : Proto(std::move(Proto)), Body(std::move(Body)) {}
13 |   llvm::Function *codegen();
14 | };
15 |
16 | #endif
17 |
--------------------------------------------------------------------------------
/ast/NumberExprAST.cpp:
--------------------------------------------------------------------------------
1 |
#include "ast/NumberExprAST.h"
2 |
3 | // Generate LLVM code for numeric literals
4 | llvm::Value *NumberExprAST::codegen() {
5 |   return llvm::ConstantFP::get(TheContext, llvm::APFloat(Val));
6 | }
7 |
--------------------------------------------------------------------------------
/ast/NumberExprAST.h:
--------------------------------------------------------------------------------
1 | #ifndef __NUMBER_EXPR_AST_H__
2 | #define __NUMBER_EXPR_AST_H__
3 |
4 | #include "ast/ExprAST.h"
5 | #include "kaleidoscope/kaleidoscope.h"
6 |
7 | // Expression class for numeric literals like "1.0"
8 | class NumberExprAST : public ExprAST {
9 |   double Val;
10 |
11 | public:
12 |   NumberExprAST(double Val) : Val(Val) {}
13 |   llvm::Value *codegen() override;
14 | };
15 |
16 | #endif
17 |
--------------------------------------------------------------------------------
/ast/PrototypeAST.cpp:
--------------------------------------------------------------------------------
1 | #include "ast/PrototypeAST.h"
2 |
3 | // Generates LLVM code for function prototypes (e.g. extern declarations)
4 | llvm::Function *PrototypeAST::codegen() {
5 |   std::vector<llvm::Type *> Doubles(Args.size(), llvm::Type::getDoubleTy(TheContext));
6 |   llvm::FunctionType *FT = llvm::FunctionType::get(llvm::Type::getDoubleTy(TheContext), Doubles, false);
7 |   llvm::Function *F = llvm::Function::Create(FT, llvm::Function::ExternalLinkage, Name, TheModule.get());
8 |
9 |   unsigned Idx = 0;
10 |   for (auto &Arg : F->args()) {
11 |     Arg.setName(Args[Idx++]);
12 |   }
13 |
14 |   return F;
15 | }
16 |
--------------------------------------------------------------------------------
/ast/PrototypeAST.h:
--------------------------------------------------------------------------------
1 | #ifndef __PROTOTYPE_AST_H__
2 | #define __PROTOTYPE_AST_H__
3 |
4 | #include "ast/ExprAST.h"
5 | #include "llvm/IR/IRBuilder.h"
6 | #include "kaleidoscope/kaleidoscope.h"
7 |
8 | // Represents the "prototype" for a function,
9 | // which captures its name, and its argument names
10 | class
PrototypeAST {
11 |   std::string Name;
12 |   std::vector<std::string> Args;
13 |
14 | public:
15 |   PrototypeAST(const std::string &name, std::vector<std::string> Args) : Name(name), Args(std::move(Args)) {}
16 |
17 |   llvm::Function *codegen();
18 |   const std::string &getName() const { return Name; }
19 | };
20 |
21 | #endif
22 |
--------------------------------------------------------------------------------
/ast/VariableExprAST.cpp:
--------------------------------------------------------------------------------
1 | #include "ast/VariableExprAST.h"
2 |
3 | // We assume that the variable has already been emitted somewhere
4 | llvm::Value *VariableExprAST::codegen() {
5 |   llvm::Value *V = NamedValues[Name];
6 |
7 |   if (!V) {
8 |     LogErrorV("Unknown variable name");
9 |   }
10 |
11 |   return V;
12 | }
13 |
--------------------------------------------------------------------------------
/ast/VariableExprAST.h:
--------------------------------------------------------------------------------
1 | #ifndef __VARIABLE_EXPR_AST_H__
2 | #define __VARIABLE_EXPR_AST_H__
3 |
4 | #include "ast/ExprAST.h"
5 | #include "logger/logger.h"
6 |
7 | // Expression class for referencing a variable, like "a"
8 | class VariableExprAST : public ExprAST {
9 |   std::string Name;
10 |
11 | public:
12 |   VariableExprAST(const std::string &Name) : Name(Name) {}
13 |   llvm::Value *codegen() override;
14 | };
15 |
16 | #endif
17 |
--------------------------------------------------------------------------------
/kaleidoscope/kaleidoscope.cpp:
--------------------------------------------------------------------------------
1 | #include "kaleidoscope.h"
2 |
3 | // This is an object that owns LLVM core data structures
4 | llvm::LLVMContext TheContext;
5 |
6 | // This is a helper object that makes it easy to generate LLVM instructions
7 | llvm::IRBuilder<> Builder(TheContext);
8 |
9 | // This is an LLVM construct that contains functions and global variables
10 | std::unique_ptr<llvm::Module> TheModule;
11 |
12 | // This map keeps track of which values
are defined in the current scope
13 | std::map<std::string, llvm::Value *> NamedValues;
--------------------------------------------------------------------------------
/kaleidoscope/kaleidoscope.h:
--------------------------------------------------------------------------------
1 | #ifndef __KALEIDOSCOPE_H__
2 | #define __KALEIDOSCOPE_H__
3 |
4 | #include "llvm/ADT/APFloat.h"
5 | #include "llvm/ADT/STLExtras.h"
6 | #include "llvm/IR/BasicBlock.h"
7 | #include "llvm/IR/Constants.h"
8 | #include "llvm/IR/DerivedTypes.h"
9 | #include "llvm/IR/Function.h"
10 | #include "llvm/IR/IRBuilder.h"
11 | #include "llvm/IR/LLVMContext.h"
12 | #include "llvm/IR/Module.h"
13 | #include "llvm/IR/Type.h"
14 | #include "llvm/IR/Verifier.h"
15 |
16 | #include <map>
17 |
18 | // This is an object that owns LLVM core data structures
19 | extern llvm::LLVMContext TheContext;
20 |
21 | // This is a helper object that makes it easy to generate LLVM instructions
22 | extern llvm::IRBuilder<> Builder;
23 |
24 | // This is an LLVM construct that contains functions and global variables
25 | extern std::unique_ptr<llvm::Module> TheModule;
26 |
27 | // This map keeps track of which values are defined in the current scope
28 | extern std::map<std::string, llvm::Value *> NamedValues;
29 |
30 | #endif
31 |
--------------------------------------------------------------------------------
/lexer/lexer.cpp:
--------------------------------------------------------------------------------
1 | #include "lexer/lexer.h"
2 | #include "lexer/token.h"
3 |
4 | int CurTok;
5 | std::string IdentifierStr;
6 | double NumVal;
7 |
8 | // The actual implementation of the lexer is a single function gettok()
9 | // It's called to return the next token from standard input
10 | // gettok works by calling the getchar() function to read chars one at a time
11 | // Then it recognizes them and stores the last character read in LastChar
12 | int gettok() {
13 |   static int LastChar = ' ';
14 |
15 |   // The first thing we need to do is ignore whitespace between tokens
16 |   while (isspace(LastChar)) {
17 |
LastChar = getchar();
18 |   }
19 |
20 |   // Next, recognize identifiers and specific keywords like "def"
21 |   if (isalpha(LastChar)) {
22 |     IdentifierStr = LastChar;
23 |
24 |     // Stacking together all alphanumeric characters into IdentifierStr
25 |     while (isalnum(LastChar = getchar())) {
26 |       IdentifierStr += LastChar;
27 |     }
28 |
29 |     if (IdentifierStr == "def") {
30 |       return tok_def;
31 |     }
32 |
33 |     if (IdentifierStr == "extern") {
34 |       return tok_extern;
35 |     }
36 |
37 |     return tok_identifier;
38 |   }
39 |
40 |   // Stacking together only numeric values
41 |   if (isdigit(LastChar) || LastChar == '.') {
42 |     std::string NumStr;
43 |
44 |     do {
45 |       NumStr += LastChar;
46 |       LastChar = getchar();
47 |     } while (isdigit(LastChar) || LastChar == '.');
48 |
49 |     // Convert the numeric string to a numeric value
50 |     // that we store in NumVal
51 |     NumVal = strtod(NumStr.c_str(), 0);
52 |     return tok_number;
53 |   }
54 |
55 |   // Handle comments by skipping to the end of the line
56 |   // and returning the next token
57 |   if (LastChar == '#') {
58 |     do {
59 |       LastChar = getchar();
60 |     } while (LastChar != EOF && LastChar != '\n' && LastChar != '\r');
61 |
62 |     if (LastChar != EOF) {
63 |       return gettok();
64 |     }
65 |   }
66 |
67 |   // Finally, if the input doesn't match one of the above cases
68 |   // it's either an operator character like '+' or the end of the file
69 |   if (LastChar == EOF) {
70 |     return tok_eof;
71 |   }
72 |
73 |   int ThisChar = LastChar;
74 |   LastChar = getchar();
75 |   return ThisChar;
76 | }
77 |
78 | int getNextToken() {
79 |   return CurTok = gettok();
80 | }
81 |
--------------------------------------------------------------------------------
/lexer/lexer.h:
--------------------------------------------------------------------------------
1 | #ifndef __LEXER_H__
2 | #define __LEXER_H__
3 |
4 | #include <cstdio>
5 | #include <string>
6 |
7 | // Provide a simple token buffer
8 | // CurTok is the current token the parser is looking at
9 | // getNextToken reads another token from the
lexer and updates CurTok with its results
10 | extern int CurTok;
11 | int gettok();
12 | int getNextToken();
13 |
14 | // If the current token is an identifier
15 | // IdentifierStr will hold the name of the identifier
16 | extern std::string IdentifierStr;
17 |
18 | // If the current token is a numeric literal
19 | // NumVal holds its value
20 | extern double NumVal;
21 |
22 | #endif
23 |
--------------------------------------------------------------------------------
/lexer/token.h:
--------------------------------------------------------------------------------
1 | #ifndef __TOKEN_H__
2 | #define __TOKEN_H__
3 |
4 | // The lexer returns tokens [0-255] if it's an unknown character
5 | // otherwise it returns one of these for known things
6 | enum Token {
7 |   // End Of File
8 |   tok_eof = -1,
9 |
10 |   // Commands
11 |   tok_def = -2,
12 |   tok_extern = -3,
13 |
14 |   // Primary
15 |   tok_identifier = -4,
16 |   tok_number = -5,
17 | };
18 |
19 | #endif
20 |
--------------------------------------------------------------------------------
/logger/logger.cpp:
--------------------------------------------------------------------------------
1 | #include "logger/logger.h"
2 |
3 | // Some helpers for error handling
4 | std::unique_ptr<ExprAST> LogError(const char *Str) {
5 |   fprintf(stderr, "LogError: %s\n", Str);
6 |   return nullptr;
7 | }
8 |
9 | std::unique_ptr<PrototypeAST> LogErrorP(const char *Str) {
10 |   LogError(Str);
11 |   return nullptr;
12 | }
13 |
14 | llvm::Value *LogErrorV(const char *Str) {
15 |   LogError(Str);
16 |   return nullptr;
17 | }
18 |
--------------------------------------------------------------------------------
/logger/logger.h:
--------------------------------------------------------------------------------
1 | #ifndef __LOGGER_H__
2 | #define __LOGGER_H__
3 |
4 | #include "ast/ExprAST.h"
5 | #include "ast/PrototypeAST.h"
6 |
7 | std::unique_ptr<ExprAST> LogError(const char *Str);
8 | std::unique_ptr<PrototypeAST> LogErrorP(const char *Str);
9 | llvm::Value *LogErrorV(const char *Str);
10 |
11 |
#endif
12 |
--------------------------------------------------------------------------------
/main.cpp:
--------------------------------------------------------------------------------
1 | // lexer headers
2 | #include "lexer/lexer.h"
3 | #include "lexer/token.h"
4 |
5 | // AST headers
6 | #include "ast/BinaryExprAST.h"
7 | #include "ast/CallExprAST.h"
8 | #include "ast/ExprAST.h"
9 | #include "ast/FunctionAST.h"
10 | #include "ast/NumberExprAST.h"
11 | #include "ast/PrototypeAST.h"
12 | #include "ast/VariableExprAST.h"
13 |
14 | // parser headers
15 | #include "parser/parser.h"
16 |
17 | // logger headers
18 | #include "logger/logger.h"
19 |
20 | // kaleidoscope headers
21 | #include "kaleidoscope/kaleidoscope.h"
22 |
23 | // LLVM headers
24 | #include "llvm/ADT/APFloat.h"
25 | #include "llvm/ADT/STLExtras.h"
26 | #include "llvm/IR/BasicBlock.h"
27 | #include "llvm/IR/Constants.h"
28 | #include "llvm/IR/DerivedTypes.h"
29 | #include "llvm/IR/Function.h"
30 | #include "llvm/IR/IRBuilder.h"
31 | #include "llvm/IR/LLVMContext.h"
32 | #include "llvm/IR/Module.h"
33 | #include "llvm/IR/Type.h"
34 | #include "llvm/IR/Verifier.h"
35 |
36 | // stdlib headers
37 | #include <algorithm>
38 | #include <cctype>
39 | #include <cstdio>
40 | #include <cstdlib>
41 | #include <map>
42 | #include <memory>
43 | #include <string>
44 | #include <vector>
45 |
46 | using namespace llvm;
47 |
48 | static void HandleDefinition() {
49 |   if (auto FnAST = ParseDefinition()) {
50 |     if (auto *FnIR = FnAST->codegen()) {
51 |       fprintf(stderr, "Read function definition:");
52 |       FnIR->print(errs());
53 |       fprintf(stderr, "\n");
54 |     }
55 |   } else {
56 |     getNextToken();
57 |   }
58 | }
59 |
60 | static void HandleExtern() {
61 |   if (auto ProtoAST = ParseExtern()) {
62 |     if (auto *FnIR = ProtoAST->codegen()) {
63 |       fprintf(stderr, "Read extern:");
64 |       FnIR->print(errs());
65 |       fprintf(stderr, "\n");
66 |     }
67 |   } else {
68 |     getNextToken();
69 |   }
70 | }
71 |
72 | static void HandleTopLevelExpression() {
73 |   if (auto FnAST = ParseTopLevelExpr()) {
74 |     if (auto *FnIR =
FnAST->codegen()) {
75 |       fprintf(stderr, "Read top-level expression:");
76 |       FnIR->print(errs());
77 |       fprintf(stderr, "\n");
78 |     }
79 |   } else {
80 |     getNextToken();
81 |   }
82 | }
83 |
84 | static void MainLoop() {
85 |   while (true) {
86 |     fprintf(stderr, "ready> ");
87 |
88 |     switch (CurTok) {
89 |       case tok_eof:
90 |         return;
91 |       case ';':
92 |         getNextToken();
93 |         break;
94 |       case tok_def:
95 |         HandleDefinition();
96 |         break;
97 |       case tok_extern:
98 |         HandleExtern();
99 |         break;
100 |       default:
101 |         HandleTopLevelExpression();
102 |         break;
103 |     }
104 |   }
105 | }
106 |
107 | int main() {
108 |   BinopPrecedence['<'] = 10;
109 |   BinopPrecedence['+'] = 20;
110 |   BinopPrecedence['-'] = 20;
111 |   BinopPrecedence['*'] = 40;
112 |
113 |   fprintf(stderr, "ready> ");
114 |   getNextToken();
115 |
116 |   TheModule = std::make_unique<llvm::Module>("My awesome JIT", TheContext);
117 |
118 |   MainLoop();
119 |
120 |   TheModule->print(errs(), nullptr);
121 |
122 |   return 0;
123 | }
124 |
--------------------------------------------------------------------------------
/parser/parser.cpp:
--------------------------------------------------------------------------------
1 | #include "parser/parser.h"
2 |
3 | std::map<char, int> BinopPrecedence;
4 |
5 | static int GetTokPrecedence() {
6 |   if (!isascii(CurTok)) {
7 |     return -1;
8 |   }
9 |
10 |   int TokPrec = BinopPrecedence[CurTok];
11 |   if (TokPrec <= 0) return -1;
12 |
13 |   return TokPrec;
14 | }
15 |
16 | // This routine expects to be called when the current token is a tok_number
17 | // It takes the current number value and creates a NumberExprAST node
18 | std::unique_ptr<ExprAST> ParseNumberExpr() {
19 |   auto Result = std::make_unique<NumberExprAST>(NumVal);
20 |   getNextToken();
21 |   return std::move(Result);
22 | }
23 |
24 | // This routine parses expressions in "(" and ")" characters
25 | std::unique_ptr<ExprAST> ParseParenExpr() {
26 |   getNextToken();
27 |
28 |   auto V = ParseExpression();
29 |   if (!V) {
30 |     return nullptr;
31 |   }
32 |
33 |   if (CurTok != ')') {
34 |     return
LogError("Expected )");
35 |   }
36 |
37 |   getNextToken();
38 |   return V;
39 | }
40 |
41 | // This routine expects to be called when the current token is tok_identifier
42 | std::unique_ptr<ExprAST> ParseIdentifierExpr() {
43 |   std::string IdName = IdentifierStr;
44 |
45 |   getNextToken();
46 |
47 |   if (CurTok != '(') {
48 |     return std::make_unique<VariableExprAST>(IdName);
49 |   }
50 |
51 |   getNextToken();
52 |   std::vector<std::unique_ptr<ExprAST>> Args;
53 |   if (CurTok != ')') {
54 |     while (true) {
55 |       if (auto Arg = ParseExpression()) {
56 |         Args.push_back(std::move(Arg));
57 |       } else {
58 |         return nullptr;
59 |       }
60 |
61 |       if (CurTok == ')') {
62 |         break;
63 |       }
64 |
65 |       if (CurTok != ',') {
66 |         return LogError("Expected ')' or ',' in argument list");
67 |       }
68 |
69 |       getNextToken();
70 |     }
71 |   }
72 |
73 |   getNextToken();
74 |
75 |   return std::make_unique<CallExprAST>(IdName, std::move(Args));
76 | }
77 |
78 | std::unique_ptr<ExprAST> ParsePrimary() {
79 |   switch (CurTok) {
80 |     default:
81 |       return LogError("Unknown token when expecting an expression");
82 |     case tok_identifier:
83 |       return ParseIdentifierExpr();
84 |     case tok_number:
85 |       return ParseNumberExpr();
86 |     case '(':
87 |       return ParseParenExpr();
88 |   }
89 | }
90 |
91 | std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, std::unique_ptr<ExprAST> LHS) {
92 |   while (true) {
93 |     int TokPrec = GetTokPrecedence();
94 |
95 |     if (TokPrec < ExprPrec) {
96 |       return LHS;
97 |     }
98 |
99 |     int BinOp = CurTok;
100 |     getNextToken();
101 |
102 |     auto RHS = ParsePrimary();
103 |     if (!RHS) {
104 |       return nullptr;
105 |     }
106 |
107 |     int NextPrec = GetTokPrecedence();
108 |     if (TokPrec < NextPrec) {
109 |       RHS = ParseBinOpRHS(TokPrec + 1, std::move(RHS));
110 |       if (!RHS) {
111 |         return nullptr;
112 |       }
113 |     }
114 |
115 |     LHS = std::make_unique<BinaryExprAST>(BinOp, std::move(LHS), std::move(RHS));
116 |   }
117 | }
118 |
119 | std::unique_ptr<ExprAST> ParseExpression() {
120 |   auto LHS = ParsePrimary();
121 |
122 |   if (!LHS) {
123 |     return nullptr;
124 |   }
125 |
126 |   return ParseBinOpRHS(0, std::move(LHS));
127 | }
128 |
129 |
std::unique_ptr<PrototypeAST> ParsePrototype() {
130 |   if (CurTok != tok_identifier) {
131 |     return LogErrorP("Expected function name in prototype");
132 |   }
133 |
134 |   std::string FnName = IdentifierStr;
135 |   getNextToken();
136 |
137 |   if (CurTok != '(') {
138 |     return LogErrorP("Expected '(' in prototype");
139 |   }
140 |
141 |   std::vector<std::string> ArgNames;
142 |   while (getNextToken() == tok_identifier) {
143 |     ArgNames.push_back(IdentifierStr);
144 |   }
145 |
146 |   if (CurTok != ')') {
147 |     return LogErrorP("Expected ')' in prototype");
148 |   }
149 |
150 |   getNextToken();
151 |
152 |   return std::make_unique<PrototypeAST>(FnName, std::move(ArgNames));
153 | }
154 |
155 | std::unique_ptr<FunctionAST> ParseDefinition() {
156 |   getNextToken();
157 |
158 |   auto Proto = ParsePrototype();
159 |   if (!Proto) {
160 |     return nullptr;
161 |   }
162 |
163 |   if (auto E = ParseExpression()) {
164 |     return std::make_unique<FunctionAST>(std::move(Proto), std::move(E));
165 |   }
166 |
167 |   return nullptr;
168 | }
169 |
170 | std::unique_ptr<FunctionAST> ParseTopLevelExpr() {
171 |   if (auto E = ParseExpression()) {
172 |     auto Proto = std::make_unique<PrototypeAST>("__anon_expr", std::vector<std::string>());
173 |     return std::make_unique<FunctionAST>(std::move(Proto), std::move(E));
174 |   }
175 |
176 |   return nullptr;
177 | }
178 |
179 | std::unique_ptr<PrototypeAST> ParseExtern() {
180 |   getNextToken();
181 |   return ParsePrototype();
182 | }
183 |
--------------------------------------------------------------------------------
/parser/parser.h:
--------------------------------------------------------------------------------
1 | #ifndef __PARSER_H__
2 | #define __PARSER_H__
3 |
4 | #include <map>
5 | #include "ast/BinaryExprAST.h"
6 | #include "ast/CallExprAST.h"
7 | #include "ast/ExprAST.h"
8 | #include "ast/FunctionAST.h"
9 | #include "ast/NumberExprAST.h"
10 | #include "ast/PrototypeAST.h"
11 | #include "ast/VariableExprAST.h"
12 | #include "lexer/lexer.h"
13 | #include "lexer/token.h"
14 |
15 | extern std::map<char, int> BinopPrecedence;
16 | std::unique_ptr<ExprAST> ParseNumberExpr();
17 | std::unique_ptr<ExprAST> ParseParenExpr();
18 |
std::unique_ptr<ExprAST> ParseIdentifierExpr();
19 | std::unique_ptr<ExprAST> ParsePrimary();
20 | std::unique_ptr<ExprAST> ParseBinOpRHS(int ExprPrec, std::unique_ptr<ExprAST> LHS);
21 | std::unique_ptr<ExprAST> ParseExpression();
22 | std::unique_ptr<PrototypeAST> ParsePrototype();
23 | std::unique_ptr<FunctionAST> ParseDefinition();
24 | std::unique_ptr<FunctionAST> ParseTopLevelExpr();
25 | std::unique_ptr<PrototypeAST> ParseExtern();
26 |
27 | #endif
28 |
--------------------------------------------------------------------------------
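The operator-precedence loop in `ParseBinOpRHS` (see `/parser/parser.cpp`) is the least obvious part of the parser, so it may help to run it in isolation. The following is a sketch under simplifying assumptions, not code from the repo: `MiniParser` is a hypothetical type that evaluates on the fly instead of building AST nodes, accepts only single-digit operands, and expects input without whitespace. The precedence table is the one `main()` installs in `BinopPrecedence`.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical miniature of ParseBinOpRHS (not part of the repo).
struct MiniParser {
  std::string Src;
  std::size_t Pos = 0;
  // Same precedences that main() assigns to BinopPrecedence.
  std::map<char, int> Prec{{'<', 10}, {'+', 20}, {'-', 20}, {'*', 40}};

  explicit MiniParser(std::string S) : Src(std::move(S)) {}

  // Current character, or '\0' once the input is exhausted.
  char cur() const { return Pos < Src.size() ? Src[Pos] : '\0'; }

  // Precedence of the current character, or -1 if it is not an operator.
  int tokPrec() const {
    auto It = Prec.find(cur());
    return It == Prec.end() ? -1 : It->second;
  }

  // Assumes the current character is a digit.
  double primary() { return Src[Pos++] - '0'; }

  static double apply(char Op, double L, double R) {
    switch (Op) {
    case '+': return L + R;
    case '-': return L - R;
    case '*': return L * R;
    default:  return L < R ? 1.0 : 0.0; // '<'
    }
  }

  double binOpRHS(int ExprPrec, double LHS) {
    while (true) {
      int TokPrec = tokPrec();
      // Not an operator, or it binds less tightly than the caller's: done.
      if (TokPrec < ExprPrec) return LHS;

      char Op = Src[Pos++];
      double RHS = primary();

      // If the next operator binds tighter, let it consume RHS first.
      if (TokPrec < tokPrec()) RHS = binOpRHS(TokPrec + 1, RHS);

      LHS = apply(Op, LHS, RHS);
    }
  }

  double expression() { return binOpRHS(0, primary()); }
};
```

With this sketch, `MiniParser("1+2*3").expression()` gives 7 because `*` is handled by the recursive call before `+` is applied, while `MiniParser("9-4-3").expression()` gives 2 because operators of equal precedence are folded left to right by the loop.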