├── LICENSE
├── README.md
├── bf2llvm.c
└── bf2llvm.cpp

/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2017 itchyny

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# llvm-brainf**k
### Brainf\*\*k compiler based on LLVM API
This is the first step of my journey around the LLVM compiler infrastructure.

To get familiar with LLVM IR instructions, I first used `clang` to see how it compiles a simple C program to LLVM IR.
For example,
```c
int main() {
  char c;
  int i;
  long l;
  c = 'a';
  i = 72;
  l = 123456789012345;
}
```
this simple C program is compiled to the following LLVM IR instructions.
```llvm
define i32 @main() {
  %1 = alloca i8, align 1
  %2 = alloca i32, align 4
  %3 = alloca i64, align 8
  store i8 97, i8* %1, align 1
  store i32 72, i32* %2, align 4
  store i64 123456789012345, i64* %3, align 8
  ret i32 0
}
```
Try `clang -S -emit-llvm -O0 test.c` to see this, and ignore the attributes and comments in the output.
Now you have learned the `define`, `alloca`, `store` and `ret` instructions in one minute.
It's very easy, isn't it?

The `clang` compiler shows us which LLVM IR instructions correspond to C code.
In my opinion, this is the very first step in learning the LLVM compiler infrastructure.

The next step is generating LLVM IR instructions from programs.
Are you going to try the C++ API? Are you going to start the [Kaleidoscope tutorial](http://llvm.org/docs/tutorial/)?
Please wait, they are still difficult and complicated for beginners.
Yeah, if you can move forward, go ahead.
But they were too difficult for me.

Brainf\*\*k is like Hello world for people learning compilers and interpreters.
I like this language.
Its syntax is so simple that we can write an interpreter in an hour, yet it is famous enough that complicated programs can be found online (like mandelbrot.b written by Erik Bosman).
We can pick up such Brainf\*\*k programs to check whether our interpreter works correctly.
Write code for 8 instructions and a Mandelbrot picture is drawn on the terminal. It's like a miracle!

I first wrote bf2llvm.c.
Of course, I used `clang -S -emit-llvm` to see which instructions to output for each Brainf\*\*k instruction.
As you can see, it directly outputs LLVM IR instructions.
It has no dependency on the LLVM library, and it is easy to see what instructions will be generated by the code.
But it's difficult to maintain the temporary variable index, it can easily emit wrong code, and it requires the `lli` command to run the output.

Secondly, I wrote bf2llvm.cpp.
I explored [the document generated by doxygen](http://llvm.org/docs/doxygen/html/classllvm_1_1IRBuilder.html) to see which functions generate the IR instructions I need.
It's very fun to use the LLVM C++ API!
I need not take care of the index of temporary variables or the precise type annotations for many instructions.
The `getOrInsertFunction` function works like a charm, so I don't have to care about declarations outside the main function.
Object file generation is still difficult for me, but I learned a lot from [chapter 8 of the Kaleidoscope tutorial](http://llvm.org/docs/tutorial/LangImpl08.html).

Through my experience of writing the two Brainf\*\*k compilers, I think I am now more familiar with LLVM IR instructions than before.
After all, `clang -S -emit-llvm` has been a great teacher for me, more so than the complicated doxygen documents and the Kaleidoscope tutorial.
But now I'm going to try the Kaleidoscope tutorial (again; I gave up a few months ago).

Thank you, LLVM infrastructure and its contributors, you are great!
I'm still a beginner in compiler techniques, but I know you gave me a way to generate executables for my own language.
I'm excited to learn the LLVM infrastructure, and I want to contribute language compilers based on LLVM and create my own language in the future!
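
The "interpreter in an hour" idea mentioned above can be sketched in C as follows. This is only a minimal sketch for comparing against the output of the two compilers, not code from this repository: the names `bf_run` and `BF_TAPE_SIZE` are hypothetical, the tape size simply mirrors the 30000 bytes that bf2llvm.c allocates, and storing 0 at end of input is an assumption (implementations differ on that point).

```c
#include <stddef.h>
#include <string.h>

#define BF_TAPE_SIZE 30000 /* mirrors the 30000-byte calloc emitted by bf2llvm.c */

/*
 * Hypothetical helper, not part of this repository.
 * Runs `program`, taking input bytes from the string `input` and writing
 * output bytes to `out` as a NUL-terminated string.
 * Returns the number of output bytes, or -1 if a jump finds no matching
 * bracket, the data pointer leaves the tape, or `out` is too small.
 */
long bf_run(const char *program, const char *input, char *out, size_t out_cap) {
  unsigned char tape[BF_TAPE_SIZE];
  size_t len = strlen(program);
  size_t ptr = 0;     /* data pointer */
  size_t written = 0; /* bytes stored in out so far */

  if (out_cap == 0) return -1;
  memset(tape, 0, sizeof tape);

  for (size_t pc = 0; pc < len; pc++) {
    switch (program[pc]) {
      case '>': if (++ptr >= BF_TAPE_SIZE) return -1; break;
      case '<': if (ptr == 0) return -1; ptr--; break;
      case '+': tape[ptr]++; break;
      case '-': tape[ptr]--; break;
      case '.':
        /* Keep one byte free for the terminating NUL. */
        if (written + 1 >= out_cap) return -1;
        out[written++] = (char)tape[ptr];
        break;
      case ',':
        /* Assumption: store 0 once the input string is exhausted. */
        tape[ptr] = *input != '\0' ? (unsigned char)*input++ : 0;
        break;
      case '[':
        if (tape[ptr] == 0) {
          /* Skip forward to the matching ']'. */
          int depth = 1;
          while (depth > 0) {
            if (++pc >= len) return -1;
            if (program[pc] == '[') depth++;
            else if (program[pc] == ']') depth--;
          }
        }
        break;
      case ']':
        if (tape[ptr] != 0) {
          /* Jump back to the matching '['; the loop's pc++ then enters the body. */
          int depth = 1;
          while (depth > 0) {
            if (pc == 0) return -1;
            pc--;
            if (program[pc] == ']') depth++;
            else if (program[pc] == '[') depth--;
          }
        }
        break;
      default:
        break; /* any other character is treated as a comment */
    }
  }
  out[written] = '\0';
  return (long)written;
}
```

For example, `bf_run("++++++++[>++++++++<-]>+.", "", out, sizeof out)` leaves `A` in `out`, because the loop builds 8 * 8 = 64 in the second cell and the final `+` makes it 65. Feeding the same program to `./bf2llvm | lli` should print the same byte, which makes a helper like this a cheap reference when the generated IR misbehaves.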

--------------------------------------------------------------------------------
/bf2llvm.c:
--------------------------------------------------------------------------------
/*
 * Brainf**k -> LLVM IR Compiler
 * $ gcc bf2llvm.c -o bf2llvm
 * $ echo "+++++++++[>++++++++>+++++++++++>+++++<<<-]>.>++.+++++++..+++.\
>-.------------.<++++++++.--------.+++.------.--------.>+." | \
     ./bf2llvm | opt -S -O3 | lli
 */
#include <stdio.h>
#include <stdlib.h>

/* Maximum nesting depth of [ ... ] loops. */
#define MAX_NESTING 1000

/* Open @main, allocate the 30000-byte tape and point %ptr at its start. */
void emit_header() {
  printf("define i32 @main() {\n");
  printf("  %%data = alloca i8*, align 8\n");
  printf("  %%ptr = alloca i8*, align 8\n");
  printf("  %%data_ptr = call i8* @calloc(i64 30000, i64 1)\n");
  printf("  store i8* %%data_ptr, i8** %%data, align 8\n");
  printf("  store i8* %%data_ptr, i8** %%ptr, align 8\n");
}

/* Number of the next unnamed temporary. It starts at 1 because the unnamed
 * entry block takes %0, and every value-producing instruction must use the
 * next number in sequence. */
int idx = 1;

/* '>' and '<': move the data pointer by diff cells. */
void emit_move_ptr(int diff) {
  printf("  %%%d = load i8*, i8** %%ptr, align 8\n", idx);
  printf("  %%%d = getelementptr inbounds i8, i8* %%%d, i32 %d\n", idx + 1, idx, diff);
  printf("  store i8* %%%d, i8** %%ptr, align 8\n", idx + 1);
  idx += 2;
}

/* '+' and '-': add diff to the current cell. */
void emit_add(int diff) {
  printf("  %%%d = load i8*, i8** %%ptr, align 8\n", idx);
  printf("  %%%d = load i8, i8* %%%d, align 1\n", idx + 1, idx);
  printf("  %%%d = add i8 %%%d, %d\n", idx + 2, idx + 1, diff);
  printf("  store i8 %%%d, i8* %%%d, align 1\n", idx + 2, idx);
  idx += 3;
}

/* '.': print the current cell with putchar. */
void emit_put() {
  printf("  %%%d = load i8*, i8** %%ptr, align 8\n", idx);
  printf("  %%%d = load i8, i8* %%%d, align 1\n", idx + 1, idx);
  printf("  %%%d = sext i8 %%%d to i32\n", idx + 2, idx + 1);
  printf("  %%%d = call i32 @putchar(i32 %%%d)\n", idx + 3, idx + 2);
  idx += 4;
}

/* ',': read one byte with getchar into the current cell. */
void emit_get() {
  printf("  %%%d = call i32 @getchar()\n", idx);
  printf("  %%%d = trunc i32 %%%d to i8\n", idx + 1, idx);
  printf("  %%%d = load i8*, i8** %%ptr, align 8\n", idx + 2);
  printf("  store i8 %%%d, i8* %%%d, align 1\n", idx + 1, idx + 2);
  idx += 3;
}

/* '[': emit the loop condition block and open the loop body block. */
void emit_while_start(int while_index) {
  printf("  br label %%while_cond%d\n", while_index);
  printf("while_cond%d:\n", while_index);
  printf("  %%%d = load i8*, i8** %%ptr, align 8\n", idx);
  printf("  %%%d = load i8, i8* %%%d, align 1\n", idx + 1, idx);
  printf("  %%%d = icmp ne i8 %%%d, 0\n", idx + 2, idx + 1);
  printf("  br i1 %%%d, label %%while_body%d, label %%while_end%d\n", idx + 2, while_index, while_index);
  printf("while_body%d:\n", while_index);
  idx += 3;
}

/* ']': jump back to the condition block and open the block after the loop. */
void emit_while_end(int while_index) {
  printf("  br label %%while_cond%d\n", while_index);
  printf("while_end%d:\n", while_index);
}

/* Free the tape, return 0 and declare the external C functions used. */
void emit_footer() {
  printf("  %%%d = load i8*, i8** %%data, align 8\n", idx);
  printf("  call void @free(i8* %%%d)\n", idx);
  printf("  ret i32 0\n");
  printf("}\n\n");
  printf("declare i8* @calloc(i64, i64)\n\n");
  printf("declare void @free(i8*)\n\n");
  printf("declare i32 @putchar(i32)\n\n");
  printf("declare i32 @getchar()\n");
}

int main() {
  int c; /* int rather than char, so EOF is distinct from every valid byte */
  int while_index = 0;
  int while_indices[MAX_NESTING]; /* stack of the currently open loops */
  int* while_index_ptr = while_indices;
  emit_header();
  while ((c = getchar()) != EOF) {
    switch (c) {
      case '>': emit_move_ptr(1); break;
      case '<': emit_move_ptr(-1); break;
      case '+': emit_add(1); break;
      case '-': emit_add(-1); break;
      case '[':
        if (while_index_ptr == while_indices + MAX_NESTING) {
          fprintf(stderr, "too deeply nested [\n");
          return 1;
        }
        emit_while_start(*while_index_ptr++ = while_index++);
        break;
      case ']':
        if (while_index_ptr == while_indices) {
          fprintf(stderr, "unmatching ]\n");
          return 1;
        }
        emit_while_end(*--while_index_ptr);
        break;
      case '.': emit_put(); break;
      case ',': emit_get(); break;
    }
  }
  if (while_index_ptr != while_indices) {
    fprintf(stderr, "unmatching [\n");
    return 1;
  }
  emit_footer();
  return 0;
}

--------------------------------------------------------------------------------
/bf2llvm.cpp:
--------------------------------------------------------------------------------
/*
 * Brainf**k compiler based on LLVM API
 * $ g++ `llvm-config --cxxflags --ldflags --libs --system-libs` bf2llvm.cpp -o bf2llvm
 * $ echo "+++++++++[>++++++++>+++++++++++>+++++<<<-]>.>++.+++++++..+++.\
>-.------------.<++++++++.--------.+++.------.--------.>+." | \
     ./bf2llvm
 * $ gcc output.o
 * $ ./a.out
 *
 * Note: this uses the LLVM C++ API as it was in 2017 (for example the
 * nullptr-terminated getOrInsertFunction); later LLVM releases changed
 * several of these calls.
 */
#include <iostream>
#include <memory>
#include <string>
#include "llvm/ADT/Optional.h"
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"
#include "llvm/Support/FileSystem.h"
#include "llvm/Support/TargetRegistry.h"
#include "llvm/Support/TargetSelect.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Target/TargetMachine.h"
#include "llvm/Target/TargetOptions.h"

static llvm::LLVMContext TheContext;
static llvm::IRBuilder<> Builder(TheContext);
static std::unique_ptr<llvm::Module> TheModule;

// '>' and '<': move the data pointer by diff cells.
void emit_move_ptr(llvm::Value* ptr, int diff) {
  Builder.CreateStore(
      Builder.CreateInBoundsGEP(
        Builder.getInt8Ty(),
        Builder.CreateLoad(ptr),
        Builder.getInt32(diff)),
      ptr);
}

// '+' and '-': add diff to the current cell.
void emit_add(llvm::Value* ptr, int diff) {
  llvm::Value* tmp = Builder.CreateLoad(ptr);
  Builder.CreateStore(
      Builder.CreateAdd(
        Builder.CreateLoad(tmp),
        Builder.getInt8(diff)),
      tmp);
}

// '.': print the current cell with putchar.
void emit_put(llvm::Value* ptr) {
  llvm::Function* funcPutChar = llvm::cast<llvm::Function>(
      TheModule->getOrInsertFunction("putchar",
        Builder.getInt32Ty(),
        Builder.getInt32Ty(),
        nullptr));
  Builder.CreateCall(
      funcPutChar,
      Builder.CreateSExt(
        Builder.CreateLoad(Builder.CreateLoad(ptr)),
        Builder.getInt32Ty()));
}

// ',': read one byte with getchar into the current cell.
void emit_get(llvm::Value* ptr) {
  llvm::Function* funcGetChar = llvm::cast<llvm::Function>(
      TheModule->getOrInsertFunction("getchar",
        Builder.getInt32Ty(),
        nullptr));
  Builder.CreateStore(
      Builder.CreateTrunc(
        Builder.CreateCall(funcGetChar),
        Builder.getInt8Ty()),
      Builder.CreateLoad(ptr));
}

// The three basic blocks that make up one [ ... ] loop.
struct WhileBlock {
  llvm::BasicBlock* cond_block;
  llvm::BasicBlock* body_block;
  llvm::BasicBlock* end_block;
};

// '[': create the loop blocks, emit the condition and start filling the body.
void emit_while_start(llvm::Function* func, llvm::Value* ptr, WhileBlock* while_block, int while_index) {
  while_block->cond_block = llvm::BasicBlock::Create(
      TheContext, std::string("while_cond") + std::to_string(while_index), func);
  while_block->body_block = llvm::BasicBlock::Create(
      TheContext, std::string("while_body") + std::to_string(while_index), func);
  while_block->end_block = llvm::BasicBlock::Create(
      TheContext, std::string("while_end") + std::to_string(while_index), func);
  Builder.CreateBr(while_block->cond_block);
  Builder.SetInsertPoint(while_block->cond_block);
  Builder.CreateCondBr(
      Builder.CreateICmpNE(
        Builder.CreateLoad(Builder.CreateLoad(ptr)),
        Builder.getInt8(0)),
      while_block->body_block,
      while_block->end_block);
  Builder.SetInsertPoint(while_block->body_block);
}

// ']': jump back to the condition and continue after the loop.
void emit_while_end(WhileBlock* while_block) {
  Builder.CreateBr(while_block->cond_block);
  Builder.SetInsertPoint(while_block->end_block);
}

int main() {
  TheModule = llvm::make_unique<llvm::Module>("top", TheContext);
  llvm::Function* mainFunc = llvm::Function::Create(
      llvm::FunctionType::get(llvm::Type::getInt32Ty(TheContext), false),
      llvm::Function::ExternalLinkage, "main", TheModule.get());
  Builder.SetInsertPoint(llvm::BasicBlock::Create(TheContext, "", mainFunc));

  // Allocate the 30000-byte tape and point ptr at its start.
  llvm::Value* data = Builder.CreateAlloca(Builder.getInt8PtrTy(), nullptr, "data");
  llvm::Value* ptr = Builder.CreateAlloca(Builder.getInt8PtrTy(), nullptr, "ptr");
  llvm::Function* funcCalloc = llvm::cast<llvm::Function>(
      TheModule->getOrInsertFunction("calloc",
        Builder.getInt8PtrTy(),
        Builder.getInt64Ty(), Builder.getInt64Ty(),
        nullptr));
  llvm::Value* data_ptr = Builder.CreateCall(funcCalloc, {Builder.getInt64(30000), Builder.getInt64(1)});
  Builder.CreateStore(data_ptr, data);
  Builder.CreateStore(data_ptr, ptr);

  const int max_nesting = 1000;  // maximum nesting depth of [ ... ] loops
  int while_index = 0;
  WhileBlock while_blocks[max_nesting];  // stack of the currently open loops
  WhileBlock* while_block_ptr = while_blocks;
  char c;
  while (std::cin.get(c)) {
    switch (c) {
      case '>': emit_move_ptr(ptr, 1); break;
      case '<': emit_move_ptr(ptr, -1); break;
      case '+': emit_add(ptr, 1); break;
      case '-': emit_add(ptr, -1); break;
      case '[':
        if (while_block_ptr == while_blocks + max_nesting) {
          std::cerr << "too deeply nested [\n";
          return 1;
        }
        emit_while_start(mainFunc, ptr, while_block_ptr++, while_index++);
        break;
      case ']':
        if (while_block_ptr == while_blocks) {
          std::cerr << "unmatching ]\n";
          return 1;
        }
        emit_while_end(--while_block_ptr);
        break;
      case '.': emit_put(ptr); break;
      case ',': emit_get(ptr); break;
    }
  }
  if (while_block_ptr != while_blocks) {
    std::cerr << "unmatching [\n";
    return 1;
  }

  // Free the tape and return 0 from main.
  llvm::Function* funcFree = llvm::cast<llvm::Function>(
      TheModule->getOrInsertFunction("free",
        Builder.getVoidTy(),
        Builder.getInt8PtrTy(),
        nullptr));
  Builder.CreateCall(funcFree, {Builder.CreateLoad(data)});

  Builder.CreateRet(Builder.getInt32(0));

  // Object file generation for the host target.
  llvm::InitializeAllTargetInfos();
  llvm::InitializeAllTargets();
  llvm::InitializeAllTargetMCs();
  llvm::InitializeAllAsmParsers();
  llvm::InitializeAllAsmPrinters();

  std::string TargetTriple = llvm::sys::getDefaultTargetTriple();

  std::string err;
  const llvm::Target* Target = llvm::TargetRegistry::lookupTarget(TargetTriple, err);
  if (!Target) {
    std::cerr << "Failed to lookup target " + TargetTriple + ": " + err << "\n";
    return 1;
  }

  llvm::TargetOptions opt;
  llvm::TargetMachine* TheTargetMachine = Target->createTargetMachine(
      TargetTriple, "generic", "", opt, llvm::Optional<llvm::Reloc::Model>());

  TheModule->setTargetTriple(TargetTriple);
  TheModule->setDataLayout(TheTargetMachine->createDataLayout());

  std::string Filename = "output.o";
  std::error_code err_code;
  llvm::raw_fd_ostream dest(Filename, err_code, llvm::sys::fs::F_None);
  if (err_code) {
    std::cerr << "Could not open file: " << err_code.message() << "\n";
    return 1;
  }

  llvm::legacy::PassManager pass;
  if (TheTargetMachine->addPassesToEmitFile(pass, dest, llvm::TargetMachine::CGFT_ObjectFile)) {
    std::cerr << "TheTargetMachine can't emit a file of this type\n";
    return 1;
  }
  pass.run(*TheModule);
  dest.flush();
  std::cout << "Wrote " << Filename << "\n";

  return 0;
}

--------------------------------------------------------------------------------