├── CHANGELOG.md ├── .gitignore ├── stack.yaml.lock ├── magus.cabal ├── LICENSE ├── stack.yaml ├── app └── Main.hs └── README.md /CHANGELOG.md: -------------------------------------------------------------------------------- 1 | # Revision history for magus 2 | 3 | ## 0.1.0.0 -- YYYY-mm-dd 4 | 5 | * First version. Released on an unsuspecting world. 6 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | dist 2 | dist-* 3 | cabal-dev 4 | *.o 5 | *.hi 6 | *.chi 7 | *.chs.h 8 | *.dyn_o 9 | *.dyn_hi 10 | .hpc 11 | .hsenv 12 | .cabal-sandbox/ 13 | cabal.sandbox.config 14 | *.prof 15 | *.aux 16 | *.hp 17 | *.eventlog 18 | .stack-work/ 19 | cabal.project.local 20 | cabal.project.local~ 21 | .HTF/ 22 | .ghc.environment.* 23 | *.exe 24 | *.c 25 | .DS_Store 26 | -------------------------------------------------------------------------------- /stack.yaml.lock: -------------------------------------------------------------------------------- 1 | # This file was autogenerated by Stack. 2 | # You should not edit this file by hand. 3 | # For more information, please see the documentation at: 4 | # https://docs.haskellstack.org/en/stable/lock_files 5 | 6 | packages: [] 7 | snapshots: 8 | - completed: 9 | sha256: 428ec8d5ce932190d3cbe266b9eb3c175cd81e984babf876b64019e2cbe4ea68 10 | size: 590100 11 | url: https://raw.githubusercontent.com/commercialhaskell/stackage-snapshots/master/lts/18/28.yaml 12 | original: 13 | url: https://raw.githubusercontent.com/commercialhaskell/stackage-snapshots/master/lts/18/28.yaml 14 | -------------------------------------------------------------------------------- /magus.cabal: -------------------------------------------------------------------------------- 1 | cabal-version: 2.4 2 | name: magus 3 | version: 0.1.0.0 4 | 5 | -- A short (one-line) description of the package. 
6 | -- synopsis: 7 | 8 | -- A longer description of the package. 9 | -- description: 10 | 11 | -- A URL where users can report bugs. 12 | -- bug-reports: 13 | 14 | -- The license under which the package is released. 15 | -- license: 16 | author: Bulat Ziganshin 17 | maintainer: bulat.ziganshin@gmail.com 18 | 19 | -- A copyright notice. 20 | -- copyright: 21 | -- category: 22 | extra-source-files: 23 | CHANGELOG.md 24 | README.md 25 | 26 | executable magus 27 | main-is: Main.hs 28 | 29 | -- Modules included in this executable, other than Main. 30 | -- other-modules: 31 | 32 | -- LANGUAGE extensions used by modules in this package. 33 | -- other-extensions: 34 | build-depends: base ^>=4.14.3.0, 35 | pretty, 36 | bytestring, 37 | language-c 38 | hs-source-dirs: app 39 | default-language: Haskell2010 40 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2018 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. 
IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /stack.yaml: -------------------------------------------------------------------------------- 1 | # This file was automatically generated by 'stack init' 2 | # 3 | # Some commonly used options have been documented as comments in this file. 4 | # For advanced use and comprehensive documentation of the format, please see: 5 | # https://docs.haskellstack.org/en/stable/yaml_configuration/ 6 | 7 | # Resolver to choose a 'specific' stackage snapshot or a compiler version. 8 | # A snapshot resolver dictates the compiler version and the set of packages 9 | # to be used for project dependencies. For example: 10 | # 11 | # resolver: lts-3.5 12 | # resolver: nightly-2015-09-21 13 | # resolver: ghc-7.10.2 14 | # 15 | # The location of a snapshot can be provided as a file or url. Stack assumes 16 | # a snapshot provided as a file might change, whereas a url resource does not. 17 | # 18 | # resolver: ./custom-snapshot.yaml 19 | # resolver: https://example.com/snapshots/2018-01-01.yaml 20 | resolver: 21 | url: https://raw.githubusercontent.com/commercialhaskell/stackage-snapshots/master/lts/18/28.yaml 22 | 23 | # User packages to be built. 24 | # Various formats can be used as shown in the example below. 25 | # 26 | # packages: 27 | # - some-directory 28 | # - https://example.com/foo/bar/baz-0.0.2.tar.gz 29 | # subdirs: 30 | # - auto-update 31 | # - wai 32 | packages: 33 | - . 34 | # Dependency packages to be pulled from upstream that are not in the resolver. 35 | # These entries can reference officially published versions as well as 36 | # forks / in-progress versions pinned to a git hash. 
For example: 37 | # 38 | # extra-deps: 39 | # - acme-missiles-0.3 40 | # - git: https://github.com/commercialhaskell/stack.git 41 | # commit: e7b331f14bcffb8367cd58fbfc8b40ec7642100a 42 | # 43 | # extra-deps: [] 44 | 45 | # Override default flag values for local packages and extra-deps 46 | # flags: {} 47 | 48 | # Extra package databases containing global packages 49 | # extra-package-dbs: [] 50 | 51 | # Control whether we use the GHC we find on the path 52 | # system-ghc: true 53 | # 54 | # Require a specific version of stack, using version ranges 55 | # require-stack-version: -any # Default 56 | # require-stack-version: ">=2.7" 57 | # 58 | # Override the architecture used by stack, especially useful on Windows 59 | # arch: i386 60 | # arch: x86_64 61 | # 62 | # Extra directories used by stack for building 63 | # extra-include-dirs: [/path/to/dir] 64 | # extra-lib-dirs: [/path/to/dir] 65 | # 66 | # Allow a newer minor version of GHC than the snapshot specifies 67 | # compiler-check: newer-minor 68 | -------------------------------------------------------------------------------- /app/Main.hs: -------------------------------------------------------------------------------- 1 | -- Minimal example: parse a file, and pretty print it again 2 | module Main where 3 | import System.Environment 4 | import System.Exit 5 | import System.IO 6 | import Control.Monad 7 | import Text.PrettyPrint.HughesPJ 8 | import qualified Data.ByteString as BS 9 | import Data.List 10 | 11 | import Language.C -- simple API 12 | import Language.C.System.GCC -- preprocessor used 13 | import Language.C.System.Preprocess 14 | import Language.C.Data.Name 15 | import Language.C.Syntax.Utils 16 | import Language.C.Data.Ident 17 | 18 | ------------------------------------------------------------------------------------------------------------------------------------- 19 | 20 | data ShowPlaceholder = ShowPlaceholder 21 | instance Show ShowPlaceholder where 22 | showsPrec _ ShowPlaceholder = showString 
"_" 23 | 24 | decorate :: ShowS -> ShowS 25 | decorate app = showString "(" . app . showString ")" 26 | 27 | ------------------------------------------------------------------------------------------------------------------------------------- 28 | 29 | usageMsg :: String -> String 30 | usageMsg prg = render $ 31 | text "Usage:" <+> text prg <+> hsep (map text ["CPP_OPTIONS","input_file.c"]) 32 | 33 | errorOnLeft :: (Show a) => String -> (Either a b) -> IO b 34 | errorOnLeft msg = either (error . (msg++).show) return 35 | errorOnLeftM :: (Show a) => String -> IO (Either a b) -> IO b 36 | errorOnLeftM msg action = action >>= errorOnLeft msg 37 | 38 | -- | @parseCStatements input initialPos@ parses the given preprocessed C-source input and returns the AST or a list of parse errors. 39 | parseCStatements :: InputStream -> Position -> Either ParseError CStat 40 | parseCStatements input initialPosition = 41 | fmap fst $ execParser statementP inputStatement initialPosition builtinTypeNames (namesStartingFrom 0) 42 | where inputStatement = (mapBS "{ ") `BS.append` input `BS.append` (mapBS " }") 43 | mapBS = BS.pack . 
map (toEnum.fromEnum) 44 | 45 | writeAST ast = do 46 | -- dump AST 47 | putStrLn $ (decorate (shows (fmap (const ShowPlaceholder) ast)) "") 48 | -- pretty print 49 | print $ pretty ast 50 | 51 | main :: IO () 52 | main = do 53 | let usageErr = (hPutStrLn stderr (usageMsg "./magus") >> exitWith (ExitFailure 1)) 54 | args <- getArgs 55 | when (length args < 1) usageErr 56 | let (opts,input_file) = (init args, last args) 57 | 58 | -- read 59 | input_stream <- readInputStream input_file 60 | 61 | -- parse 62 | ast <- errorOnLeft "Parse Error:" $ 63 | parseCStatements input_stream (initPos input_file) 64 | 65 | putStrLn "------------------- original:" 66 | writeAST ast 67 | putStrLn "------------------- transpiled:" 68 | writeAST (transpile ast) 69 | 70 | 71 | transpile = mapSubStmts (const False) $ \stat -> case stat of 72 | CExpr (Just expr) ctx 73 | | Just asm <- mapExpr expr 74 | -> CAsm (CAsmStmt Nothing (CStrLit (CString asm False) ctx) [] [] [] ctx) ctx 75 | _ -> stat 76 | 77 | 78 | mapExpr (CCall (CVar (Ident func id a0) a1) params a2) = 79 | Just$ func++mapParams params 80 | -- CCall (CVar (Ident ("__"++name) id a0) a1) params a2 81 | mapExpr _ = Nothing 82 | 83 | 84 | mapParams [] = "" 85 | mapParams (x:xs) = (" "++)$ intercalate ", "$ map mapParam (xs++[x]) 86 | 87 | 88 | mapParam (CVar (Ident name id a0) a1) = "%%"++name 89 | mapParam (CConst (CIntConst num a0)) = "$"++show num 90 | 91 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | I thank Eugene Shelwien and Dmitry Bortoq for long discussions regarding these topics. Many key ideas of this project were proposed by them, and without their help it would never have become reality. 2 | 3 | 4 | # Make assembly magic great again! 5 | 6 | Modern optimizing C++ compilers have almost entirely displaced assembly languages. I haven't written assembly code for 20 years.
Nor do I see it used anywhere, except in Intel's own libraries. 7 | 8 | But C++ optimizers aren't a silver bullet. Each time I write a highly optimized algorithm, I end up fighting the compiler. I know from the start the assembly code I want to see, but it's hard to force the compiler to generate exactly what I need. Modern compilers consider themselves smart: they move code around, allocate registers at their own discretion, and select whichever instructions they prefer. 9 | 10 | Nevertheless, I don't write my own asm code, for a few reasons: 11 | - portability: we have to support various CPU architectures (x86, x64, ARM...), object/library formats, calling conventions and name mangling conventions 12 | - brevity: a function may have 50 instructions, of which only 20 are in the main loop. Writing assembly instructions is boring by itself, and writing all 50 in a low-level way (when only 20 of them are of interest) is even more boring 13 | 14 | 15 | ## Portability 16 | 17 | Fortunately, there are various solutions to both classes of problems. For portability in particular: 18 | - name mangling and object/library formats can be converted by Agner Fog's [objconv] utility. This is especially important, since it means that all speed-critical code can be compiled by the same best compiler (GCC) and then linked into executables produced by inferior compilers! 19 | - calling convention (ABI) portability falls into 3 classes: 20 | - all x86 code can be made compatible by using the `cdecl` declaration 21 | - x64 Windows compilers share a single ABI 22 | - x64 Unix compilers also share a single ABI, although it is incompatible with the Windows one 23 | - x86 and x64 code is almost compatible. The main differences are pointer width and the number of registers available. Both can be partly addressed using symbolic register names, e.g.
`ptr_reg8` can be translated into either `r8` or `[esp+4*8]` depending on the CPU 24 | 25 | Overall, we can make assembly code fully portable within a single ISA, and highly portable between x86 and x64, using macro packages to abstract register names and to hide the ABI. An alternative approach is to link high-quality code produced by GCC into executables produced by other compilers. 26 | 27 | 28 | ## Brevity 29 | 30 | ### [High-level assemblers] 31 | 32 | Various approaches to making assembly code more compact and readable are available in MASM, [FASM], [HLA] and [ForwardCom], and go by the common name of [High-level assemblers]. A few examples: 33 | - a function/call facility which hides the ABI and simplifies/reduces code 34 | - C-style instruction syntax, a la `EAX += EBX` 35 | - structured compound statements (if, while...), sometimes with relational operations (`if EAX > EBX`) 36 | - complex expressions for assignments and if/while conditions (`if EAX*2 > EBX`) 37 | - typed register declarations and unlimited virtual registers, where extra registers are spilled to the stack 38 | 39 | PTX is a particularly interesting example of a virtual assembly language: it allows declaring an unlimited number of typed "registers" and supports legacy ISA instructions by emulating them with instruction sequences. This allows NVidia to make each new generation of video cards incompatible with the previous one, and to adapt to a number of registers that varies with compilation options. 40 | 41 | 42 | ### Sphinx C-- 43 | 44 | Sphinx C-- is a language providing C-like syntax for assembly code, including computations and if/while statements. The rest is implemented via the usual assembly statements. It looks like the ideal high-level assembly language to me, but unfortunately the original compiler was 16-bit only and the various 32/64-bit clones never took off.
45 | 46 | So my first idea was to make an open-source implementation of a similar language using a modern parsing approach (such as PEG) in a high-level language (probably Haskell or OCaml) with massive extensibility features (the ability to add new operators and statements). 47 | 48 | 49 | ### Turbo: C with benefits 50 | 51 | And at this moment I recalled Turbo C - an old, dumb C compiler that allowed register names to be used as variables, e.g. `if (_AX > _BX) _AX <<= _CL`, and also had plain MS-style inline asm statements. These two features made it quite similar to Sphinx C--, but with an important benefit - apart from these two extensions, it was plain C code. This made it possible to write code that contains both a portable and an optimized low-level implementation, selected depending on the compiler used: 52 | 53 | ```C 54 | #if TURBO_C 55 | # define bitbuf _AX 56 | # define count _CL 57 | #else 58 | int bitbuf, count; 59 | #endif 60 | 61 | bitbuf <<= count; 62 | ``` 63 | 64 | My experience of optimizing programs with Turbo C was really nice - I started with an existing C algorithm and gradually replaced all complex expressions with single-operation assignments, and then added register bindings, similar to the code above, to the declarations of hot variables. And the code kept working at each step of this transformation. 65 | 66 | This pseudo-variable register syntax is probably available in newer Borland C++ compilers too, including the free Borland C++ 5.5 version. 67 | 68 | 69 | ### New ideal found 70 | 71 | At this point I realized that all I need is just C/C++ "with benefits": 72 | - the compiler shouldn't reorder statements 73 | - support for manual and semi-automatic assignment of concrete registers to variables - use pragmas ignored by usual C/C++ compilers 74 | - all asm instructions can be generated via intrinsics - provide an equivalent implementation in plain C/C++ 75 | 76 | Similar to the Turbo C approach, this will allow us to develop plain C/C++ code and debug it using any existing C/C++ compiler.
Once the code is working, we can rewrite critical loops to use only low-level operations that compile directly to single assembly instructions. At every moment, it still remains ordinary C/C++ code whose correctness can be checked with an ordinary C/C++ compiler. 77 | 78 | Once the transformation is done, we can compile the code with our Magus C++ compiler and get exactly the asm code we developed. On platforms not supported by Magus, the code can still be compiled with ordinary C/C++ compilers. 79 | 80 | This approach will give us all the benefits of Sphinx C-- (i.e. code portability, brevity and familiar C syntax), plus allow us to share the same code between C/C++ (for portability to any system and debugging) and high-level assembler (for performance). 81 | 82 | Now, once I had figured out what to do, I started to research various approaches to C/C++ compilation that could be extended with a Magus code generator: LCC, TCC, the ANTLR C++ parser, gcc/clang IR transformations. But every approach I was able to find was either hard to learn and implement (such as IR transformations), or had limited usefulness (such as modifying an existing OSS C compiler), so while my goal had become perfectly defined, the implementation seemed pretty hard. 83 | 84 | Another variation of this idea was to employ Nim - it allows transforming the code AST at compile time, which is exactly the kind of transformation I'm looking for. So, once an algorithm is developed as low-level Nim code, it can be translated into C/C++ in the usual way, or preprocessed by Magus, replacing the original statements with inline asm code. 85 | 86 | 87 | ### Embedding 88 | 89 | And at this point of the discussion Eugene brought two great ideas to the table: 90 | - we don't need to produce asm code directly; instead we can generate gcc inline `asm` statements, which can then be compiled by any major C/C++ compiler (except for MSVC) 91 | - we don't need to parse and process the entire input file; instead we can translate only specifically marked regions.
This makes a fundamental difference, freeing us from the burden of full C++ language support. Instead, we need to support only the language subset used in statements, and moreover only the part of the statement syntax that we find really useful for this type of HPC computing. And even this small C subset can be implemented incrementally if we start with support for the generic `asm` statement. 92 | 93 | The combination of these ideas is an absolute win, allowing us to quickly develop a minimal practical translator and then extend it at a comfortable pace. Since we plan to stay strictly within existing C/C++ syntax, and don't need anything but a raw syntax parser, we can choose among well-known existing C/C++ parsers, such as [ANTLR C++14] and [Haskell C-language]. The C-language parser has an additional advantage - it supports gcc `asm` statements in its AST type and provides an AST pretty-printer, so we can map C statements like `eax=ebx` into the corresponding asm statements and then pretty-print the code region back, to feed any C/C++ compiler compatible with GCC asm statements. 94 | 95 | 96 | ### Implementation 97 | 98 | The implementation plan is: 99 | - [x] Magus takes a sequence of GCC C statements as input and outputs a compound GCC C statement that includes GCC-style asm statements. 100 | - [x] all identifiers are translated as register names, e.g. `EAX` -> `%%EAX`. 101 | - [x] all function calls are translated as asm instructions with the same name, e.g. `CRC32(EAX,1)` -> `CRC32 1, %%EAX` and `EBX = CRC32(EAX,1)` -> `MOV %%EAX, %%EBX; CRC32 1, %%EBX`.
102 | - [ ] map `ax += bx` to inline asm code 103 | - [ ] `ax = bx+cx` 104 | - [ ] `goto lbl; lbl:` 105 | - [ ] `if (CF) goto lbl` 106 | - [ ] `if (ax < bx) goto lbl` 107 | - [ ] `{}` 108 | - [ ] `if {}` 109 | - [ ] `while` 110 | - [ ] `for` 111 | - [ ] gcc-compatible intrinsic set 112 | - [ ] manual and semi-automatic register allocation 113 | - [ ] complex expressions, and allocation of temporary registers to handle that 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | [objconv]: http://www.agner.org/optimize/#objconv 137 | [FASM]: https://en.wikipedia.org/wiki/FASM 138 | [HLA]: https://en.wikipedia.org/wiki/High_Level_Assembly 139 | [ForwardCom]: https://github.com/ForwardCom/code-examples 140 | [High-level assemblers]: https://en.wikipedia.org/wiki/High-level_assembler 141 | [ANTLR C++14]: https://github.com/antlr/grammars-v4/tree/master/cpp 142 | [Haskell C-language]: http://hackage.haskell.org/package/language-c 143 | --------------------------------------------------------------------------------
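The function-call rule from the implementation plan can be sketched in plain C as a string transformation. This is only an illustration under stated assumptions, not the Magus code itself (which is the Haskell in `app/Main.hs`): `format_operand` and `translate_call` are hypothetical names, operands are assumed to arrive as already-split strings, and the `$` prefix on integer constants follows `mapParam` in `app/Main.hs` rather than the `CRC32 1, %%EAX` spelling used in the plan.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper (not part of Magus): formats one operand.
   Arguments starting with a digit are treated as integer constants and
   get a "$" prefix; everything else is treated as a register name and
   gets a "%%" prefix. */
static void format_operand(const char *arg, char *out, size_t cap)
{
    if (arg[0] >= '0' && arg[0] <= '9')
        snprintf(out, cap, "$%s", arg);
    else
        snprintf(out, cap, "%%%%%s", arg);  /* emits two '%' characters */
}

/* Hypothetical helper: builds "FUNC arg2, ..., argN, arg1".
   The first C argument is moved to the end of the operand list, the same
   rotation that mapParams performs with (xs ++ [x]). */
static void translate_call(const char *func, const char *const *args,
                           size_t nargs, char *out, size_t cap)
{
    assert(cap > 0);
    size_t used = (size_t)snprintf(out, cap, "%s", func);
    for (size_t i = 0; i < nargs && used < cap; i++) {
        char operand[64];
        format_operand(args[(i + 1) % nargs], operand, sizeof operand);
        used += (size_t)snprintf(out + used, cap - used, "%s%s",
                                 i == 0 ? " " : ", ", operand);
    }
}
```

Called with `func = "CRC32"` and the two arguments `"EAX"` and `"1"`, the buffer receives `CRC32 $1, %%EAX`; a call with no arguments yields just the instruction name.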