├── LICENSE
├── README.md
├── ast-matchers.md
├── ast-structure.md
├── choosing-the-tool.md
├── clang-tidy-checks.md
├── control-flow.md
├── diagnostics.md
├── libclang.md
├── libtooling.md
├── refactoring.md
├── source-code.md
└── videos.md

/LICENSE:
--------------------------------------------------------------------------------
The MIT License (MIT)
Copyright (c) 2017 Peter Goldsborough

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# clang-notes

Study notes on the clang and LLVM compiler infrastructure.

Todo:
* Control-Flow Analysis
* Refactoring

--------------------------------------------------------------------------------
/ast-matchers.md:
--------------------------------------------------------------------------------
# AST Matchers

## Official Documentation

http://clang.llvm.org/docs/LibASTMatchers.html

The `ASTMatcher` library, also called `LibASTMatchers`, provides a set of
functions, classes and macros that form a domain-specific language (DSL) for
intuitively expressing points of interest (nodes) in the AST, and for matching
them via callbacks and built-in traversal.

Say, for example, we wanted to match all classes or unions in a C++ source file.
Both of these are referred to as a *record* in clang. Therefore, our base
*matcher* will be `recordDecl()`. We can then add further matchers within this
matcher to narrow our search. If we want all records with the name `Foo`, we can
use a `hasName()` matcher inside the `recordDecl`: `recordDecl(hasName("Foo"))`.
Next to these predicate matchers, there also exist "quantifier" matchers like
`allOf()` or `anyOf()`. These take one or more inner matchers and match if all
(`allOf`) or at least one (`anyOf`) of them match. What's interesting to know is
that all matchers that can accept multiple inner matchers actually have an
implicit `allOf()` clause, so that you can simply write
`matcher(<inner1>, <inner2>, ...)` instead of having to wrap the inner matchers
in an `allOf()`. For example, we could write
`recordDecl(hasName("Foo"), isDerivedFrom("Bar"))` to look for classes or unions
that are called `Foo` and are derived from `Bar`.

There are more than a thousand classes in the AST. To figure out how to write a
matcher for a particular kind of node, use this recipe:

1. Take some source code containing an example of what you want to match.
2. Dump the AST (for that region) and see how the statement is built up.
3. Use the [ASTMatchers reference](http://clang.llvm.org/docs/LibASTMatchersReference.html) to look for a matcher that fits the *outer-most* kind of node you're interested in, or at least narrows down the search.
4. Use `clang-query` to verify that your matcher matches the example.
5. Examine the subsequent inner nodes in the AST dump of your example.
6. Repeat for the next inner/child node.

It is also possible to bind certain nodes in your match expression to names, so
that you can reference them later. You can do this by adding a `.bind("<name>")`
to any noun-matcher in your expression. By "noun-matcher" I mean any matcher
that sounds like a noun (`Decl`, `Stmt`, `Type` etc.) rather than a
verb/attribute (e.g. `hasName` or `isInteger`). Once you've bound a node, you
can retrieve it later inside your callback:

```cpp
void run(const MatchFinder::MatchResult& Result) override {
  // Will be null if the node bound to "call" is not actually a `CallExpr`.
  const CallExpr* E = Result.Nodes.getNodeAs<CallExpr>("call");
}
```

The `MatchResult` object here is a lightweight struct that provides all
necessary (and available) information about a particular match: the matched
nodes in the `Nodes` member, the `Context` (a pointer to the `ASTContext`) and
the `SourceManager`. The `Nodes` member is of type
`clang::ast_matchers::BoundNodes` and provides the important `getNodeAs<T>()`
method to retrieve bound nodes, as well as a `getMap()` member which returns a
map from IDs to `DynTypedNode`s, which are type-erased AST nodes.

It is also possible to create your own matchers. For this there are two main
possibilities. The first is the `VariadicDynCastAllOfMatcher` type. Matchers
declared with this type form the backbone of a typical matcher hierarchy.
They are declared as `const internal::VariadicDynCastAllOfMatcher<Base, Derived>`:
the resulting matcher matches nodes of type `Base` that can be dynamically cast
to `Derived`, and it accepts any number of inner matchers for `Derived`, which
are implicitly combined with `allOf()`. For example, `cxxRecordDecl` is defined
as `const internal::VariadicDynCastAllOfMatcher<Decl, CXXRecordDecl>
cxxRecordDecl;`. Note that, to be more precise about binding, you can only bind
node matchers like these (they return a `BindableMatcher`); narrowing and
traversal matchers cannot be bound.

The second possibility for defining your own matchers is the family of
`AST_MATCHER*` macros. An example would be the following use, which matches all
integer types:

```cpp
AST_MATCHER(QualType, isInteger) {
  // Inside the body, `Node` is the node being matched, here a `QualType`.
  return Node->isIntegerType();
}
```

The first macro argument is the type of node the matcher applies to. Besides
`QualType`, this would typically be some class you can match on, such as `Decl`
or `Stmt`.

## ASTMatcher Reference

http://clang.llvm.org/docs/LibASTMatchersReference.html

There are three basic kinds of matchers we can use:

1. *Node Matchers* match a specific type of AST node (they are nouns),
2. *Narrowing Matchers* match attributes of AST nodes (like `isInteger`, `isConst` or `hasName`),
3. *Traversal Matchers* allow traversing the AST to other nodes by specifying certain relationships.

All node matchers allow any number of arguments and implicitly wrap them in an
`allOf()` clause. Also, you may only `bind` a name to a node matcher.

Some examples for each kind of matcher follow below. The format is `name(param)`.

### Node Matchers

The parameters of all of these matchers are further matchers, since they are
nodes and have an implicit `allOf()` clause inside of them:

* `classTemplateDecl`: matches a C++ class template declaration, like `template <class T> class Z {};`,
* `decl`: matches any `Decl`,
* `functionDecl`: matches any function declaration, like `void f();`,
* `qualType`: matches qualified types, e.g. `const int`,
* `breakStmt`: matches all `break` statements,
* `cxxCatchStmt`: matches all `catch` blocks.

Note that, in general, C++-specific nodes have a `cxx` prefix.

### Narrowing Matchers

* `allOf(Matchers...)`: the node must match all inner matchers,
* `anyOf(Matchers...)`: the node must match at least one inner matcher,
* `anything()`: matches any node,
* `equals(bool|char|integer)`: the literal node must have the given value,
* `isConst()`: the method must be const,
* `isLambda()`: the C++ record must be the closure type of a lambda,
* `hasAttr(AttrKind)`: the declaration must have an attribute of the given kind (an `attr::Kind` value),
* `isNoThrow()`: the function declaration must be declared `noexcept` or `throw()`,
* `isArrow()`: the *member expression* must use an arrow (`->`) as opposed to a dot,
* `hasName(std::string)`: the `NamedDecl` must have the given name,
* `matchesName(std::string)`: the name of the `NamedDecl` must match the given regex,
* `isInteger()`: the `QualType` must be an integer type.
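To make the implicit `allOf()` and the split between narrowing and quantifier matchers concrete, here is a small plain-C++ model. It does not use clang at all: `ToyRecord`, `ToyMatcher` and the toy versions of `hasName`, `isDerivedFrom`, `allOf`, `anyOf` and `recordDecl` are hypothetical stand-ins that only mimic how the real matchers compose.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-in for a clang record declaration.
// This is NOT the clang API; it only models how matchers compose.
struct ToyRecord {
  std::string Name;
  std::vector<std::string> DirectBases;
};

// In this toy, a matcher is just a predicate over a node.
using ToyMatcher = std::function<bool(const ToyRecord&)>;

// Narrowing matchers test an attribute of the node itself.
ToyMatcher hasName(const std::string& Name) {
  return [Name](const ToyRecord& R) { return R.Name == Name; };
}

// Unlike clang's isDerivedFrom, this toy only looks at direct bases.
ToyMatcher isDerivedFrom(const std::string& Base) {
  return [Base](const ToyRecord& R) {
    for (const auto& B : R.DirectBases)
      if (B == Base) return true;
    return false;
  };
}

// Quantifier matchers combine inner matchers.
ToyMatcher allOf(std::vector<ToyMatcher> Inner) {
  return [Inner](const ToyRecord& R) {
    for (const auto& M : Inner)
      if (!M(R)) return false;
    return true;
  };
}

ToyMatcher anyOf(std::vector<ToyMatcher> Inner) {
  return [Inner](const ToyRecord& R) {
    for (const auto& M : Inner)
      if (M(R)) return true;
    return false;
  };
}

// A node matcher accepts any number of inner matchers and implicitly wraps
// them in allOf(), so recordDecl(a, b) behaves like recordDecl(allOf(a, b)).
template <typename... Matchers>
ToyMatcher recordDecl(Matchers... Inner) {
  return allOf({ToyMatcher(Inner)...});
}
```

For example, the toy `recordDecl(hasName("Foo"), isDerivedFrom("Bar"))` accepts `ToyRecord{"Foo", {"Bar"}}` and rejects a record named `Baz`, while `recordDecl()` with no inner matchers accepts every record. The real library builds typed, bindable matcher objects rather than `std::function`s, so treat this only as an intuition for the combination logic.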

### Traversal Matchers

* `eachOf(Matchers...)`: matches if any of its submatchers match, but unlike `anyOf`, generates a binding result for each matching submatcher rather than just the first one found,
* `forEachDescendant(Matcher)`: matches if any *direct or indirect* descendant of the node matches the given matcher, and creates a binding result for each such matching descendant,
* `hasDescendant(Matcher)`: like `forEachDescendant`, but generates at most one binding result, even if multiple descendants match,
* `hasAncestor(Matcher)`: matches nodes that have an ancestor that satisfies the given matcher,
* `has(Matcher)`: matches if any child (i.e. direct descendant) satisfies the given matcher,
* `hasParent(Matcher)`: matches if the node has a parent that satisfies the given matcher,
* `hasIndex(Matcher)`: matches the index expression of an array subscript expression,
* `hasBase(Matcher)`: matches the base expression of an array subscript expression,
* `hasLHS(Matcher)`: matches the given matcher against the left-hand side of a binary operator or array subscript expression,
* `pointee(Matcher)`: matches if the type pointed to by a pointer or reference satisfies the given type matcher,
* `pointsTo(Matcher)`: matches if a `QualType` points to something that satisfies the given matcher,
* `hasBody(Matcher)`: matches if the body of a function or of a `for`, `while` or `do` statement matches the given matcher.

Note that there is no `hasChild`, only `has` (which checks for direct
descendants).
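The practical difference between `has`, `hasDescendant` and `forEachDescendant` is how far they look and how many results they produce. The following plain-C++ sketch models that on a hypothetical `ToyNode` tree. None of these names are clang APIs, and each function simply returns one pointer per result that the corresponding real matcher would bind.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy AST node: just a kind name and child nodes.
struct ToyNode {
  std::string Kind;
  std::vector<ToyNode> Children;
};

using Results = std::vector<const ToyNode*>;

// Pre-order walk over all direct and indirect descendants of N, recording
// every node whose kind equals Kind.
void collectDescendants(const ToyNode& N, const std::string& Kind,
                        Results& Out) {
  for (const ToyNode& Child : N.Children) {
    if (Child.Kind == Kind) Out.push_back(&Child);
    collectDescendants(Child, Kind, Out);
  }
}

// Models forEachDescendant: one result per matching descendant.
Results forEachDescendant(const ToyNode& Root, const std::string& Kind) {
  Results Out;
  collectDescendants(Root, Kind, Out);
  return Out;
}

// Models hasDescendant: at most one result, even if several descendants match.
Results hasDescendant(const ToyNode& Root, const std::string& Kind) {
  Results All = forEachDescendant(Root, Kind);
  if (All.size() > 1) All.resize(1);
  return All;
}

// Models has: only direct children are considered (there is no hasChild).
Results has(const ToyNode& Root, const std::string& Kind) {
  for (const ToyNode& Child : Root.Children)
    if (Child.Kind == Kind) return Results{&Child};
  return Results{};
}
```

On a `FunctionDecl` whose `CompoundStmt` body contains a call nested inside another call, the toy `forEachDescendant` reports both calls, `hasDescendant` reports only the first one it finds (first in pre-order here; the sketch makes no claim about clang's actual traversal order), and `has` reports nothing because the calls are not direct children of the function.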

## Examples

Match all pointer types:

```cpp
pointerType()
```

Match all `const` variables initialized with a generic lambda (one taking an `auto` parameter) whose call operator is `noexcept` and contains a `goto` statement:

```cpp
varDecl(
  hasType(isConstQualified()),
  hasInitializer(
    hasType(cxxRecordDecl(
      isLambda(),
      has(functionTemplateDecl(
        has(cxxMethodDecl(
          isNoThrow(),
          hasBody(compoundStmt(hasDescendant(gotoStmt()))))))))))))
```

--------------------------------------------------------------------------------
/ast-structure.md:
--------------------------------------------------------------------------------
# AST Structure

## Understanding the Clang AST

https://jonasdevlieghere.com/understanding-the-clang-ast/

`clang` is the compiler frontend for the C language family. In general, a
compiler frontend is responsible for lexing and parsing, the result of which is
an abstract syntax tree (AST), as well as for syntactic and semantic analysis.
During these steps, the frontend may create a symbol table and verify the
correctness of the AST. The result of the frontend stage is code in an
*intermediate representation* that may be handed on to the compiler backend
and, optionally, before that, to an optimizer. In the case of clang and LLVM,
there is of course an optimizer (the LLVM `opt` optimizer).

The three most important classes in the clang AST are `Decl` (declarations),
`Stmt` (statements) and `Type` (types). These classes form the base of many
further subclasses (such as all kinds of declarations, like `FunctionDecl`).
However, it's important to note that these nodes do not share a common base
class. As such, there is no interface for traversing arbitrary nodes in the AST,
for example via a pointer to a generic `ASTNode`.
One reason for this design
decision could be that even if you had a common base class, this class would not
be very rich, as the APIs of `Decl`, `Stmt` and `Type` are all very different
(i.e. the intersection of methods that would form the interface of an abstract
`ASTNode` would be very small). Of course, runtime polymorphism can be replaced
by compile-time polymorphism using templates, but because of this small
intersection of APIs, this would not be very useful either. Rather, you will most
likely be working with the *visitor pattern*, where you define or override
dedicated functions for each of the three broad classes of nodes (and often
more) and then let clang traverse the AST for you, calling your visitor function
for whatever nodes it encounters. Note that this is slightly different from the
C and Python APIs, which do have means of generically pointing to a node of any
type in the AST. As such, the traversal APIs in these languages are sometimes
more flexible and rely less heavily on visitation.

The `ASTContext` class stores information about the AST that cannot be found
from the tree alone. It also takes on the role of an "AST manager", providing
an entry point to the AST via the `getTranslationUnitDecl()` function, which
returns the root of the AST for the whole translation unit under inspection. The
`ASTContext` also stores the identifier table, the source manager and a list of
declared types. It also has some glue methods to find the parents of a node
(`getParents`) and provides global access to the language options
(`getLangOpts`) as well as to language-specific constructs like the standard
`size_t` type node.

For a given C/C++ file, we can use `clang -Xclang -ast-dump -fsyntax-only <file>`
to see a dump of the AST structure.
For example, the following C++ program:

```cpp
int f(int x, int y) {
  return x + y;
}

auto main() -> int {
  return f(1, 2);
}
```

gives the following dump (`clang -Xclang -ast-dump -fsyntax-only -std=c++14 file.cpp`):

```cpp
|-TypedefDecl 0x7fe591818858 <<invalid sloc>> <invalid sloc> implicit __int128_t '__int128'
| `-BuiltinType 0x7fe591818540 '__int128'
|-TypedefDecl 0x7fe5918188b8 <<invalid sloc>> <invalid sloc> implicit __uint128_t 'unsigned __int128'
| `-BuiltinType 0x7fe591818560 'unsigned __int128'
|-TypedefDecl 0x7fe591818bf8 <<invalid sloc>> <invalid sloc> implicit __NSConstantString 'struct __NSConstantString_tag'
| `-RecordType 0x7fe5918189a0 'struct __NSConstantString_tag'
|   `-CXXRecord 0x7fe591818908 '__NSConstantString_tag'
|-TypedefDecl 0x7fe591818c88 <<invalid sloc>> <invalid sloc> implicit __builtin_ms_va_list 'char *'
| `-PointerType 0x7fe591818c50 'char *'
|   `-BuiltinType 0x7fe591818360 'char'
|-TypedefDecl 0x7fe591832000 <<invalid sloc>> <invalid sloc> implicit __builtin_va_list 'struct __va_list_tag [1]'
| `-ConstantArrayType 0x7fe591818f60 'struct __va_list_tag [1]' 1
|   `-RecordType 0x7fe591818d70 'struct __va_list_tag'
|     `-CXXRecord 0x7fe591818cd8 '__va_list_tag'
// -----------------------------------------------------------------------------
|-FunctionDecl 0x7fe5918321a0 line:1:5 used f 'int (int, int)'
| |-ParmVarDecl 0x7fe591832060 col:11 used x 'int'
| |-ParmVarDecl 0x7fe5918320d0 col:18 used y 'int'
| `-CompoundStmt 0x7fe591832358
|   `-ReturnStmt 0x7fe591832340
|     `-BinaryOperator 0x7fe591832318 'int' '+'
|       |-ImplicitCastExpr 0x7fe5918322e8 'int'
|       | `-DeclRefExpr 0x7fe591832298 'int' lvalue ParmVar 0x7fe591832060 'x' 'int'
|       `-ImplicitCastExpr 0x7fe591832300 'int'
|         `-DeclRefExpr 0x7fe5918322c0 'int' lvalue ParmVar 0x7fe5918320d0 'y' 'int'
`-FunctionDecl 0x7fe591832430 line:5:6 main 'auto (void) -> int'
  `-CompoundStmt 0x7fe591832660
    `-ReturnStmt 0x7fe591832648
      `-CallExpr 0x7fe591832610 'int'
        |-ImplicitCastExpr 0x7fe5918325f8 'int (*)(int, int)'
        | `-DeclRefExpr 0x7fe5918325a0 'int (int, int)' lvalue Function 0x7fe5918321a0 'f' 'int (int, int)'
        |-IntegerLiteral 0x7fe591832560 'int' 1
        `-IntegerLiteral 0x7fe591832580 'int' 2
```

Note where I inserted the comment line (`// ---...`). Everything above this line
consists of implicit declarations that clang emits for every C++ program; their
exact form depends on my particular system environment (note the `NS` stuff from
macOS). Everything after that is the clang AST output for the actual program.

Next to a rich API to access information about each particular kind of AST node,
some of the most important methods one deals with when interacting with the
clang AST in C++ are nodes' so-called *glue methods*. Glue methods are methods
that allow you to actually traverse the AST, basically bridging the gap between
the various parts of the AST. For example, an if statement is represented by an
`IfStmt`, with the glue methods `getCond()`, `getThen()` and `getElse()` to
access the related parts of the AST.

Every token in a C/C++ input stream is identified by a `SourceLocation`. Here, a
token is the most basic entity of a program, of which all higher-level semantic
concepts like expressions, classes or functions are composed (tokens can be
classified as keywords, identifiers, literals, punctuation or comments; see
`Basic/TokenKinds.def`). Every node in the AST contains at least one such
`SourceLocation`, so locations are kept small: a `SourceLocation` is essentially
an ID that can be used to look up further information in the `SourceManager`.
Otherwise, you would have to store row, column and source-file information for
every single node.

To traverse the AST and to look for points of interest, we can either use the
`RecursiveASTVisitor` or `ASTMatcher` classes.
Both of these approaches are
embedded into LibTooling, which is nice, as they save you the effort of
traversing the AST yourself; you only have to operate on it. The only downside
is that this API is quite unstable. The alternative is to use libClang, which is
the stable, high-level C API provided by clang.

The first option we have is the `RecursiveASTVisitor` technique. For this
method, we subclass the `RecursiveASTVisitor` class (passing our own class as
the template argument), which declares methods that we can redefine for custom
behavior. Among others, the class provides `Visit*` hooks that are called for
each node of the corresponding kind during the recursive descent, such as
`VisitDecl` for `Decl`s, `VisitType` for `Type`s and `VisitCallExpr` for call
expressions. Every such function returns `true` to tell the traversal engine to
continue traversing, or `false` to stop it. This could look like so:

```cpp
class FindNamedCallVisitor
    : public RecursiveASTVisitor<FindNamedCallVisitor> {
public:
  explicit FindNamedCallVisitor(ASTContext* Context, std::string fName)
      : Context(Context), fName(fName) {}

  // Called for every call expression the traversal encounters.
  bool VisitCallExpr(CallExpr* CallExpression) {
    // getDirectCallee() returns null for indirect calls (e.g. through a
    // function pointer), so check it before using it.
    if (const FunctionDecl* Func = CallExpression->getDirectCallee()) {
      const std::string FuncName = Func->getNameInfo().getAsString();

      // If this is the function we are looking for
      if (FuncName == fName) {
        // Grab the source location (FullSourceLoc = SourceLocation + SourceManager).
        // Newer clang versions call getLocStart() getBeginLoc().
        FullSourceLoc FullLocation =
            Context->getFullLoc(CallExpression->getLocStart());

        if (FullLocation.isValid())
          llvm::outs() << "Found call at "
                       << FullLocation.getSpellingLineNumber() << ":"
                       << FullLocation.getSpellingColumnNumber() << "\n";
      }
    }

    // Returning true continues the traversal.
    return true;
  }

private:
  ASTContext* Context;
  std::string fName;
};
```

We would then furthermore require an `ASTConsumer` and a `FrontendAction` to run
the visitor. The second option is to use `ASTMatcher`s, whose API provides a
sweet DSL for querying the AST and matching nodes of particular qualities. For
this, you would first define a query and then subclass
`MatchFinder::MatchCallback`, which lets you override its `run()` method, to
which LibTooling will then pass every match it finds in the AST. The above
example would be implemented like so with ASTMatchers:

```cpp
// Look for all calls to `func` and bind the call expression to the name `call`
// so that we can later reference this part of the match.
StatementMatcher CallMatcher =
    callExpr(callee(functionDecl(hasName("func")))).bind("call");

// ...

class MyCallback : public MatchFinder::MatchCallback {
public:
  void run(const MatchFinder::MatchResult& Result) override {
    ASTContext* Context = Result.Context;
    if (const CallExpr* E = Result.Nodes.getNodeAs<CallExpr>("call")) {
      FullSourceLoc FullLocation = Context->getFullLoc(E->getLocStart());
      if (FullLocation.isValid()) {
        llvm::outs() << "Found call at " << FullLocation.getSpellingLineNumber()
                     << ":" << FullLocation.getSpellingColumnNumber() << "\n";
      }
    }
  }
};
```

Note that we can generally only bind names to nodes that sound like nouns, e.g.
`decl`, but not to verbs like `hasName`, since those are just attributes. Also
note that one of the nice things about having this interface work with functors
is that we can maintain state in the callback.

The last option we have to traverse the clang AST is libClang, the stable C
interface to clang. There are two central concepts w.r.t. AST traversal in
libClang. The first is *cursors*, which are basically pointers to nodes in the
AST. When traversing the AST, we will deal with and operate on cursors. Using
these cursors, we can then get the spelling (name) of a node or its source
location. Next to cursors, the other important concept is the visitation
pattern, which you can use to implement operations over nodes.

Basically, the steps to traverse the AST with libClang are the following:

1. Create an *index* with `clang_createIndex()`. An index represents a set of translation units that we are interested in.
2. Create a `CXTranslationUnit` using `clang_parseTranslationUnit()` to load the AST of a file into your program.
3. Define a visitor returning a `CXChildVisitResult` and taking a `current` and `parent` cursor and, optionally, any additional `CXClientData` (`void*`) user data.
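Step 3 is where the traversal is steered: according to the libclang documentation, the visitor's return value either stops the whole traversal (`CXChildVisit_Break`), moves on to the next sibling without visiting the current cursor's children (`CXChildVisit_Continue`), or descends into those children (`CXChildVisit_Recurse`). The following plain-C++ toy does not use libclang at all; `ToyCursor`, `VisitResult`, `visitChildren` and `recordSpellings` are hypothetical stand-ins that sketch those semantics.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a libclang cursor: a spelling plus children.
struct ToyCursor {
  std::string Spelling;
  std::vector<ToyCursor> Children;
};

// Mirrors CXChildVisit_Break, CXChildVisit_Continue and CXChildVisit_Recurse.
enum class VisitResult { Break, Continue, Recurse };

// Shaped like libclang's CXCursorVisitor: (current, parent, client data).
using ToyVisitor = VisitResult (*)(const ToyCursor& Current,
                                   const ToyCursor& Parent, void* ClientData);

// Visits the children of Parent. Returns true if the visitor stopped the whole
// traversal early with Break (clang_visitChildren signals this with non-zero).
bool visitChildren(const ToyCursor& Parent, ToyVisitor Visit,
                   void* ClientData) {
  for (const ToyCursor& Child : Parent.Children) {
    switch (Visit(Child, Parent, ClientData)) {
      case VisitResult::Break:
        return true;
      case VisitResult::Continue:
        break;  // Skip Child's subtree and move on to the next sibling.
      case VisitResult::Recurse:
        if (visitChildren(Child, Visit, ClientData)) return true;
        break;
    }
  }
  return false;
}

// Example visitor: records each spelling it sees, skips the subtree of any
// cursor spelled "skip" and stops everything at a cursor spelled "stop".
VisitResult recordSpellings(const ToyCursor& Current, const ToyCursor&,
                            void* ClientData) {
  auto* Seen = static_cast<std::vector<std::string>*>(ClientData);
  Seen->push_back(Current.Spelling);
  if (Current.Spelling == "stop") return VisitResult::Break;
  if (Current.Spelling == "skip") return VisitResult::Continue;
  return VisitResult::Recurse;
}
```

Given a root whose children are `a` (with child `a1`), `skip` (with child `hidden`), `stop` and `never`, `recordSpellings` sees `a`, `a1`, `skip` and `stop`. It never sees `hidden`, because its parent returned `Continue`, and never sees `never`, because `Break` ended the traversal. Passing the `std::vector` through the `void*` client-data pointer mirrors how state is typically threaded through a C-style callback.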

The first two steps would look something like this:

```cpp
#include <clang-c/Index.h>
#include <iostream>

auto main(int argc, const char* argv[]) -> int {
  if (argc < 2) {
    std::cout << "Invalid number of arguments!" << std::endl;
    return 1;
  }

  // excludeDeclarationsFromPCH = 1, displayDiagnostics = 1
  CXIndex Index = clang_createIndex(1, 1);

  // Expected arguments:
  // 1) The index to add the translation unit to,
  // 2) The name of the file to parse,
  // 3) A pointer to an array of further command line arguments for the compiler,
  // 4) The number of further command line arguments,
  // 5) A pointer to an array of CXUnsavedFile structs,
  // 6) The number of CXUnsavedFiles (buffers of unsaved files) in the array,
  // 7) A bitmask of options.
  CXTranslationUnit TU = clang_parseTranslationUnit(
      Index,
      argv[1],
      argv + 2,
      argc - 2,
      nullptr,
      0,
      CXTranslationUnit_SkipFunctionBodies);

  if (TU == nullptr) {
    std::cout << "Unable to parse " << argv[1] << std::endl;
    clang_disposeIndex(Index);
    return 1;
  }

  // libClang is a C API, so resources have to be released manually.
  clang_disposeTranslationUnit(TU);
  clang_disposeIndex(Index);
}
```

After parsing the TU, we get an AST. We then grab the cursor to the root of the
AST, which is the `TranslationUnitDecl`:

```cpp
CXCursor cursor = clang_getTranslationUnitCursor(TU);
```

Then we define a visitor function and pass it to `clang_visitChildren()`:

```cpp
CXChildVisitResult visit(CXCursor cursor, CXCursor, CXClientData) {
  // Tell libClang to descend into the children of this cursor.
  return CXChildVisit_Recurse;
}

// ...

clang_visitChildren(cursor, visit, nullptr);
```

Finally, we can define the following `visit` function to look for all functions
with `foo` in their name:

```cpp
CXChildVisitResult visit(CXCursor cursor, CXCursor, CXClientData) {
  CXCursorKind kind = clang_getCursorKind(cursor);

  // We are looking for functions or methods with the name 'foo' in them.
  if (kind == CXCursorKind::CXCursor_FunctionDecl ||
      kind == CXCursorKind::CXCursor_CXXMethod) {
    // The display name is sometimes more descriptive than the spelling name
    // (which is just the source code).
    auto cursorName = clang_getCursorDisplayName(cursor);

    auto cursorNameString = std::string(clang_getCString(cursorName));
    if (cursorNameString.find("foo") != std::string::npos) {
      // Grab the source range, i.e. (start, end) SourceLocation pair.
      CXSourceRange range = clang_getCursorExtent(cursor);

      // Grab the start of the range.
      CXSourceLocation location = clang_getRangeStart(range);

      // Decompose the SourceLocation into a location in a file.
      CXFile file;
      unsigned int line;
      unsigned int column;
      clang_getFileLocation(location, &file, &line, &column, nullptr);

      // Get the name of the file.
      auto fileName = clang_getFileName(file);

      std::cout << "Found function"
                << " in " << clang_getCString(fileName)
                << " at " << line
                << ":" << column
                << std::endl;

      // Manual cleanup!
      clang_disposeString(fileName);
    }

    // Manual cleanup!
    clang_disposeString(cursorName);
  }

  return CXChildVisit_Recurse;
}
```

--------------------------------------------------------------------------------
/choosing-the-tool.md:
--------------------------------------------------------------------------------
# Choosing the Right Tool for the Job

There are three main options to build a tool using the clang library
infrastructure: *libClang*, *plugins* and *libTooling*. These are discussed in
more detail in the following paragraphs. First a description of each option is
provided, followed by a listing of pros and cons.

## LibClang

One of the core elements of the LLVM development philosophy is to continuously
push for progress and not to shy away from breaking a few eggs. If widespread
changes to the clang AST or the LLVM IR are deemed necessary to move the project
forward, then those changes are made and clients are forced to react. However,
to nevertheless provide a stable API to clients who need it, and especially to
those clients who don't need all the nitty-gritty details or intricacies of the
clang internals, the project maintains high-level C APIs that are very stable
and dependable. C is presumably chosen because it is a simple language and
especially because bindings to other languages like Python are easily
developed. *libClang* is one example of such a stable, high-level C API. It has
a relatively abstract and simple interface to traverse the AST and query it for
basic information like the names or types of a node.

libClang is especially well suited to editor integration, as it also has an API
for simple code completion.

+ easy to use from other languages,
+ very stable and backwards compatible API,
+ very high level and easy to use.

- does not provide full control over the AST.

## Clang Plugins

*Clang Plugins* are more sophisticated C++ tools written with the goal of
compiling them into shared libraries that can be loaded into the compilation
process dynamically. Clang plugins are well suited to simple linters or other
tools that do not need context over the entire build process. The disadvantage
is that they are run over each translation unit individually and thus cannot
maintain state across TUs (i.e. across a whole code base).

+ runs whenever any dependency changes,
+ can make or break a build,
+ gives you complete control over the AST.

- does not have full context over the entire build process,
- does not get access to any of the infrastructure around loading a file, and therefore cannot emit transformed source after processing a file (like you would want to do with a source-to-source (S2S) tool like clang-format).

## LibTooling

While clang plugins can only run over individual files, *LibTooling* tools have
access to the entire compilation process and are thus the most powerful option
among the three. They can be used for static analysis, refactoring and
source-to-source transformation purposes.

+ get full context about the build process and can therefore decide what files to run on,
+ give you full control over the AST.

- unstable API (!),
- relatively low level (less so with ASTMatchers).

Note that the main difference between LibTooling and plugins is that the former
produces standalone tools, while the latter have to be invoked as part of your
build process and thus rerun on every (dependency) change.

## ClangTidy

...

--------------------------------------------------------------------------------
/clang-tidy-checks.md:
--------------------------------------------------------------------------------
# Clang Tidy Checks

http://bbannier.github.io/blog/2015/05/02/Writing-a-basic-clang-static-analysis-check.html

Clang Tidy is a linting and static analysis tool built on top of the LibTooling
infrastructure. It also makes it very easy to add checks of your own, which you
can then run using your own (patched) build of clang-tidy, or attempt to merge
upstream.

You can find the clang-tidy source in `tools/clang/tools/extra/clang-tidy`. This
is where you'll also see a file `add_new_check.py`, which you can pass a category
and the name of your check (using kebab-case).
The script will then generate all the
necessary testing and plugin boilerplate so that you can start working on the
check right away.

For example, if we run `python add_new_check.py misc virtual-shadowing`, this
will generate `VirtualShadowingCheck.h` and `VirtualShadowingCheck.cpp` in the
clang-tidy source tree as your header and implementation files. If you rebuild
LLVM/clang, you can then already run your check with
`clang-tidy -checks='-*,misc-virtual-shadowing' <file>`. The boilerplate for the
check will look something like this:

```cpp
class VirtualShadowingCheck : public ClangTidyCheck {
public:
  VirtualShadowingCheck(StringRef Name, ClangTidyContext* Context)
      : ClangTidyCheck(Name, Context) {}
  void registerMatchers(ast_matchers::MatchFinder* Finder) override;
  void check(const ast_matchers::MatchFinder::MatchResult& Result) override;
};
```

The two methods `registerMatchers()` and `check()` you see here are all we need
to define our own check. The former is where you register the `ASTMatchers` you
want to match on, and the latter is the callback clang-tidy will call for every
match. If we were looking for virtual functions which shadow non-virtual
functions of the same name in a base class, we could write something like this
for a matcher inside `registerMatchers` in `VirtualShadowingCheck.cpp`:

```cpp
void VirtualShadowingCheck::registerMatchers(MatchFinder* Finder) {
  // Find all virtual methods
  Finder->addMatcher(cxxMethodDecl(isVirtual()).bind("method"), this);
}
```

For the `check()`, we will first want a test case.
`add_new_check.py` already 48 | generated the test boilerplate for us under 49 | `tools/clang/tools/extra/test/clang-tidy/misc-virtual-shadowing.cpp`: 50 | 51 | ```cpp 52 | // RUN: %check_clang_tidy %s misc-virtual-shadowing %t 53 | 54 | // FIXME: Add something that triggers the check here. 55 | void f(); 56 | // CHECK-MESSAGES: :[[@LINE-1]]:6: warning: function 'f' is insufficiently awesome [misc-virtual-shadowing] 57 | 58 | // FIXME: Verify the applied fix. 59 | // * Make the CHECK patterns specific enough and try to make verified lines 60 | // unique to avoid incorrect matches. 61 | // * Use {{}} for regular expressions. 62 | // CHECK-FIXES: {{^}}void awesome_f();{{$}} 63 | 64 | // FIXME: Add something that doesn't trigger the check here. 65 | void awesome_f2(); 66 | ``` 67 | 68 | We can see that the LLVM/clang test driver uses comments in the source code to 69 | drive its test expectations. We use the `CHECK-MESSAGES` directive to set 70 | expectations for diagnostics and `CHECK-FIXES` to set expectations for fixes. At 71 | the top, the `RUN` directive tells the driver how to execute the test. For our 72 | purposes, we'll replace the generated content with the following test cases: 73 | 74 | ```cpp 75 | // RUN: %check_clang_tidy %s misc-virtual-shadowing %t 76 | 77 | struct A { 78 | void f() {} 79 | }; 80 | 81 | struct B : public A { 82 | // CHECK-MESSAGES: :[[@LINE+1]]:3: warning: method hides non-virtual method from a base class [misc-virtual-shadowing] 83 | virtual void f() {} // problematic 84 | }; 85 | 86 | struct C { 87 | virtual void f() {} // OK(1) 88 | }; 89 | 90 | struct D : public C { 91 | virtual void f() {} // OK(2) 92 | }; 93 | ``` 94 | 95 | Here you can see code examples with inline test directives. Clang's 96 | test harness checks that the specified diagnostics are emitted at the 97 | specified line and column (here `[[@LINE+1]]:3`, i.e. column 3 of the line 98 | following the directive). Note also how we've used `OK` to denote 99 | negative examples, i.e.
cases where the check should not trigger. 100 | 101 | We can now write the body of our `check`: 102 | 103 | ```cpp 104 | void VirtualShadowingCheck::check( 105 | const ast_matchers::MatchFinder::MatchResult &Result) { 106 | // Returns a CXXMethodDecl* that stays valid as long as the translation 107 | // unit is loaded, which means we can safely store it. 108 | const auto* Method = Result.Nodes.getNodeAs<CXXMethodDecl>("method"); 109 | const auto* ClassDecl = Method->getParent(); 110 | 111 | if (ClassDecl->getNumBases() == 0) 112 | return; 113 | 114 | // forallBases() is true only if the callback returns true for *every* direct 115 | // and indirect base, so we ask "is this base harmless?" and negate the result. 116 | // We compare DeclarationNames because getName() asserts on destructors etc. 117 | bool AllBasesHarmless = ClassDecl->forallBases( 118 | [TargetName = Method->getDeclName()](const CXXRecordDecl* Base) { 119 | for (const auto* BaseMethod : Base->methods()) { 120 | if (BaseMethod->getDeclName() == TargetName && !BaseMethod->isVirtual()) 121 | return false; 122 | } 123 | return true; 124 | }); 125 | 126 | if (!AllBasesHarmless) 127 | diag(Method->getLocStart(), 128 | "method hides non-virtual method from a base class"); 129 | } 130 | ``` 131 | 132 | The method grabs the class declaration of each matched method and then 133 | traverses all direct and indirect base classes of that class, looking for one 134 | with a method that has the same name but is not declared virtual. Note that `forallBases()` also returns false when a base class is dependent or has no definition, so this simple version may warn spuriously inside templates. 135 | -------------------------------------------------------------------------------- /control-flow.md: -------------------------------------------------------------------------------- 1 | # Control Flow Analysis 2 | -------------------------------------------------------------------------------- /diagnostics.md: -------------------------------------------------------------------------------- 1 | # Diagnostics 2 | 3 | One of the core strengths of the clang compiler project, compared to 4 | GCC, is its user-friendly error messages. Clang tries [very 5 | hard](https://clang.llvm.org/diagnostics.html) to emit correct, precise and 6 | helpful warnings and errors in all kinds of situations.
One nice thing is that 7 | besides warnings and errors (collectively, *diagnostics*), clang also has the 8 | ability to provide correction hints with a very low false positive rate. Such 9 | corrections are termed *FixItHints*. 10 | 11 | Of course, since clang is built as a library, we also have access to these 12 | diagnostics and FixItHint facilities. There is little official documentation on 13 | this, but we can gain some interesting insight into how diagnostics work inside 14 | clang by looking at the source code directly. 15 | 16 | Most of the diagnostics code can be found in `clang/Basic/Diagnostic.h`, which 17 | (unfortunately) contains many classes for FixItHints and diagnostics reporting. 18 | 19 | ## Classes 20 | 21 | ### DiagnosticsEngine 22 | 23 | http://clang.llvm.org/doxygen/classclang_1_1DiagnosticsEngine.html 24 | 25 | The `DiagnosticsEngine` class is the main actor in clang's 26 | diagnostic-reporting architecture. It mainly handles "meta" aspects of the 27 | diagnostics mechanism, such as whether warnings should be reported as errors, 28 | the limit on the number of template instantiations shown in a diagnostic, whether 29 | errors should be printed as a tree, whether colors should be used, to what 30 | extent the compiler should do spell-checking and other options. In addition, the 31 | `DiagnosticsEngine` combines two further parts of the diagnostics: 32 | `DiagnosticConsumer`s and `DiagnosticBuilder`s. The former are arbitrary 33 | consumers of diagnostic output, i.e. subclasses of `DiagnosticConsumer`. The 34 | latter is a lightweight class to build error messages. This process can be 35 | initiated by using `DiagnosticsEngine::Report`, which allows you to report an 36 | error or warning and also attach a `FixItHint`. The `DiagnosticsEngine` also 37 | gives you access to the number of warnings and errors reported so far. 38 | 39 | An important `enum` defined inside the `DiagnosticsEngine` is the diagnostic 40 | *level* (`Level`).
Every diagnostic that is emitted is associated with a certain 41 | level, such as `Warning`, `Error` or `Fatal`. On the one hand, this level 42 | determines how the consumer will ultimately output or format the diagnostic. On 43 | the other hand, it also controls whether the diagnostic is emitted at 44 | all, since the `DiagnosticsEngine` can be configured to ignore all warnings, for 45 | example. 46 | 47 | ### Diagnostic 48 | 49 | The `Diagnostic` class encapsulates a pointer to a `DiagnosticsEngine` and 50 | delegates calls to it. It provides access to information about the 51 | currently "in-flight" error or warning (i.e. the one currently being built while 52 | the `DiagnosticBuilder` hasn't been destroyed yet). It is primarily used by 53 | `DiagnosticConsumer`s, whose `HandleDiagnostic()` method takes 54 | a `Diagnostic` object along with the level at which to emit that diagnostic. 55 | 56 | In general, a diagnostic consists of the following pieces of information: 57 | 58 | 1. A unique ID, so that you can query the `DiagnosticsEngine` for more information associated with that ID. For even more information about a diagnostic ID, you can use the `DiagnosticIDs` class. 59 | 2. The level at which the diagnostic will be emitted (`DiagnosticIDs::Level`). 60 | 3. A location in the source code at which the error occurred (`FullSourceLoc`). This is also where the caret will ultimately show up. 61 | 4. A message, simply as a `std::string`. 62 | 5. A vector of `CharSourceRange`s that the consumer can highlight. 63 | 6. A vector of `FixItHint`s, to recommend corrections. 64 | 65 | ### DiagnosticBuilder 66 | 67 | The `DiagnosticBuilder` is a very lightweight class that is part of the 68 | error-emitting protocol. Basically, when you want to emit an error, you will 69 | first request a unique diagnostic ID from the diagnostics engine.
You create 70 | this ID by calling `DiagnosticsEngine::getCustomDiagID` and passing a level 71 | and a message. Within the message you pass, you can write placeholders like `%0` 72 | or `%1` (like `{0}` or `{1}` in Python), which you can then fill in using the 73 | `DiagnosticBuilder`. Moreover, you can pass along FixIt hints and source ranges, to be 74 | highlighted by the final consumer (renderer). An example in code would look like 75 | this: 76 | 77 | ```cpp 78 | void EmitWarning(DiagnosticsEngine& DE, const NamedDecl& MyDecl) { 79 | // Level first, then the format string; %0 is filled in via the builder. 80 | auto ID = DE.getCustomDiagID(DiagnosticsEngine::Warning, "variable %0 is bad"); 81 | 82 | DE.Report(MyDecl.getLocStart(), ID).AddString(MyDecl.getNameAsString()); 83 | 84 | // or ... 85 | { 86 | DiagnosticBuilder Builder = DE.Report(MyDecl.getLocStart(), ID); 87 | Builder << MyDecl.getNameAsString(); 88 | } // the diagnostic is emitted here, when Builder is destroyed 89 | 90 | // or ... pass the declaration's identifier as a tagged argument 91 | { 92 | DiagnosticBuilder Builder = DE.Report(MyDecl.getLocStart(), ID); 93 | Builder.AddTaggedVal(reinterpret_cast<intptr_t>(MyDecl.getIdentifier()), 94 | DiagnosticsEngine::ak_identifierinfo); 95 | // Emit the diagnostic regardless of suppression settings 96 | // Builder.setForceEmit(); 97 | } 98 | } 99 | ``` 100 | 101 | Note that the builder will emit the diagnostic (internally, send it to the 102 | diagnostics engine) *in the destructor*. There also exists a set of 103 | predefined diagnostic kinds with associated IDs and (sometimes parameterized) 104 | messages. You can find these in the `*.td` files under `include/clang/Basic/`.
105 | Some examples include: 106 | 107 | * `def err_fe_pch_file_overridden : Error< 108 | "file '%0' from the precompiled header has been overridden">;` (`DiagnosticSerializationKinds.td`) 109 | * `def note_constexpr_no_return : Note< 110 | "control reached end of constexpr function">;` (`DiagnosticASTKinds.td`) 111 | * `def note_constexpr_lshift_of_negative : Note<"left shift of negative value %0">;` (`DiagnosticASTKinds.td`) 112 | * `def warn_nested_block_comment : Warning<"'/*' within block comment">, 113 | InGroup<Comment>;` 114 | 115 | With these, you can make really short diagnostic reports like so: 116 | 117 | ```cpp 118 | DE.Report(Location, diag::note_fixit_applied); 119 | ``` 120 | 121 | By the way, in the same folder you will also find `DiagnosticGroups.td`, which 122 | includes the definitions of all warning groups. There you will find out, for 123 | example, that `-Wall` does not actually enable all warnings, but only a 124 | relatively small subset. 125 | 126 | ### FixItHints 127 | 128 | A *FixItHint* is a suggestion to the user about a particular region of source 129 | code that could be inserted, deleted or modified to fix a compiler 130 | error. The false positive rate for such hints should be very small, i.e. the 131 | hint should almost always be correct for the given situation. We can create our 132 | own FixIt hints very easily and add them to a `DiagnosticBuilder` when we are 133 | emitting a diagnostic. 134 | 135 | Internally, a FixIt hint consists of: 136 | 137 | * `RemoveRange`, a `CharSourceRange` of source code that we want to recommend for deletion 138 | * `InsertFromRange`, a `CharSourceRange` of existing source code that should be inserted at the insertion location. This is useful for modification. 139 | * `CodeToInsert`, a (possibly empty) `std::string` of code to insert.
140 | * `BeforePreviousInsertions`, a boolean flag indicating whether this insertion should be placed before (true) or after (false) any previous insertions at the same location. 141 | 142 | To actually create a `FixItHint`, you use various factory functions. These are: 143 | 144 | * `CreateInsertion`: Given a `SourceLocation` where to insert, a `StringRef` of code and the (optional) `BeforePreviousInsertions` flag, create a FixIt hint for a pure insertion; 145 | * `CreateInsertionFromRange`: Given a `SourceLocation` where to insert, a `CharSourceRange` of existing source code to insert from and the (optional) `BeforePreviousInsertions` flag, suggest an insertion of that code range at the location; 146 | * `CreateRemoval`: Given only a `CharSourceRange` (or, in an overload, a `SourceRange`), suggest the removal of that range; 147 | * `CreateReplacement`: Given a `CharSourceRange` (or `SourceRange`) and a `StringRef` of code, suggest the replacement of the range with the string. 148 | -------------------------------------------------------------------------------- /libclang.md: -------------------------------------------------------------------------------- 1 | # LibClang 2 | 3 | ## Parsing C++ in Python with Clang 4 | 5 | http://eli.thegreenplace.net/2011/07/03/parsing-c-in-python-with-clang 6 | 7 | [LibClang](https://clang.llvm.org/doxygen/group__CINDEX.html) is a high-level C 8 | API to clang. It allows us to do many things we could also do with LibTooling, 9 | but provides a much lighter interface. Most importantly, because this interface 10 | is so high-level, it is kept (largely) stable across releases. This is especially 11 | useful for editors or other applications that want to use clang's powerful AST 12 | representation for syntax-highlighting, code-completion or refactoring, but 13 | don't want to be at the mercy of the clang upstream. 14 | 15 | More precisely, libClang is a shared library that packages clang into a 16 | high-level API for AST-traversal.
What is nice is that besides the core C API, 17 | there are also Python bindings (and [Go](https://github.com/go-clang/v3.9) bindings). 18 | 19 | To access libClang from Python, the `libClang` dynamic library (`.dylib` or 20 | `.so`) must be available (visible) from wherever you want to import the Python 21 | bindings. Once this requirement is satisfied, you can import `clang.cindex`: 22 | 23 | ```python 24 | import clang.cindex as clang 25 | ``` 26 | 27 | libClang's Python bindings work on translation units (TUs), which are 28 | parsed through an *index*. An index is a group of translation units that 29 | would typically be compiled and linked together, which enables cross-TU 30 | referencing. To create an index and parse a TU with it in Python, we use the 31 | following code: 32 | 33 | ```python 34 | index = clang.Index.create() 35 | tu = index.parse(sys.argv[1]) # sys.argv[1]: path of the file to parse (needs "import sys") 36 | ``` 37 | 38 | where `clang` is the `clang.cindex` binding module imported above. `Index.create()` calls the 39 | function `clang_createIndex()` in C. The method `index.parse()` then calls 40 | `clang_parseTranslationUnit`, which is the main entry point to process a TU with 41 | libClang. The result of `index.parse()` is a `CXTranslationUnit` in C and a 42 | `TranslationUnit` in Python. This `TranslationUnit` has many useful properties 43 | that we can query. 44 | 45 | The most important attribute of the `TranslationUnit` is its `cursor`. A cursor 46 | is a pointer/iterator that points to some node in an AST. It abstracts away the 47 | differences between entities in the AST (`Decl` vs `Stmt` vs `Expr`) and 48 | provides a *unified interface for navigating the AST*. 49 | 50 | The most interesting attributes of a cursor are: 51 | 52 | * `kind`: the kind of node the cursor is currently pointing at, 53 | * `spelling`: the name of the node (e.g.
the name of the class or variable), 54 | * `location`: the location of the node in the source code (line/column), 55 | * `get_children`: the children of the node, to allow for further traversal. 56 | 57 | For `get_children`, the Python and C APIs diverge slightly. The C API works with 58 | visitation functions: you define a function taking the current node and 59 | some context data (such as the parent of the node or some "client data" you 60 | supply yourself), and libclang then walks the tree *for you*, calling it on each node. In the Python API, we 61 | *traverse* the AST ourselves (in a very simple and intuitive way). For example, 62 | given the following C/C++ code: 63 | 64 | ```cpp 65 | int main() { int x = 42; } 66 | ``` 67 | 68 | We can use the following recursive function to traverse the AST and print the spelling and kind of every node we find: 69 | 70 | ```python 71 | def visit(cursor): 72 | print('{0} ({1})'.format(cursor.spelling, cursor.kind)) 73 | for child in cursor.get_children(): 74 | visit(child) 75 | ``` 76 | 77 | The Python bindings use [ctypes](https://docs.python.org/3.6/library/ctypes.html) to call the *libclang* shared library. 78 | 79 | ## Baby steps with libClang 80 | 81 | http://bastian.rieck.ru/blog/posts/2015/baby_steps_libclang_ast/ 82 | 83 | An *index* groups multiple translation units together. Interestingly, with libclang, we do not necessarily have to parse the code ourselves, inside our program.
Rather, we can export an AST file with clang like so: 84 | 85 | ```shell 86 | $ clang -emit-ast 87 | ``` 88 | 89 | This writes the serialized AST to a file with the same base name and an `.ast` extension, which we can then load into libclang with the following program: 90 | 91 | ```cpp 92 | #include <clang-c/Index.h> 93 | 94 | int main(int argc, char const* argv[]) { 95 | if (argc != 2) return 1; 96 | 97 | CXIndex index = clang_createIndex(0, 1); 98 | CXTranslationUnit tu = clang_createTranslationUnit(index, argv[1]); 99 | 100 | if (tu == nullptr) { clang_disposeIndex(index); return 1; } 101 | 102 | CXCursor root = clang_getTranslationUnitCursor(tu); // where a traversal would start 103 | 104 | clang_disposeTranslationUnit(tu); 105 | clang_disposeIndex(index); 106 | return 0; 107 | } 108 | ``` 109 | 110 | libclang makes heavy use of the *visitor pattern*. As such, when we want to 111 | traverse the AST (via cursors), we need to define a visitor function. It must 112 | have the following signature: 113 | 114 | ```cpp 115 | CXChildVisitResult visitor(CXCursor cursor, CXCursor parent, CXClientData clientData); 116 | ``` 117 | 118 | The relevant parts here are: 119 | 120 | * `CXChildVisitResult`: An enum whose value tells libclang how to proceed after visiting a node: `CXChildVisit_Break` stops the traversal, `CXChildVisit_Continue` continues with the *siblings* of the current node (without visiting its children) and `CXChildVisit_Recurse` recurses into the children of the current node; 121 | * The `cursor`, which points to the node currently being traversed; 122 | * The `parent`, which is the cursor whose children are being visited; 123 | * Some `clientData`, which is a `void*` to any data you want to pass along during the traversal. 124 | 125 | Once you have your visitation function defined, you can use 126 | `clang_visitChildren` to traverse the tree.
You pass it the root cursor from 127 | which you want to start recursing, the visitor function/callback as well as 128 | optionally any user data you want to take with you during the traversal. This allows us to write something like this: 129 | 130 | ```cpp 131 | #include <clang-c/Index.h> 132 | #include <iostream> 133 | #include <string> 134 | 135 | std::string getCursorKindName(CXCursorKind cursorKind) { 136 | CXString kindName = clang_getCursorKindSpelling(cursorKind); 137 | std::string result = clang_getCString(kindName); 138 | 139 | clang_disposeString(kindName); 140 | return result; 141 | } 142 | 143 | std::string getCursorSpelling(CXCursor cursor) { 144 | CXString cursorSpelling = clang_getCursorSpelling(cursor); 145 | std::string result = clang_getCString(cursorSpelling); 146 | 147 | clang_disposeString(cursorSpelling); 148 | return result; 149 | } 150 | 151 | CXChildVisitResult visit(CXCursor cursor, CXCursor, CXClientData data) { 152 | CXSourceLocation location = clang_getCursorLocation(cursor); 153 | if (clang_Location_isFromMainFile(location) == 0) { 154 | return CXChildVisit_Continue; 155 | } 156 | 157 | CXCursorKind cursorKind = clang_getCursorKind(cursor); 158 | 159 | const auto currentLevel = *(reinterpret_cast<unsigned*>(data)); 160 | 161 | std::cout << std::string(currentLevel, '-') << " " 162 | << getCursorKindName(cursorKind) 163 | << " (" << getCursorSpelling(cursor) << ")" 164 | << "\n"; 165 | 166 | auto nextLevel = currentLevel + 1; 167 | 168 | clang_visitChildren(cursor, visit, &nextLevel); 169 | 170 | return CXChildVisit_Continue; 171 | } 172 | 173 | int main(int argc, char const* argv[]) { 174 | if (argc != 2) { 175 | std::cout << "Wrong number of arguments" << std::endl; 176 | return 1; 177 | } 178 | 179 | CXIndex index = clang_createIndex(1, 1); 180 | CXTranslationUnit tu = nullptr; 181 | 182 | CXErrorCode rc = clang_createTranslationUnit2(index, argv[1], &tu); 183 | 184 | if (rc != CXError_Success || !tu) { 185 | std::cout << "Failed to load the translation unit" << std::endl; 186 | return 1; 187 | } 188 | 189 | CXCursor root =
clang_getTranslationUnitCursor(tu); 190 | 191 | unsigned level = 0; 192 | clang_visitChildren(root, visit, &level); 193 | 194 | std::cout << std::flush; 195 | 196 | clang_disposeTranslationUnit(tu); 197 | clang_disposeIndex(index); 198 | return 0; 199 | } 200 | ``` 201 | 202 | The program expects the path of an AST or PCH (pre-compiled header) file as its only argument. The former can be created using `clang++ -emit-ast -- `. 203 | 204 | https://clang.llvm.org/doxygen/group__CINDEX.html#ga51eb9b38c18743bf2d824c6230e61f93 205 | 206 | ## Exploring the Source 207 | 208 | A general overview of libclang can be found here: https://clang.llvm.org/doxygen/group__CINDEX.html 209 | 210 | The source code (or at least the documentation) is organized into the following 211 | sections: 212 | 213 | - *Compilation database functions* allow loading and querying 214 | compilation databases. For example, you can get the compile arguments for a 215 | particular file. A key type here is `CXCompileCommands`. 216 | 217 | - *String manipulation routines* allow reading and disposing of `CXString`s, which 218 | are libClang's representation of strings (you usually want to convert them to C 219 | strings with `clang_getCString()`). 220 | 221 | - *File access functions* contain functions for querying the filename, 222 | modification time and unique ID of files. An abstraction here is the `CXFile`, 223 | which is probably the C version of `FileEntry`. For example, you can get the 224 | `CXFile` for a given file name within a translation unit. 225 | 226 | - Routines for managing *source locations* contain the usual functions to 227 | manipulate `SourceLocation`s and `SourceRange`s, here prefixed with `CX`. There 228 | are functions to compare source locations for equality, check if a location is in 229 | a system or user-defined file (header) and handle the spelling/expansion 230 | locations of macros.
231 | 232 | - The *diagnostics infrastructure*, e.g. `clang_getDiagnostic`, which takes a 233 | translation unit and the index of a diagnostic and returns a `CXDiagnostic` 234 | data structure. This is the interface to the diagnostics functionality of clang, 235 | which is very powerful. 236 | 237 | - Functions to *parse translation units*, which we usually use at the very 238 | beginning of our libClang programs to get an abstract syntax tree representation 239 | of some C/C++/Objective-C source file. 240 | 241 | - *Cursor manipulation and access functions*, including essential functions 242 | like `clang_getCursorKind` or `clang_isTranslationUnit`. 243 | 244 | - Functions to *map between source locations and cursors*, like 245 | `clang_getCursor`, which maps a translation unit and a source location to the most 246 | specific cursor clang can find for that location. There is also 247 | `clang_getCursorExtent`, which returns a `CXSourceRange` describing the source 248 | range covered by the entity the cursor points to. 249 | 250 | - Functions to *gain information about the type* (i.e. language type, like 251 | `int`) of a node (cursor). The most important function is probably 252 | `clang_getCursorType`, which gets the `CXType` for a cursor. 253 | 254 | - *Cross-referencing routines* to get the names or string representations of 255 | cursors, canonical cursors (the one declaration chosen to represent an entity 256 | that is declared several times) or the definition for a cursor pointing to a declaration 257 | (given that the definition is in the same translation unit). 258 | 259 | - *Funky functions* to get information about the mangled names of functions or 260 | C++ constructors/destructors. 261 | 262 | - *C++-specific functions* to get information about templates, constructors, 263 | destructors, virtual functions (for example whether a method is pure virtual) and 264 | methods in general, e.g. whether they are static or const. They also provide information 265 | about whether fields of a struct or class are `mutable`.
266 | 267 | - The interface to the *lexer and preprocessor*, which gives access to raw 268 | tokenization (quite cool) and to information about the extent 269 | (`CXSourceRange`), spelling and of course, most importantly, the kind (!) of a 270 | token (e.g. punctuation, keyword, literal, identifier or comment). 271 | 272 | - *Debugging functions* for doing miscellaneous things like enabling stack traces or getting the spelling (string representation) of a cursor kind. 273 | 274 | - *Code completion* functionality for sophisticated code completion support. 275 | 276 | - More *miscellaneous functions*, for example for querying the version of 277 | clang. There is also functionality to inspect the includes of a file via visitation 278 | (like in `libTooling`), i.e. you can inspect every `#include`, for example to 279 | verify their order. 280 | 281 | Note that libclang has good support for C++ (like templates): 282 | 283 | https://clang.llvm.org/doxygen/group__CINDEX__CPP.html#gafe1f32ddd935c20f0f455d47c05ec5ab 284 | 285 | It is also well suited for code completion libraries, as it has built-in support for 286 | it (very high-level, literally `clang_codeCompleteAt`!): 287 | 288 | https://clang.llvm.org/doxygen/group__CINDEX__CODE__COMPLET.html 289 | -------------------------------------------------------------------------------- /libtooling.md: -------------------------------------------------------------------------------- 1 | # LibTooling 2 | 3 | https://kevinaboos.wordpress.com/2013/07/23/clang-tutorial-part-ii-libtooling-example/ 4 | http://clang.llvm.org/docs/RAVFrontendAction.html 5 | 6 | *LibTooling* is the most powerful way to create your own tools with clang. It 7 | is a library and associated infrastructure that lets you write standalone 8 | executables with full control over all stages of file and AST processing.
9 | 10 | We start out with the main file of a hypothetical libTooling executable: 11 | 12 | ```cpp 13 | #include "clang/Tooling/CommonOptionsParser.h" 14 | #include "clang/Tooling/Tooling.h" 15 | 16 | #include "llvm/Support/CommandLine.h" 17 | 18 | // Set up the command line interface 19 | namespace { 20 | llvm::cl::OptionCategory ToolCategory("MyTool"); 21 | 22 | llvm::cl::extrahelp 23 | CommonHelp(clang::tooling::CommonOptionsParser::HelpMessage); 24 | 25 | llvm::cl::extrahelp MoreHelp("My funny tool"); 26 | } // namespace 27 | 28 | auto main(int argc, const char* argv[]) -> int { 29 | using namespace clang::tooling; 30 | 31 | CommonOptionsParser OptionsParser(argc, argv, ToolCategory); 32 | ClangTool Tool(OptionsParser.getCompilations(), // compilation database 33 | OptionsParser.getSourcePathList()); 34 | 35 | auto action = newFrontendActionFactory<MyTool::Action>(); 36 | return Tool.run(action.get()); 37 | } 38 | ``` 39 | 40 | As you can see, we first need to set up the basic command line interface. Inside 41 | `main()` we can then use LLVM and clang's command line facilities to find out 42 | about all source files to compile. These are used to construct a `ClangTool`, 43 | which is the driver for our own tool (here called `MyTool`). We use the 44 | `newFrontendActionFactory` function template to instantiate our `MyTool::Action` 45 | frontend action, which we will look into next. 46 | 47 | The basic infrastructure (boilerplate) needed to make a tool consists of the following 48 | components: 49 | 50 | 1. A `FrontendAction`, which encapsulates the work we want the frontend to perform. 51 | 2. An `ASTConsumer`, which lets us perform actions on the AST. 52 | 3. A `RecursiveASTVisitor` or an `ast_matchers::MatchFinder::MatchCallback` to do our node matching and eventual transformations. 53 | 54 | These components will be discussed further below. 55 | 56 | ## `FrontendAction` 57 | 58 | In clang's infrastructure, a `FrontendAction` is any operation that can be 59 | performed by the frontend.
Such a class will usually derive from 60 | `ASTFrontendAction` (since `FrontendAction` is an abstract base class) and can 61 | then override a number of virtual functions that provide access to the 62 | compilation process at various stages. For example, we can override the 63 | `BeginSourceFileAction` method, which is invoked for each translation unit 64 | supplied to the tool before the AST consumer for that file is created, which makes it 65 | a good place for per-file setup. Besides this method, others include 66 | `EndSourceFileAction`, which is invoked after the processing of a TU and gives 67 | us the ability to output our transformed code, for instance. Also, inside 68 | `BeginInvocation`, we can modify the `CompilerInstance` even before the 69 | processing stage. The compiler instance class holds contextual information about 70 | the compilation process and environment, such as language options (C++ standard) 71 | or configuration of the diagnostics engine. Finally, by overriding 72 | `usesPreprocessorOnly()`, we can specify whether the action is to use the preprocessor only, in which case no AST consumer will be created. 73 | 74 | Our main focus right now is the consumption and processing of the AST, so we 75 | will be looking at the `CreateASTConsumer` method, which is supposed to return an 76 | `ASTConsumer` for the next stage (see below).
This looks like so: 77 | 78 | ```cpp 79 | class Action : public clang::ASTFrontendAction { 80 | public: 81 | using ASTConsumerPointer = std::unique_ptr<clang::ASTConsumer>; 82 | 83 | ASTConsumerPointer CreateASTConsumer(clang::CompilerInstance& Compiler, 84 | llvm::StringRef) override; 85 | 86 | bool BeginSourceFileAction(clang::CompilerInstance& Compiler, 87 | llvm::StringRef Filename) override; 88 | 89 | void EndSourceFileAction() override; 90 | }; 91 | ``` 92 | 93 | For reference, we'll also log something before and after processing a file: 94 | 95 | ```cpp 96 | Action::ASTConsumerPointer 97 | Action::CreateASTConsumer(clang::CompilerInstance& Compiler, llvm::StringRef) { 98 | return std::make_unique<Consumer>(); 99 | } 100 | 101 | bool Action::BeginSourceFileAction(clang::CompilerInstance& Compiler, 102 | llvm::StringRef Filename) { 103 | ASTFrontendAction::BeginSourceFileAction(Compiler, Filename); 104 | llvm::outs() << "Processing file '" << Filename << "' ...\n"; 105 | 106 | return true; 107 | } 108 | 109 | void Action::EndSourceFileAction() { 110 | ASTFrontendAction::EndSourceFileAction(); 111 | llvm::outs() << "Finished processing file ...\n"; 112 | } 113 | ``` 114 | 115 | ## `ASTConsumer` 116 | 117 | The next step in the pipeline is the consumer we created inside the 118 | `CreateASTConsumer` method. The consumer itself does not really have that much 119 | functionality. Its main job is to dispatch the AST visitor, which will either be 120 | a `RecursiveASTVisitor` or a match callback when using the `ASTMatcher` library. 121 | The `ASTConsumer` again provides a number of hooks that we can override to 122 | process the AST at various stages or for various kinds of nodes. For example, 123 | the `HandleTranslationUnit` method is the main entry point to a translation 124 | unit: it is inside this function that we will run our visitor.
125 | Additional overridable methods include `HandleVTable`, which is called with a 126 | `CXXRecordDecl*` and lets us know that a vtable is required for the given class 127 | (this is more an implementation detail of the clang frontend). Similarly, 128 | `HandleInlineFunctionDefinition()` is invoked with a `FunctionDecl*` every 129 | time the definition of an inline function has been completed. It is used by the code 130 | generator (`CodeGen/CodeGenAction`), for example. 131 | 132 | There are some other classes inside Clang that inherit from and use the 133 | ASTConsumer API, such as `BackendConsumer`, `CodeGenerator` and 134 | `SemaConsumer`. The first is quite interesting, as it is used by the 135 | `CodeGenAction` to walk the AST and emit some backend representation of the 136 | code, which can be either LLVM bitcode, the LLVM IR in a human-readable version, 137 | assembly or object files. The `BackendConsumer` used by the `CodeGenAction` then 138 | further delegates to a `CodeGenerator` consumer, which performs the actual 139 | generation. Note that there is one subclass of `CodeGenAction` per backend 140 | representation, e.g. `EmitLLVMAction` is a `CodeGenAction` using a 141 | `BackendConsumer` that delegates to a `CodeGenerator`. 142 | 143 | Note that `CreateASTConsumer` inside the frontend action is called as part of 144 | `BeginSourceFile` (not `BeginSourceFileAction`), which is a non-virtual (thus 145 | non-overridable) function. This means a consumer will persist only for the 146 | lifetime of a *translation unit*. Any state that you want to keep for longer 147 | than that will have to live outside the consumer, e.g. in a static member or in an 148 | object owned by `main()` and passed down by reference. Keep in mind that `newFrontendActionFactory<T>()` creates a fresh action for every file, so in that setup the frontend action does not outlive a translation unit either.
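To make the lifetime rule above concrete, here is a toy model in plain C++ that needs no clang headers. `Stats`, `ToyConsumer`, `ToyAction` and `runToyTool` are hypothetical stand-ins, not clang's API: they only mimic the shape of the pipeline, with a fresh action and consumer per translation unit and cross-TU state owned by the caller and handed down by reference. In a real tool, one way to do the handing down would be a custom `FrontendActionFactory` subclass whose `create()` passes the reference to each new action; that wiring is an assumption and is not shown.

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins (NOT clang's classes) for the per-file pipeline.
struct Stats {  // state that must survive across translation units
  std::size_t Files = 0;
  std::size_t Decls = 0;
};

// Plays the role of an ASTConsumer: a new one is created for every TU.
class ToyConsumer {
public:
  explicit ToyConsumer(Stats& S) : Shared(S) {}

  void handleTranslationUnit(const std::vector<std::string>& DeclNames) {
    LocalDecls = DeclNames.size();  // per-TU state, lost with the consumer
    Shared.Decls += LocalDecls;     // cross-TU state, written via the reference
  }

private:
  Stats& Shared;
  std::size_t LocalDecls = 0;
};

// Plays the role of a FrontendAction: also created afresh for every TU.
class ToyAction {
public:
  explicit ToyAction(Stats& S) : Shared(S) {}

  std::unique_ptr<ToyConsumer> createConsumer() {
    ++Shared.Files;
    return std::make_unique<ToyConsumer>(Shared);
  }

private:
  Stats& Shared;
};

// Plays the role of the tool driver plus a factory that hands every new
// action a reference to the same Stats object. Each inner vector stands in
// for the declaration names found in one translation unit.
Stats runToyTool(const std::vector<std::vector<std::string>>& TranslationUnits) {
  Stats Shared;
  for (const auto& TU : TranslationUnits) {
    ToyAction Action(Shared);                 // fresh action per file
    auto Consumer = Action.createConsumer();  // fresh consumer per file
    Consumer->handleTranslationUnit(TU);
  }  // Action and Consumer are destroyed here; Shared lives on
  return Shared;
}
```

For example, `runToyTool({{"main", "helper"}, {"Widget", "draw", "g"}})` reports 2 files and 5 declarations, even though every consumer that saw those declarations has already been destroyed.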
149 | 150 | `consumer.hpp` 151 | ```cpp 152 | class Consumer : public clang::ASTConsumer { 153 | public: 154 | void HandleTranslationUnit(clang::ASTContext &Context) override; 155 | 156 | private: 157 | Visitor TheVisitor; // the RecursiveASTVisitor subclass from visitor.hpp 158 | }; 159 | ``` 160 | 161 | `consumer.cpp` 162 | ```cpp 163 | void Consumer::HandleTranslationUnit(clang::ASTContext &Context) { 164 | TheVisitor.TraverseDecl(Context.getTranslationUnitDecl()); 165 | } 166 | ``` 167 | 168 | ## Visitation 169 | 170 | The visitation of the AST nodes is the final and most interesting part of the 171 | process. This is where we can do our actual analysis on particular declarations, 172 | types, statements and so on. The two main ways of doing this right now are to 173 | write a `RecursiveASTVisitor` subclass or to use the `ASTMatcher` interface. 174 | 175 | ### `RecursiveASTVisitor` 176 | 177 | We begin by using the `RecursiveASTVisitor` approach to traverse the AST and visit 178 | particular nodes of interest. The `RecursiveASTVisitor` is truly powerful and 179 | allows us to visit all basic kinds of AST nodes (leaving us to do further 180 | filtering and processing of nodes) in pre- or post-order. This is again done by 181 | overriding methods that are invoked with various kinds of nodes. Before looking 182 | more closely at how to do this, let us discuss what a `RecursiveASTVisitor` does 183 | in general and how it traverses the AST in practice. It does so in three 184 | distinct steps using three distinct classes of methods: 185 | 186 | 1. Assume we are at some node in the AST and have a pointer to it whose type is the most-derived 187 | class `Type` of the node, e.g. `ParmVarDecl`. Then clang will call 188 | `Traverse<Type>` (e.g. `TraverseParmVarDecl`), which: 189 | 1. If traversing in pre-order, calls `WalkUpFrom<Type>` and *then* recursively visits all children of the node, 190 | 2. Otherwise, *first* recursively visits the children of the node and *then* calls `WalkUpFrom<Type>`. 191 | 192 | 2. The responsibility of `WalkUpFrom<Type>` is then to: 193 | 1.
First call `WalkUpFrom<BaseType>`, where `BaseType` is the direct superclass of `Type` (which is known statically), e.g. `WalkUpFromVarDecl` for `ParmVarDecl`. This way we first walk (recurse) up the class hierarchy, 194 | 2. Then "unwind the stack" and call `Visit<Type>` on the way down, going from the most-base type (`VisitDecl`) back down to the original most-derived type (`VisitParmVarDecl`). 195 | 196 | 3. `Visit<Type>` is the user-overridable method where we can do the actual heavy lifting. 197 | 198 | The consequence of each `WalkUpFrom<Type>` method first calling 199 | `WalkUpFrom<BaseType>` is that the `Visit<Type>` methods are called from the 200 | least-derived (base) class to the most-derived class, e.g. 201 | `Decl -> NamedDecl -> ValueDecl -> DeclaratorDecl -> VarDecl -> ParmVarDecl`. 202 | 203 | Note that most of these `Traverse*`, `WalkUpFrom*` and `Visit*` methods are 204 | generated with preprocessor macros from the node lists declared in TableGen (`.td`) files and 205 | are thus not visible in the online documentation. That is, online 206 | you'll only see `VisitDecl`, but there is indeed also `VisitParmVarDecl`. 207 | 208 | All `Visit*` methods are expected to return a boolean, which should be `true` if 209 | the traversal is to continue or `false` if the whole traversal is to be aborted. 210 | To configure whether the visitor does post-order or 211 | pre-order traversal, override the `shouldTraversePostOrder()` 212 | function. By default, it returns `false` (i.e. the traversal is pre-order).
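The `WalkUpFrom`/`Visit` chaining described above can be imitated in a few lines of plain C++. The sketch below is a toy model, not clang code: the `toy` namespace, `VisitorBase` and `OrderRecorder` are hypothetical, and only the shape of the CRTP calls (each `WalkUpFromX` calling its base class's `WalkUpFrom` first, then `VisitX`) follows the description in this section. It records which `Visit` methods run for a `ParmVarDecl`.

```cpp
#include <string>
#include <vector>

namespace toy {

// Hypothetical miniature of clang's Decl hierarchy above ParmVarDecl.
struct Decl {};
struct NamedDecl : Decl {};
struct ValueDecl : NamedDecl {};
struct DeclaratorDecl : ValueDecl {};
struct VarDecl : DeclaratorDecl {};
struct ParmVarDecl : VarDecl {};

// Toy CRTP base that chains WalkUpFrom*/Visit* the way RecursiveASTVisitor does.
template <typename Derived>
class VisitorBase {
public:
  Derived& getDerived() { return *static_cast<Derived*>(this); }

  // Each WalkUpFromX first walks up to its base class, then calls VisitX.
  // A false result anywhere stops the rest of the chain.
  bool WalkUpFromDecl(Decl* D) { return getDerived().VisitDecl(D); }
  bool WalkUpFromNamedDecl(NamedDecl* D) {
    return getDerived().WalkUpFromDecl(D) && getDerived().VisitNamedDecl(D);
  }
  bool WalkUpFromValueDecl(ValueDecl* D) {
    return getDerived().WalkUpFromNamedDecl(D) && getDerived().VisitValueDecl(D);
  }
  bool WalkUpFromDeclaratorDecl(DeclaratorDecl* D) {
    return getDerived().WalkUpFromValueDecl(D) &&
           getDerived().VisitDeclaratorDecl(D);
  }
  bool WalkUpFromVarDecl(VarDecl* D) {
    return getDerived().WalkUpFromDeclaratorDecl(D) && getDerived().VisitVarDecl(D);
  }
  bool WalkUpFromParmVarDecl(ParmVarDecl* D) {
    return getDerived().WalkUpFromVarDecl(D) && getDerived().VisitParmVarDecl(D);
  }

  // Defaults: do nothing and keep going. Subclasses "override" them by hiding.
  bool VisitDecl(Decl*) { return true; }
  bool VisitNamedDecl(NamedDecl*) { return true; }
  bool VisitValueDecl(ValueDecl*) { return true; }
  bool VisitDeclaratorDecl(DeclaratorDecl*) { return true; }
  bool VisitVarDecl(VarDecl*) { return true; }
  bool VisitParmVarDecl(ParmVarDecl*) { return true; }
};

// Records which Visit* methods run, in order; it only provides four of them.
class OrderRecorder : public VisitorBase<OrderRecorder> {
public:
  bool StopAtVarDecl = false;  // if true, VisitVarDecl returns false
  std::vector<std::string> Calls;

  bool VisitDecl(Decl*) { Calls.push_back("Decl"); return true; }
  bool VisitNamedDecl(NamedDecl*) { Calls.push_back("NamedDecl"); return true; }
  bool VisitVarDecl(VarDecl*) { Calls.push_back("VarDecl"); return !StopAtVarDecl; }
  bool VisitParmVarDecl(ParmVarDecl*) { Calls.push_back("ParmVarDecl"); return true; }
};

}  // namespace toy
```

Because `OrderRecorder` only provides four `Visit` methods, the no-op defaults for `ValueDecl` and `DeclaratorDecl` still run but leave no trace, so the recorded order is `Decl, NamedDecl, VarDecl, ParmVarDecl`. Setting `StopAtVarDecl` shows that returning `false` ends the chain before `VisitParmVarDecl` is reached. The methods are non-virtual on purpose: with CRTP, `getDerived()` resolves the calls at compile time, which is also why the `Visit` methods of a real `RecursiveASTVisitor` subclass need not be declared `virtual`.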
213 | 214 | `visitor.hpp` 215 | ```cpp 216 | class Visitor : public clang::RecursiveASTVisitor<Visitor> { 217 | public: 218 | bool VisitDecl(clang::Decl* Declaration); // non-virtual: dispatch is via CRTP 219 | bool VisitStmt(clang::Stmt* Statement); 220 | }; 221 | ``` 222 | 223 | `visitor.cpp` 224 | ```cpp 225 | 226 | bool Visitor::VisitDecl(clang::Decl* Declaration) { 227 | Declaration->dump(llvm::outs()); return true; // true = continue the traversal 228 | } 229 | 230 | bool Visitor::VisitStmt(clang::Stmt* Statement) { 231 | Statement->dump(llvm::outs()); return true; // true = continue the traversal 232 | } 233 | ``` 234 | 235 | ### `ASTMatcher` 236 | 237 | The second option we have for node visitation is to use the `ASTMatcher` library. 238 | This is actually a much more powerful method when looking for very specific 239 | kinds of nodes. An example is given below. We first have to change what we do 240 | inside the consumer: 241 | 242 | `consumer.hpp` 243 | ```cpp 244 | class Consumer : public clang::ASTConsumer { 245 | public: 246 | void HandleTranslationUnit(clang::ASTContext &Context) override; 247 | }; 248 | ``` 249 | 250 | `consumer.cpp` 251 | ```cpp 252 | void Consumer::HandleTranslationUnit(clang::ASTContext &Context) { 253 | using namespace clang::ast_matchers; 254 | MatchFinder MatchFinder; 255 | Handler Handler; 256 | MatchFinder.addMatcher(functionDecl().bind("root"), &Handler); // bind so run() can retrieve the node 257 | MatchFinder.matchAST(Context); 258 | } 259 | ``` 260 | 261 | and then write our simple visitor class (now as a match callback): 262 | 263 | `visitor.hpp` 264 | ```cpp 265 | class Handler : public clang::ast_matchers::MatchFinder::MatchCallback { 266 | public: 267 | using MatchResult = clang::ast_matchers::MatchFinder::MatchResult; 268 | void run(const MatchResult& Result) override; 269 | }; 270 | ``` 271 | 272 | `visitor.cpp` 273 | ```cpp 274 | void Handler::run(const MatchResult& Result) { 275 | const auto* Function = Result.Nodes.getNodeAs<clang::FunctionDecl>("root"); 276 | if (Function) Function->dump(llvm::outs()); 277 | } 278 | ``` 279 | -------------------------------------------------------------------------------- /refactoring.md:
-------------------------------------------------------------------------------- 1 | # Refactoring 2 | -------------------------------------------------------------------------------- /source-code.md: -------------------------------------------------------------------------------- 1 | # Source Code 2 | 3 | ## AST/ASTContext.h 4 | 5 | The `ASTContext` provides information about an AST as well as additional utilities 6 | that are required any time one is dealing with the AST. It is basically a 7 | "namespace of functions", "instantiated" for the current AST. As such, it 8 | provides functions to create and modify types (e.g. to add qualifiers to a type). It also 9 | gives access to the top-level translation unit, the source manager and the 10 | identifier table. 11 | 12 | ## Basic/SourceLocation 13 | 14 | This file declares and defines various classes used to represent locations in 15 | source code. Here, a *location* is really any $(\mathtt{row, column})$ pair in 16 | code. The main `SourceLocation` class is a lightweight wrapper over an encoded 17 | `unsigned int` that refers to such a location. The `SourceManager` can later 18 | resolve the location to the actual row and column. The MSB (bit 31) of the 19 | underlying `SourceLocation` ID stores whether the source location is inside 20 | a macro *expansion* or in actual (not macro-expanded) code. The ID must be 21 | greater than zero to be valid. 22 | 23 | The file also holds the `SourceRange` class, which simply contains two 24 | `SourceLocation`s. Furthermore, the `CharSourceRange` class can represent either 25 | a *character range*, delimited by characters, or a *token range*, whose end points 26 | to the first character of the last *token* in the range. For this, the class also stores a 27 | boolean to indicate if it is a *token range* (i.e. the range points to tokens, 28 | not characters).
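To make the encoding and the two range flavours concrete, here is a toy model in plain C++. It is not clang's implementation: the `toy` namespace, `CharRange`, `measureTokenLength`, `toCharRange` and `getText` are hypothetical helpers, the "lexer" only knows identifiers and single-character tokens, and positions index one in-memory buffer rather than clang's global location space. The bit layout of the toy `SourceLocation` follows the description above (ID 0 is invalid, bit 31 marks a macro location), and the toy treats a character range as half-open (`End` is one past the last character).

```cpp
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string>

namespace toy {

// Toy model of the SourceLocation encoding described above (not clang's class).
class SourceLocation {
public:
  static constexpr std::uint32_t MacroIDBit = 1u << 31;

  SourceLocation() = default;  // ID 0: an invalid location
  static SourceLocation getFromRawEncoding(std::uint32_t ID) {
    SourceLocation L;
    L.ID = ID;
    return L;
  }

  bool isValid() const { return ID != 0; }
  bool isMacroID() const { return (ID & MacroIDBit) != 0; }
  bool isFileID() const { return !isMacroID(); }
  std::uint32_t getOffset() const { return ID & ~MacroIDBit; }

private:
  std::uint32_t ID = 0;
};

// Toy CharSourceRange over positions in one buffer. For a token range, End is
// the *start* of the last token; for a character range, End is one past the end.
struct CharRange {
  std::size_t Begin = 0;
  std::size_t End = 0;
  bool IsTokenRange = false;
};

// Toy "lexer": a token is a run of identifier characters, or one other char.
std::size_t measureTokenLength(const std::string& Buffer, std::size_t Pos) {
  auto IsIdent = [](char C) {
    return std::isalnum(static_cast<unsigned char>(C)) != 0 || C == '_';
  };
  if (Pos >= Buffer.size()) return 0;
  if (!IsIdent(Buffer[Pos])) return 1;
  std::size_t Len = 0;
  while (Pos + Len < Buffer.size() && IsIdent(Buffer[Pos + Len])) ++Len;
  return Len;
}

// A token range needs one extra step: measuring its last token.
CharRange toCharRange(const std::string& Buffer, CharRange R) {
  if (!R.IsTokenRange) return R;
  return CharRange{R.Begin, R.End + measureTokenLength(Buffer, R.End), false};
}

std::string getText(const std::string& Buffer, CharRange R) {
  CharRange C = toCharRange(Buffer, R);
  return Buffer.substr(C.Begin, C.End - C.Begin);
}

}  // namespace toy
```

For the buffer `"int answer = fortyTwo;"`, the token range `{4, 13, true}` stores only where `fortyTwo` starts, so `toCharRange` has to measure that token before `getText` can return `"answer = fortyTwo"`; the character range `{4, 10, false}` already knows its end and yields `"answer"`. In clang, that measuring step is the job of the lexer (e.g. `Lexer::MeasureTokenLength`), which is why a token range cannot be resolved from locations alone.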
29 | 30 | Next to `SourceLocation`, there exists `FullSourceLoc`, which aggregates a 31 | `SourceLocation` with a reference to a `SourceManager`, allowing access to the 32 | spelling/expansion line and column numbers, for example. 33 | 34 | ## Basic/SourceManager 35 | 36 | The `SourceManager` is a huge class (with associated source files) that handles 37 | loading, storing and providing access to all files in a translation unit. It is 38 | also the main interface for getting more information about a `SourceLocation`. 39 | For example, `getSpellingLoc(SourceLocation)` will get you the location where a token 40 | was actually written, which differs from the expansion location in the case of a macro expansion. 41 | Also, `getFileID(SourceLocation)` gets you the `FileID` for a token (a `FileID` 42 | can refer to either an `#include`d file or a macro expansion). The `SourceManager` also gives you `SourceLocation`s for the beginning and end of a `FileID`. 43 | 44 | The method `getSpellingLoc` looks like this: 45 | 46 | ```cpp 47 | SourceLocation getSpellingLoc(SourceLocation Loc) const { 48 | // Handle the non-mapped case inline, defer to out of line code to handle 49 | // expansions. 50 | if (Loc.isFileID()) return Loc; 51 | return getSpellingLocSlowCase(Loc); 52 | } 53 | ``` 54 | 55 | Basically, it checks whether the source location is already in source code rather 56 | than in a macro expansion. If so, it just returns the location. Otherwise, it calls the 57 | following function: 58 | 59 | ```cpp 60 | SourceLocation SourceManager::getSpellingLocSlowCase(SourceLocation Loc) const { 61 | do { 62 | std::pair<FileID, unsigned> LocInfo = getDecomposedLoc(Loc); 63 | Loc = getSLocEntry(LocInfo.first).getExpansion().getSpellingLoc(); 64 | Loc = Loc.getLocWithOffset(LocInfo.second); 65 | } while (!Loc.isFileID()); 66 | return Loc; 67 | } 68 | ``` 69 | 70 | This will first decompose the source location into its `FileID` and offset into 71 | that file. It will then look up the `SLocEntry` for that `FileID`.
An `SLocEntry` is effectively a 72 | union of a `FileInfo`, holding information about an `#include`d file, and an 73 | `ExpansionInfo`, which encodes the start and end of a macro expansion (from the name 74 | of the macro invocation up to the closing `)`, or just the macro name if 75 | it is not a function-like macro) as well as the spelling location of the macro. We thus 76 | grab the spelling location of the macro and move to the offset of the original 77 | `Loc` within that macro (this relies on the expansion and its spelling being the same stream of tokens; 78 | whether macro arguments are expanded in this representation is left open here). If this is now real code, we can stop. If it 79 | is yet another macro expansion, we have to do this again. 80 | 81 | ## ASTMatchers/ASTMatchersMacros 82 | 83 | This file holds the infrastructure for the matcher macros. Basically, it defines 84 | a set of macros that allow you to easily create new matchers for the matching DSL. One of the most important macros is `AST_MATCHER`: 85 | 86 | ```cpp 87 | #define AST_MATCHER(Type, DefineMatcher) \ 88 | namespace internal { \ 89 | class matcher_##DefineMatcher##Matcher \ 90 | : public ::clang::ast_matchers::internal::MatcherInterface<Type> { \ 91 | public: \ 92 | explicit matcher_##DefineMatcher##Matcher() = default; \ 93 | bool matches(const Type &Node, \ 94 | ::clang::ast_matchers::internal::ASTMatchFinder* Finder, \ 95 | ::clang::ast_matchers::internal::BoundNodesTreeBuilder \ 96 | * Builder) const override; \ 97 | }; \ 98 | } \ 99 | inline ::clang::ast_matchers::internal::Matcher<Type> DefineMatcher() { \ 100 | return ::clang::ast_matchers::internal::makeMatcher( \ 101 | new internal::matcher_##DefineMatcher##Matcher()); \ 102 | } \ 103 | inline bool internal::matcher_##DefineMatcher##Matcher::matches( \ 104 | const Type &Node, \ 105 | ::clang::ast_matchers::internal::ASTMatchFinder* Finder, \ 106 | ::clang::ast_matchers::internal::BoundNodesTreeBuilder* Builder) const 107 | ``` 108 | 109 | This macro allows for defining a new matcher with the name `DefineMatcher` that 110 |
returns a `Matcher<Type>`. We can see this macro has three main parts: 111 | 112 | 1. A class called `matcher_<DefineMatcher>Matcher` that implements the matcher interface. 113 | We can also see that each matcher takes a node of the type we want to match 114 | (e.g. `Decl`), an `ASTMatchFinder`, which is the interface to the entire matching 115 | process (like a "matching context"), and lastly a `BoundNodesTreeBuilder`, which 116 | is necessary to bind names to nodes in the AST (this happens in the `MatcherInterface` inside `ASTMatchersInternal.h`); 117 | 2. The definition of the function that you end up calling (`DefineMatcher`). It calls `makeMatcher` in the internal namespace, which just does type deduction and returns a `Matcher<Type>`; 118 | 3. Lastly, the beginning of the definition of the `matches` method declared in the class; the braced body we write after the macro invocation completes it. 119 | 120 | As such, a simple matcher would look something like this: 121 | 122 | ```cpp 123 | AST_MATCHER(QualType, isInteger) { 124 | return Node->isIntegerType(); 125 | } 126 | ``` 127 | 128 | This matcher matches the `QualType` nodes whose type is an integer type (as decided by `Type::isIntegerType()`). The macro defines a function `isInteger()` 129 | that returns a `Matcher<QualType>`, and our braced body becomes the `matches` method, which returns a `bool`. The matcher itself takes no arguments, as we 130 | can see. For more complex queries, we'll want `AST_MATCHER_P`, which allows us 131 | to take one parameter (there are also variants for two parameters, such as `AST_MATCHER_P2`, and for overloads): 132 | 133 | ```cpp 134 | AST_MATCHER_P(BinaryOperator, hasLHS, internal::Matcher<Expr>, InnerMatcher) { 135 | const auto* LeftHandSide = Node.getLHS(); 136 | if (LeftHandSide == nullptr) return false; 137 | return InnerMatcher.matches(*LeftHandSide, Finder, Builder); 138 | } 139 | ``` 140 | 141 | Another related point of interest (the declarations themselves live in `ASTMatchers.h`) is the way matchers are declared that allow more than one nested matcher in an implicit `allOf()`. Such matchers are basically the node matchers for most `Decl`s and `Stmt`s.
They are declared like this: 142 | 143 | 144 | ```cpp 145 | const internal::VariadicDynCastAllOfMatcher< 146 | Decl, 147 | CXXRecordDecl> cxxRecordDecl; 148 | ``` 149 | -------------------------------------------------------------------------------- /videos.md: -------------------------------------------------------------------------------- 1 | # Videos 2 | 3 | ## The Clang AST - a Tutorial 4 | 5 | https://www.youtube.com/watch?v=VqCkCDFLSsc&t=2322s 6 | 7 | The clang AST is very rich in semantic information and fully type-resolved, and the code behind it is huge (> 100K LOC). 8 | 9 | One class you encounter very often when dealing with clang is the `ASTContext`, which stores information about an AST. More precisely, it holds an *identifier table* (symbol table) and a source (code) manager. 10 | 11 | The core classes in the clang AST are: 12 | 13 | 1. `Decl`, with further subclasses, such as: 14 | 1. `CXXRecordDecl` 15 | 2. `VarDecl` 16 | 3. `UnresolvedUsingTypenameDecl` 17 | 2. `Stmt` 18 | 1. `CompoundStmt` 19 | 2. `CXXTryStmt` 20 | 3. `BinaryOperator` 21 | Note that in clang, expressions (like operators) are also statements: `Expr` is a subclass of `Stmt`. 22 | 3. `Type` 23 | 1. `PointerType` 24 | 25 | Nodes in the AST have *pointer identity*. 26 | 27 | There are also *glue classes*, which combine these three basic kinds of classes. 28 | For example, `DeclContext` is a base class for `Decl`s that contain further 29 | `Decl`s, and a `TemplateArgument` can hold either a type or an expression (among other kinds of arguments). 30 | 31 | Classes also have *glue methods*, which allow you to further traverse the graph, 32 | e.g. the `getElse()` method of an `IfStmt`. 33 | 34 | How are types modeled in clang? Via `QualType`s and `Type`s. A `QualType` is a 35 | type with qualifiers such as `const` or `volatile`. 36 | 37 | Locations are very important in clang. A `SourceLocation` is usually represented 38 | by an ID that is encoded efficiently, so as not to have to store row and column 39 | each time.
A location always refers to a position as the lexer sees it. 40 | 41 | `SourceLocation`s always point to tokens. 42 | 43 | You can use `clang -Xclang -ast-dump -fsyntax-only <file>` to show you a 44 | representation of the AST of your code. 45 | 46 | ## A Brief Introduction to LLVM 47 | 48 | https://www.youtube.com/watch?v=a5-WaD8VV38 49 | 50 | Compilers used to be monolithic monsters, where each step of the compilation 51 | process was inflexibly ingrained in the whole system. LLVM turned this around by 52 | developing a compiler composed of many modular and independent 53 | *libraries* with many different use cases. More precisely, the steps 54 | that the LLVM compiler toolchain provides are: 55 | 56 | 1. Lexing, 57 | 2. Parsing, 58 | 3. Syntactic/Semantic Analysis, 59 | 4. Intermediate Representation, 60 | 5. Optimization and 61 | 6. Code Generation. 62 | 63 | Intermediate representation, for which LLVM is famous, can be implemented in various ways, including: 64 | 65 | * Structured format (graph or tree), 66 | * Flat, tuple-based (three-argument format: opcode + two arguments), 67 | * Flat, stack-based. 68 | 69 | LLVM IR is a low-level programming language similar to a RISC assembly language, 70 | but *strongly typed*. Moreover, it has an *infinite set of registers*, which are 71 | *immutable* (i.e. every register is assigned exactly once, in static single assignment (SSA) form, so that it's easier to reason about 72 | them). 73 | 74 | According to the talk, a lot of parts of GCC are mediocre and it is generally monolithic. 75 | 76 | ## Using Clang for the Chromium Project 77 | 78 | https://www.youtube.com/watch?v=n9aSa9XVuiQ 79 | 80 | Clang is extensible and built as a set of libraries. 81 | 82 | Other cool clang tools: 83 | 84 | * ThreadSanitizer, checks for data races; 85 | * AddressSanitizer, checks for memory leaks / out-of-bounds access; 86 | * UndefinedBehaviorSanitizer, checks for UB.
87 | 88 | ## Refactoring with C++ 89 | 90 | https://www.youtube.com/watch?v=U98rhV6wONo 91 | 92 | The interfaces to clang: 93 | 94 | * Clang Plugins: run as part of the compilation and provide tight integration with the compiler. They can break the build when they fail (for example, you can check whether an anti-pattern exists and fail the build if it does). 95 | * libClang: high-level C API that gives you things like cursors or references to code. The API is stable, but it only provides a high level of abstraction. 96 | * libTooling: More powerful than libClang, less coupled than plugins. 97 | 98 | libTooling can run over a single string of code or over multiple files in a project. 99 | 100 | Before matchers, matching nodes in the AST was more complicated. 101 | 102 | ## Optimizing LLVM for GPGPU 103 | 104 | https://www.youtube.com/watch?v=JHfb8z-iSYk 105 | 106 | CPUs are very different from GPUs. Initially, LLVM was almost 10 times slower than NVCC. 107 | 108 | * CPU is optimized for low latency, with few but powerful cores. 109 | * GPU is: 110 | + optimized for graphics rendering. 111 | + very lightweight threads without branch prediction. 112 | + Can deploy tens of thousands of cores. 113 | + ISA is different and instructions are different. 114 | 115 | Optimization is about changing and reordering code without changing the 116 | ultimate result of the program, while at the same time tuning the code so that it 117 | can be translated into the most efficient instructions on a particular ISA. 118 | --------------------------------------------------------------------------------