├── Makefile
├── README.md
└── src
    ├── CMakeLists.txt
    └── lib
        ├── Analyzer.cc
        ├── Analyzer.h
        ├── CMakeLists.txt
        ├── CallGraph.cc
        ├── CallGraph.h
        ├── Common.cc
        └── Common.h

/Makefile:
--------------------------------------------------------------------------------
1 | CUR_DIR = $(shell pwd)
2 | LLVM_BUILD := /path/to/llvm-15/build
3 | ANALYZER_DIR := ${CUR_DIR}/src
4 | ANALYZER_BUILD := ${CUR_DIR}/build
5 |
6 |
7 | UNAME := $(shell uname)
8 | ifeq ($(UNAME), Linux)
9 |   NPROC := ${shell nproc}
10 | else
11 |   NPROC := ${shell sysctl -n hw.ncpu}
12 | endif
13 |
14 | build_analyzer_func = \
15 |   (mkdir -p ${2} \
16 |     && cd ${2} \
17 |     && PATH=${LLVM_BUILD}/bin:${PATH} \
18 |       LLVM_TOOLS_BINARY_DIR=${LLVM_BUILD}/bin \
19 |       LLVM_LIBRARY_DIRS=${LLVM_BUILD}/lib \
20 |       LLVM_INCLUDE_DIRS=${LLVM_BUILD}/include \
21 |       CC=${LLVM_BUILD}/bin/clang CXX=${LLVM_BUILD}/bin/clang++ \
22 |       cmake ${1} \
23 |         -DCMAKE_BUILD_TYPE=Build \
24 |         -DLLVM_ENABLE_ASSERTIONS=ON \
25 |         -DCMAKE_CXX_FLAGS_BUILD="-std=c++14 -fpic -fno-rtti -g" \
26 |     && make -j${NPROC})
27 |
28 |
29 | all: kanalyzer
30 |
31 | kanalyzer:
32 | 	$(call build_analyzer_func, ${ANALYZER_DIR}, ${ANALYZER_BUILD})
33 |
34 | clean:
35 | 	rm -rf ${ANALYZER_BUILD}
36 |

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # DeepType: Refining Indirect Call Targets with Strong Multi-layer Type Analysis
2 | Considering the high false positive rate of traditional type-based analysis, we use multi-layer types to describe the type of a function pointer; a multi-layer type consists of the function signature along with the composite types holding it. However, multi-layer types introduce challenges in type matching because address-taken functions may be propagated between multi-layer types through information flow, making it hard to collect all potential targets. The original Multi-Layer Type Analysis (MLTA) paper bypasses these challenges by splitting multi-layer types, which weakens the restrictions that multi-layer types provide and thereby reduces accuracy.
3 |
4 | We propose an advanced approach, Strong Multi-Layer Type Analysis (SMLTA), to mitigate the false positive targets produced by MLTA. SMLTA adheres to the strong restriction that identifies only those functions as targets whose entire multi-layer types match those of the indirect calls. SMLTA addresses the challenges in multi-layer type matching by resolving the relationships between multi-layer types based on the directions of information flow, and utilizes an adapted breadth-first search (BFS) algorithm to discover all multi-layer types engaged in the propagation of target functions. It also employs a conservative strategy to deal with type information made ambiguous by information flow.
5 |
6 | DEEPTYPE is a prototype implementation of SMLTA, which overcomes challenges in multi-layer type matching and utilizes SMLTA to precisely and efficiently identify indirect call targets. It is built on LLVM 15.0 and is tested on Ubuntu 20.04.
7 |
8 | ## Setup Guide
9 | ### Build LLVM
10 | ```
11 | $ git clone -b release/15.x https://github.com/llvm/llvm-project.git
12 | $ cd /root/of/llvm/project
13 | $ mkdir build
14 | $ cd build
15 | $ cmake -DLLVM_TARGET_ARCH="X86" -DCMAKE_BUILD_TYPE=Release -DLLVM_ENABLE_PROJECTS="clang;lldb;lld" -DLLVM_TARGETS_TO_BUILD="ARM;X86;AArch64" -G "Unix Makefiles" ../llvm
16 | $ cmake --build .
17 | ```
18 | ### Build DeepType
19 | 1. Set the path of ```LLVM_BUILD``` in Makefile at line 2
20 | 2. Compile DeepType
21 | ```
22 | $ cd /root/of/DeepType
23 | $ make
24 | ```
25 |
26 | ## How to Use
27 | The executable file takes bitcode(s) as argument(s).
28 | ```
29 | $ cd /root/of/DeepType/build/lib
30 | $ ./kanalyzer filename.bc
31 | ```
32 | Although DeepType supports all optimization levels, it is most precise at the O0 optimization level. When compiling the target program into bitcode, use flags ```-g -Xclang -no-opaque-pointers``` to include debugging information and disable opaque pointer mode.
33 |
34 | ## Analysis Results
35 | DeepType outputs the following information about the analyzed program:
36 | 1. A list of indirect calls along with their respective targets.
37 | 2. Total number of indirect calls and indirect call targets.
38 | 3. Average number of indirect call targets (ANT).
39 | 4. Execution time.
40 |
41 | ## Configurations
42 | To evaluate DeepType comprehensively, we developed 3 variants of DeepType: DT-weak, DT-noSH, DT-nocache.
43 |
44 | **DT-weak** stores split multi-layer types to help examine the impact of recording entire multi-layer types. To reproduce the experiments, uncomment ```#define DTweak``` in ```/DeepType/src/lib/CallGraph.cc``` and recompile DeepType.
45 | ```
46 | #define DTweak
47 | //#define DTnoSH
48 | //#define DTnocache
49 | ```
50 |
51 | **DT-noSH** disables the special handling in DeepType to reveal the contribution of SMLTA. To reproduce the experiments, uncomment ```#define DTnoSH``` in ```/DeepType/src/lib/CallGraph.cc``` and recompile DeepType.
52 | ```
53 | //#define DTweak
54 | #define DTnoSH
55 | //#define DTnocache
56 | ```
57 |
58 | **DT-nocache** disables the cache used in DeepType to measure the runtime overhead of DeepType without cache. To reproduce the experiments, uncomment ```#define DTnocache``` in ```/DeepType/src/lib/CallGraph.cc``` and recompile DeepType.
59 | ```
60 | //#define DTweak
61 | //#define DTnoSH
62 | #define DTnocache
63 | ```
64 |
65 | ## Benchmarks
66 | The bitcode of the benchmarks in our paper is available at: https://drive.google.com/file/d/1U9rMr4UC0uxVhAH7p0R3127lJpaaQMuj/view?usp=sharing.
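
When comparing results on these benchmarks, it may help to know how the ANT figure listed under Analysis Results is derived. Judging from `PrintResults` in `src/lib/Analyzer.cc`, the total number of indirect-call targets is divided by the number of indirect calls that have at least one target, so zero-target calls do not lower the average. The following is a minimal standalone sketch of that arithmetic; the function name `averageTargets` and its parameters are hypothetical and are not part of DeepType.

```cpp
#include <cstddef>

// Hypothetical helper (not in DeepType) restating the arithmetic of PrintResults().
//   totalTargets  : sum of targets over all indirect calls
//   indirectCalls : number of indirect calls found in the program
//   noTargetCalls : indirect calls for which no target was identified
double averageTargets(std::size_t totalTargets,
                      std::size_t indirectCalls,
                      std::size_t noTargetCalls) {
  // Guard against an empty denominator, as PrintResults() does.
  if (noTargetCalls >= indirectCalls)
    return 0.0;
  // Only calls with at least one target count toward the average.
  std::size_t withTargets = indirectCalls - noTargetCalls;
  return static_cast<double>(totalTargets) / static_cast<double>(withTargets);
}
```

For instance, 30 targets spread over 12 indirect calls, 2 of which have no target, would be averaged over the 10 remaining calls rather than all 12.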
67 |
68 | ## Publication
69 | This project is the artifact of the paper DEEPTYPE: Refining Indirect Call Targets with Strong Multi-layer Type Analysis, which was accepted at the 33rd USENIX Security Symposium (USENIX 2024).
70 | ```
71 | @inproceedings{xia:deeptype,
72 |   title = {{DEEPTYPE: Refining Indirect Call Targets with Strong Multi-layer Type Analysis}},
73 |   author = {Tianrou Xia and Hong Hu and Dinghao Wu},
74 |   booktitle = {Proceedings of the 33rd USENIX Security Symposium (USENIX 2024)},
75 |   month = {aug},
76 |   year = {2024},
77 |   address = {Philadelphia, PA},
78 | }
79 | ```
80 |

--------------------------------------------------------------------------------
/src/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | cmake_minimum_required(VERSION 3.5.1)
2 | project(KANALYZER)
3 |
4 | find_package(LLVM REQUIRED CONFIG)
5 |
6 | message(STATUS "Found LLVM ${LLVM_PACKAGE_VERSION}")
7 | message(STATUS "Using LLVMConfig.cmake in: ${LLVM_DIR}")
8 |
9 | # Set your project compile flags.
10 | # E.g. if using the C++ header files
11 | # you will need to enable C++14 support
12 | # for your compiler.
13 | # Check for C++14 support and set the compilation flag
14 | include(CheckCXXCompilerFlag)
15 |
16 | include_directories(${LLVM_INCLUDE_DIRS})
17 | add_definitions(${LLVM_DEFINITIONS})
18 |
19 | add_subdirectory (lib)
20 |

--------------------------------------------------------------------------------
/src/lib/Analyzer.cc:
--------------------------------------------------------------------------------
1 | //===-- Analyzer.cc - the kernel-analysis framework-------------===//
2 | //
3 | // It constructs a global call-graph based on multi-layer type
4 | // analysis.
5 | //
6 | //===-----------------------------------------------------------===//
7 |
8 | #include "llvm/IR/LLVMContext.h"
9 | #include "llvm/IR/PassManager.h"
10 | #include "llvm/IR/Module.h"
11 | #include "llvm/IR/Verifier.h"
12 | #include "llvm/Bitcode/BitcodeReader.h"
13 | #include "llvm/Bitcode/BitcodeWriter.h"
14 | #include "llvm/Support/ManagedStatic.h"
15 | #include "llvm/Support/PrettyStackTrace.h"
16 | #include "llvm/Support/ToolOutputFile.h"
17 | #include "llvm/Support/SystemUtils.h"
18 | #include "llvm/Support/FileSystem.h"
19 | #include "llvm/IRReader/IRReader.h"
20 | #include "llvm/Support/SourceMgr.h"
21 | #include "llvm/Support/Signals.h"
22 | #include "llvm/Support/Path.h"
23 |
24 | #include
25 | #include
26 | #include
27 | #include
28 | #include
29 | #include
30 | #include
31 |
32 | #include "Analyzer.h"
33 | #include "CallGraph.h"
34 | //#include "Config.h"
35 |
36 | using namespace llvm;
37 | using namespace std;
38 |
39 | auto mid = std::chrono::system_clock::now();
40 |
41 | // Command line parameters.
42 | cl::list<std::string> InputFilenames(
43 |   cl::Positional, cl::OneOrMore, cl::desc(""));
44 |
45 | cl::opt<unsigned> VerboseLevel(
46 |   "verbose-level", cl::desc("Print information at which verbose level"),
47 |   cl::init(0));
48 |
49 | cl::opt<bool> MLTA(
50 |   "mlta",
51 |   cl::desc("Multi-layer type analysis for refining indirect-call \
52 |     targets"),
53 |   cl::NotHidden, cl::init(false));
54 |
55 | GlobalContext GlobalCtx;
56 |
57 |
58 | void IterativeModulePass::run(ModuleList &modules) {
59 |
60 |   ModuleList::iterator i, e;
61 |   OP << "[" << ID << "] Initializing " << modules.size() << " modules " << "\n";
62 |   bool again = true;
63 |   while (again) {
64 |     again = false;
65 |     for (i = modules.begin(), e = modules.end(); i != e; ++i) {
66 |       again |= CollectInformation(i->first);
67 |       OP << "\n";
68 |     }
69 |   }
70 |   OP << "\n";
71 |
72 |   mid = std::chrono::system_clock::now();
73 |
74 |   unsigned iter = 0, changed = 1;
75 |   while (changed) {
76 |     ++iter;
77 |     changed = 0;
78 |     unsigned counter_modules = 0;
79 |     unsigned total_modules = modules.size();
80 |     for (i = modules.begin(), e = modules.end(); i != e; ++i) {
81 |       OP << "[" << ID << " / " << iter << "] ";
82 |       OP << "[" << ++counter_modules << " / " << total_modules << "] ";
83 |       OP << "[" << i->second << "]\n";
84 |
85 |       bool ret = IdentifyTargets(i->first);
86 |       if (ret) {
87 |         ++changed;
88 |         OP << "\t [CHANGED]\n";
89 |       } else
90 |         OP << "\n";
91 |     }
92 |     OP << "[" << ID << "] Updated in " << changed << " modules.\n";
93 |   }
94 |
95 |   //OP << "[" << ID << "] Postprocessing ...\n";
96 |   /*
97 |   again = true;
98 |   while (again) {
99 |     again = false;
100 |     for (i = modules.begin(), e = modules.end(); i != e; ++i) {
101 |       // TODO: Dump the results.
102 |       again |= doFinalization(i->first);
103 |     }
104 |   }
105 |   */
106 |
107 |   OP << "[" << ID << "] Done!\n\n";
108 | }
109 |
110 | void PrintResults(GlobalContext *GCtx) {
111 |
112 |   int TotalTargets = 0;
113 |   for (auto IC : GCtx->IndirectCallInsts) {
114 |     TotalTargets += GCtx->Callees[IC].size();
115 |   }
116 |   unsigned WithTargetIndirectCalls = GCtx->IndirectCallInsts.size() - GCtx->NoTargetCalls;
117 |   float AveIndirectTargets = 0;
118 |   if (WithTargetIndirectCalls > 0)
119 |     AveIndirectTargets = (float)GCtx->NumIndirectCallTargets/(float)WithTargetIndirectCalls;
120 |
121 |   OP<<"############## DeepType Result Statistics ##############\n";
122 |   OP<<"# Number of indirect calls: \t\t\t"<<GCtx->IndirectCallInsts.size()<<"\n";
123 |   OP<<"# Number of indirect-call targets: \t\t"<<GCtx->NumIndirectCallTargets<<"\n";
124 |   OP<<"# Number of address-taken functions: \t\t"<<GCtx->AddressTakenFuncs.size()<<"\n";
125 |   //OP<<"# Number of more than 3 layer call site type: \t\t"<<GCtx->NumThreeLayerType<<"\n";
126 |   OP<<"# Number of 0-target i-calls: \t\t\t"<<GCtx->NoTargetCalls<<"\n";
127 |   std::cout << "# Ave. Number of indirect-call targets: \t"
128 |     << std::fixed << std::setprecision(2) << AveIndirectTargets << "\n";
129 |   //OP<<"# Number of [1,2)-target i-calls: \t\t"<<GCtx->ZerotTargetCalls<<"\n";
130 |   //OP<<"# Number of [2,4)-target i-calls: \t\t"<<GCtx->OnetTargetCalls<<"\n";
131 |   //OP<<"# Number of [4,8)-target i-calls: \t\t"<<GCtx->TwotTargetCalls<<"\n";
132 |   //OP<<"# Number of [8,16)-target i-calls: \t\t"<<GCtx->ThreetTargetCalls<<"\n";
133 |   //OP<<"# Number of [16,32)-target i-calls: \t\t"<<GCtx->FourtTargetCalls<<"\n";
134 |   //OP<<"# Number of [32,64)-target i-calls: \t\t"<<GCtx->FivetTargetCalls<<"\n";
135 |   //OP<<"# Number of [64,128)-target i-calls: \t\t"<<GCtx->SixtTargetCalls<<"\n";
136 |   //OP<<"# Number of [128,256)-target i-calls: \t\t"<<GCtx->SeventTargetCalls<<"\n";
137 |   //OP<<"# Number of [256,...)-target i-calls: \t\t"<<GCtx->EighttTargetCalls<<"\n";
138 | }
139 |
140 | int main(int argc, char **argv) {
141 |   auto start = std::chrono::system_clock::now();
142 |
143 |   // Print a stack trace if we signal out.
144 |   sys::PrintStackTraceOnErrorSignal(argv[0]);
145 |   PrettyStackTraceProgram X(argc, argv);
146 |
147 |   llvm_shutdown_obj Y; // Call llvm_shutdown() on exit.
148 |
149 |   cl::ParseCommandLineOptions(argc, argv, "global analysis\n");
150 |   SMDiagnostic Err;
151 |
152 |   // Loading modules
153 |   OP << "Total " << InputFilenames.size() << " file(s)\n";
154 |
155 |   for (unsigned i = 0; i < InputFilenames.size(); ++i) {
156 |
157 |     LLVMContext *LLVMCtx = new LLVMContext();
158 |     std::unique_ptr<Module> M = parseIRFile(InputFilenames[i], Err, *LLVMCtx);
159 |
160 |     if (M == NULL) {
161 |       OP << argv[0] << ": error loading file '"
162 |         << InputFilenames[i] << "'\n";
163 |       continue;
164 |     }
165 |
166 |     Module *Module = M.release();
167 |     StringRef MName = StringRef(strdup(InputFilenames[i].data()));
168 |     GlobalCtx.Modules.push_back(std::make_pair(Module, MName));
169 |     GlobalCtx.ModuleMaps[Module] = InputFilenames[i];
170 |   }
171 |
172 |   //
173 |   // Main workflow
174 |   //
175 |
176 |   // Build global callgraph.
177 |   CallGraphPass CGPass(&GlobalCtx);
178 |   CGPass.run(GlobalCtx.Modules);
179 |
180 |   // Print final results
181 |   PrintResults(&GlobalCtx);
182 |
183 |   auto end = std::chrono::system_clock::now();
184 |   std::cout << "Stage 1 " << std::chrono::duration_cast<std::chrono::milliseconds>(mid-start).count() << " ms" << std::endl;
185 |   std::cout << "Stage 2 " << std::chrono::duration_cast<std::chrono::milliseconds>(end-mid).count() << " ms" << std::endl;
186 |   std::cout << "total " << std::chrono::duration_cast<std::chrono::milliseconds>(end-start).count() << " ms" << std::endl;
187 |   return 0;
188 | }
189 |
190 |

--------------------------------------------------------------------------------
/src/lib/Analyzer.h:
--------------------------------------------------------------------------------
1 | #ifndef _ANALYZER_GLOBAL_H
2 | #define _ANALYZER_GLOBAL_H
3 |
4 | #include
5 | #include
6 | #include
7 | #include
8 | #include
9 | #include
10 | #include
11 | #include
12 | #include
13 | #include "llvm/Support/CommandLine.h"
14 | #include
15 | #include
16 | #include
17 | #include
18 | #include
19 | #include
20 | #include
21 | #include
22 | #include "Common.h"
23 |
24 |
25 |
26 | //
27 | // typedefs
28 | //
29 | typedef std::vector< std::pair<llvm::Module*, llvm::StringRef> > ModuleList;
30 | // Mapping module to its file name.
31 | typedef std::unordered_map ModuleNameMap;
32 | // The set of all functions.
33 | typedef llvm::SmallPtrSet FuncSet;
34 | // The set of strings.
35 | typedef std::set<std::string> StrSet;
36 | // The pair of an array and its size
37 | typedef std::pair ArrayPair;
38 | // The set of string pairs.
39 | typedef std::set<std::pair<std::string, std::string>> StrPairSet;
40 | // Mapping from function name to function.
41 | typedef std::unordered_map NameFuncMap;
42 | typedef llvm::SmallPtrSet CallInstSet;
43 | typedef DenseMap CallerMap;
44 | typedef DenseMap CalleeMap;
45 | typedef std::pair TyRepkeypair;
46 |
47 |
48 | struct GlobalContext {
49 |
50 |   GlobalContext() {
51 |     // Initialize statistics.
52 |     NumFunctions = 0;
53 |     NumFirstLayerTypeCalls = 0;
54 |     NumSecondLayerTypeCalls = 0;
55 |     NumIndirectCallTargets = 0;
56 |     NoTargetCalls = 0;      // Number of indirect calls which have no target
57 |     ZerotTargetCalls = 0;   // [1,2) 2^0 ~ 2^1
58 |     OnetTargetCalls = 0;    // [2,4) 2^1 ~
59 |     TwotTargetCalls = 0;    // [4,8) 2^2 ~
60 |     ThreetTargetCalls = 0;  // [8,16) 2^3 ~
61 |     FourtTargetCalls = 0;   // [16,32) 2^4 ~
62 |     FivetTargetCalls = 0;   // [32,64) 2^5 ~
63 |     SixtTargetCalls = 0;    // [64,128) 2^6 ~
64 |     SeventTargetCalls = 0;  // [128,256) 2^7 ~
65 |     EighttTargetCalls = 0;  // [256,...) 2^8 ~
66 |   }
67 |
68 |   // Statistics
69 |   unsigned NumFunctions;
70 |   unsigned NumFirstLayerTypeCalls;
71 |   unsigned NumSecondLayerTypeCalls;
72 |   unsigned NumIndirectCallTargets;
73 |   unsigned NoTargetCalls;
74 |   unsigned ZerotTargetCalls;
75 |   unsigned OnetTargetCalls;
76 |   unsigned TwotTargetCalls;
77 |   unsigned ThreetTargetCalls;
78 |   unsigned FourtTargetCalls;
79 |   unsigned FivetTargetCalls;
80 |   unsigned SixtTargetCalls;
81 |   unsigned SeventTargetCalls;
82 |   unsigned EighttTargetCalls;
83 |   unsigned NumThreeLayerType;
84 |
85 |
86 |   // Map global function name to function.
87 |   NameFuncMap GlobalFuncs;
88 |
89 |   // Functions whose addresses are taken.
90 |   FuncSet AddressTakenFuncs;
91 |
92 |   // Map a callsite to all potential callee functions.
93 |   CalleeMap Callees;
94 |
95 |   // Map a function to all potential caller instructions.
96 |   CallerMap Callers;
97 |
98 |   // Unified functions -- no redundant inline functions
99 |   DenseMapUnifiedFuncMap;
100 |   setUnifiedFuncSet;
101 |
102 |   // Map function signature to functions
103 |   DenseMapsigFuncsMap;
104 |
105 |   // Indirect call instructions.
106 |   std::vectorIndirectCallInsts;
107 |
108 |   // Modules.
109 |   ModuleList Modules;
110 |   ModuleNameMap ModuleMaps;
111 |   std::set InvolvedModules;
112 |
113 | };
114 |
115 | class IterativeModulePass {
116 | protected:
117 |   GlobalContext *Ctx;
118 |   const char * ID;
119 | public:
120 |   IterativeModulePass(GlobalContext *Ctx_, const char *ID_)
121 |     : Ctx(Ctx_), ID(ID_) { }
122 |
123 |   // Run on each module before iterative pass.
124 |   virtual bool CollectInformation(Module *M)
125 |     { return true; }
126 |
127 |   // Run on each module after iterative pass.
128 |   //virtual bool doFinalization(llvm::Module *M)
129 |   //  { return true; }
130 |
131 |   // Iterative pass.
132 |   virtual bool IdentifyTargets(llvm::Module *M)
133 |     { return false; }
134 |
135 |   virtual void run(ModuleList &modules);
136 | };
137 |
138 | #endif
139 |

--------------------------------------------------------------------------------
/src/lib/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | set (AnalyzerSourceCodes
2 |   Common.h
3 |   Common.cc
4 |   Analyzer.h
5 |   Analyzer.cc
6 |   CallGraph.h
7 |   CallGraph.cc
8 |   )
9 |
10 | #file(COPY configs/ DESTINATION configs)
11 |
12 | set(CMAKE_MACOSX_RPATH 0)
13 |
14 | # Build libraries.
15 | add_library (AnalyzerObj OBJECT ${AnalyzerSourceCodes})
16 | add_library (Analyzer SHARED $<TARGET_OBJECTS:AnalyzerObj>)
17 | add_library (AnalyzerStatic STATIC $<TARGET_OBJECTS:AnalyzerObj>)
18 |
19 | # Build executable.
20 | set (EXECUTABLE_OUTPUT_PATH ${ANALYZER_BINARY_DIR})
21 | link_directories (${ANALYZER_BINARY_DIR}/lib)
22 | add_executable(kanalyzer ${AnalyzerSourceCodes})
23 | target_link_libraries(kanalyzer
24 |   LLVMAsmParser
25 |   LLVMSupport
26 |   LLVMCore
27 |   LLVMAnalysis
28 |   LLVMIRReader
29 |   AnalyzerStatic
30 |   )
31 |

--------------------------------------------------------------------------------
/src/lib/CallGraph.cc:
--------------------------------------------------------------------------------
1 | //===-- CallGraph.cc - Build global call-graph------------------===//
2 | //
3 | // This pass builds a global call-graph. The targets of an indirect
4 | // call are identified based on type-analysis, i.e., matching the
5 | // number and type of function parameters.
6 | //
7 | //===-----------------------------------------------------------===//
8 |
9 | #include
10 | #include "llvm/IR/Instruction.h"
11 | #include "llvm/IR/IntrinsicInst.h"
12 | #include "llvm/Support/Debug.h"
13 | #include
14 | #include "llvm/IR/Function.h"
15 | #include "llvm/Support/raw_ostream.h"
16 | #include "llvm/IR/InstrTypes.h"
17 | #include "llvm/IR/BasicBlock.h"
18 | #include "llvm/Analysis/LoopInfo.h"
19 | #include "llvm/Analysis/LoopPass.h"
20 | #include
21 | #include
22 | #include
23 | #include
24 | #include
25 | #include
26 | #include
27 | #include
28 | #include "llvm/IR/CFG.h"
29 | #include "llvm/Transforms/Utils/BasicBlockUtils.h"
30 | #include "llvm/IR/IRBuilder.h"
31 | #include "CallGraph.h"
32 | #include "Common.h"
33 | #include "llvm/IR/DebugInfo.h"
34 | #include "llvm/IR/InstIterator.h"
35 | #include "llvm/IR/Constants.h"
36 | #include "llvm/ADT/StringExtras.h"
37 | #include "llvm/Analysis/CallGraph.h"
38 | #include "llvm/IR/LegacyPassManager.h"
39 | #include "llvm/IR/Operator.h"
40 |
41 | //#define DTweak
42 | //#define DTnoSH
43 | //#define DTnocache
44 |
45 | using namespace llvm;
46 |
47 | std::map CallGraphPass::FirstMap;
48 | std::map CallGraphPass::SecondMap;
49 | std::map CallGraphPass::ThirdMap;
50 | std::map CallGraphPass::FourthMap;
51 | std::map CallGraphPass::FifthMap;
52 | std::map CallGraphPass::SixthMap;
53 | std::map CallGraphPass::SeventhMap;
54 | DenseMap CallGraphPass::WMLTATypeFuncMap;
55 | DenseMap CallGraphPass::MLTypeFuncMap;
56 | DenseMap CallGraphPass::TargetLookupMap;
57 | std::map CallGraphPass::ReferMap;
58 | std::map> CallGraphPass::StructIDNameMap;
59 | std::map CallGraphPass::GVChildParentMap;
60 | std::map, int> CallGraphPass::GVChildParentOffsetMap;
61 | std::map CallGraphPass::GVFuncMap;
62 | std::map, std::string> CallGraphPass::GVFuncTypeMap;
63 | std::set CallGraphPass::ArgAllocaSet;
64 | std::map> CallGraphPass::InstHierarchy;
65 | std::map> CallGraphPass::TypeRelationshipMap;
66 | std::map CallGraphPass::NewTypeRelationshipMap;
67 | std::map CallGraphPass::FriendTyMap;
68 | //std::map CallGraphPass::VariantTypeMap;
69 | std::map CallGraphPass::MatchedTyMap;
70 | std::map CallGraphPass::DerivedClassMap;
71 | std::set CallGraphPass::EscapingSet;
72 | std::set CallGraphPass::UnsupportedSet;
73 | std::set TypeNameSet;
74 | std::setLayerNumSet;
75 | int LayerNumArray[12] = {0,0,0,0,0,0,0,0,0,0,0,0};
76 | std::set CallGraphPass::ManyTargetType;
77 |
78 |
79 | // Global variables
80 | std::string TyName;
81 | bool AIFlag = false;
82 | AllocaInst *RecordAI;
83 | int CSIdx = 0;
84 |
85 | bool CallGraphPass::IsCompositeType(Type *Ty) {
86 |   while (PointerType *PTy = dyn_cast<PointerType>(Ty)) {
87 |     Ty = PTy->getPointerElementType();
88 |   }
89 |   if (Ty->isStructTy() || Ty->isArrayTy() || Ty->isVectorTy())
90 |     return true;
91 |   else
92 |     return false;
93 | }
94 |
95 | bool CallGraphPass::IsGeneralPointer(Type *Ty) {
96 |   while (PointerType *PTy = dyn_cast<PointerType>(Ty)) {
97 |     Ty = PTy->getPointerElementType();
98 |   }
99 |   return Ty->isIntegerTy();
100 | }
101 |
102 | bool CallGraphPass::IsUnsupportedType(Type *Ty) {
103 |   if (UnsupportedSet.find(typeHash(Ty)) != UnsupportedSet.end())
104 |     return true;
105 |   else
106 |     return false;
107 | }
108 |
109 | bool CallGraphPass::IsUnsupportedTypeStr(std::string TyStr) {
110 |   if (TyStr.substr(0,1) == "i" && !HasSubString(TyStr, "(") && !HasSubString(TyStr, ")")) {
111 |     std::string s = TyStr.substr(1, (TyStr.length()-1));
112 |     while (s.substr((s.length()-1), 1) == "*") {
113 |       s = s.substr(0, (s.length()-1));
114 |     }
115 |     for (char const &c : s) {
116 |       if (std::isdigit(c) == 0) {
117 |         return false;
118 |       }
119 |     }
120 |     return true;
121 |   }
122 |   else {
123 |     return false;
124 |   }
125 | }
126 |
127 | bool CallGraphPass::IsCompositeTypeStr(std::string TyStr) {
128 |   if (TyStr.substr(0,6) == "struct" ||
129 |       TyStr.substr(0,5) == "array" ||
130 |       TyStr.substr(0,6) == "vector" ||
131 |       TyStr.substr(0,5) == "union") {
132 |     return true;
133 |   }
134 |   else {
135 |     return false;
136 |   }
137 | }
138 |
139 | bool CallGraphPass::IsEscapingType(std::string TypeName) {
140 |   if (EscapingSet.find(TypeName) != EscapingSet.end())
141 |     return true;
142 |   else
143 |     return false;
144 | }
145 |
146 | bool CallGraphPass::IsCompoundInst(Instruction *I) {
147 |   unsigned operandNum = I->getNumOperands();
148 |   for (unsigned index = 0; index < operandNum; index++) {
149 |     Value *operand = I->getOperand(index);
150 |     if (operand->getType()->isPointerTy() && isa(operand)) {
151 |       return true;
152 |     }
153 |   }
154 |   return false;
155 | }
156 |
157 | bool CallGraphPass::HasSubString(std::string str, std::string substr) {
158 |   std::string::size_type idx;
159 |   idx = str.find(substr);
160 |   if (idx == std::string::npos) {
161 |     return false;
162 |   }
163 |   else {
164 |     return true;
165 |   }
166 | }
167 |
168 | std::string CallGraphPass::GetStructIdentity(StructType* STy) {
169 |   std::string STyID = "";
170 |   for (Type* Ty : STy->elements()) {
171 |     STyID += SingleType2String(Ty);
172 |   }
173 |   return STyID;
174 | }
175 |
176 | // Given a struct name, remove suffix at the end if exists
177 | std::string CallGraphPass::StructNameTrim(std::string sName) {
178 |   std::size_t idx = sName.find_last_of('.');
179 |   std::size_t check = sName.find_first_of('.');
180 |
181 |   // Literal structs are named as struct.num, do not remove .num
182 |   // Identified structs may be named as struct.name.suffix, remove .suffix
183 |   if (idx != check) {
184 |     std::string sNameTail = sName.substr(idx+1);
185 |     std::string sNameHead = sName.substr(0, idx);
186 |     for (char c: sNameTail) {
187 |       if (isdigit(c)) {
188 |         continue;
189 |       }
190 |       else {
191 |         return sName;
192 |       }
193 |     }
194 |     return sNameHead;
195 |   }
196 |   else {
197 |     return sName;
198 |   }
199 | }
200 |
201 | std::size_t CallGraphPass::FindEndOfStruct(std::string structstr) {
202 |   std::size_t end = 0;
203 |   for (char c: structstr) {
204 |     if (c=='*' || c==',' || c=='(' || c==')') {
205 |       break;
206 |     }
207 |     end++;
208 |   }
209 |
210 |   return end;
211 | }
212 |
213 | // Given MLType name, remove suffix at the end of a struct in first layer if exists
214 | std::string CallGraphPass::FirstLayerTrim(std::string fName) {
215 |   std::size_t idx, end;
216 |   std::string clearName = "";
217 |   std::string sName;
218 |   std::string head, body, tail;
219 |
220 |   while ((idx = fName.find("%struct")) != std::string::npos) {
221 |     // Find the index of the first character after struct name
222 |     end = FindEndOfStruct(fName.substr(idx));
223 |
224 |     // Divide fName into 3 parts: before the struct, the struct, after the struct
225 |     head = fName.substr(0,idx);
226 |     body = fName.substr(idx,end);
227 |     tail = fName.substr(idx+end);
228 |
229 |     sName = StructNameTrim(body);
230 |     clearName = clearName + head + sName;
231 |     fName = tail;
232 |   }
233 |
234 |   clearName += fName;
235 |   return clearName;
236 | }
237 |
238 | std::string CallGraphPass::GenerateMLTypeName(Value *VO, std::string MLTypeName) {
239 |   VO = NextLayerTypeExtraction(VO);
240 |   while (VO != NULL) {
241 |     if (ConstantExpr *CE = dyn_cast<ConstantExpr>(VO)) {
242 |       Instruction *Inst = CE->getAsInstruction();
243 |       if (GetElementPtrInst *GEPInst = dyn_cast<GetElementPtrInst>(Inst)) {
244 | MLTypeName = GEPInstAnalysis(GEPInst); 245 | } 246 | break; 247 | } 248 | if (TyName != "") { 249 | MLTypeName += "|" + TyName; 250 | TyName = ""; 251 | } 252 | VO = NextLayerTypeExtraction(VO); 253 | } 254 | 255 | return MLTypeName; 256 | } 257 | 258 | 259 | // Transform a layer's type to string 260 | std::string CallGraphPass::SingleType2String (Type *Ty) { 261 | std::string TypeName; 262 | 263 | if (IsCompositeType(Ty)) { 264 | while (PointerType *PTy = dyn_cast(Ty)) { 265 | Ty = PTy->getPointerElementType(); 266 | } 267 | if (StructType* STy = dyn_cast(Ty)) { 268 | TypeName = Ty->getStructName().str(); 269 | 270 | if (TypeName.length() == 0) { 271 | std::string STyID = GetStructIdentity(STy); 272 | if (StructIDNameMap.find(STyID) != StructIDNameMap.end()) { 273 | TypeName = "NAMESET"; 274 | std::set TyNameSet = StructIDNameMap[STyID]; 275 | TypeNameSet.clear(); 276 | TypeNameSet.insert(TyNameSet.begin(), TyNameSet.end()); 277 | } 278 | else { 279 | TypeName = ""; 280 | TypeName = "struct.anon"; 281 | } 282 | } 283 | 284 | TypeName = StructNameTrim(TypeName); 285 | } 286 | else if (Ty->isArrayTy()) { 287 | TypeName = "array"; 288 | std::string array_str; 289 | llvm::raw_string_ostream rso(array_str); 290 | Ty->print(rso); 291 | TypeName += FirstLayerTrim(rso.str()); 292 | } 293 | else if (Ty->isVectorTy()) { 294 | TypeName = "vector"; 295 | std::string vector_str; 296 | llvm::raw_string_ostream rso(vector_str); 297 | Ty->print(rso); 298 | TypeName += FirstLayerTrim(rso.str()); 299 | } 300 | } 301 | else { 302 | std::string type_str; 303 | llvm::raw_string_ostream rso(type_str); 304 | Ty->print(rso); 305 | TypeName = rso.str(); 306 | TypeName = FirstLayerTrim(TypeName); 307 | } 308 | return TypeName; 309 | } 310 | 311 | 312 | Value *CallGraphPass::NextLayerTypeExtraction(Value *v) { 313 | // Case 1: GetElementPtrInst 314 | if (GetElementPtrInst *GEP = dyn_cast(v)) { 315 | // If a GEPI only has the first index 316 | // Then, the source type of the GEPI is 
not an outer layer type 317 | if (GEP->getNumIndices() == 1) { 318 | TyName = ""; 319 | return GEP->getPointerOperand(); 320 | } 321 | 322 | Type *Ty = GEP->getPointerOperandType(); 323 | if (IsCompositeType(Ty)) { 324 | unsigned opNum = GEP->getNumOperands(); 325 | if (ConstantInt* CInt = dyn_cast(GEP->getOperand(opNum-1))) { 326 | int offsetNum = CInt->getSExtValue(); 327 | std::string offsetstr = std::to_string(offsetNum); 328 | TyName = SingleType2String(Ty) + "#" + offsetstr; 329 | } 330 | else { 331 | TyName = SingleType2String(Ty) + "#?"; 332 | } 333 | return GEP->getPointerOperand(); 334 | } 335 | else { 336 | TyName = ""; 337 | return NULL; 338 | } 339 | 340 | /* 341 | Type *Ty = GEP->getPointerOperandType(); 342 | if (IsCompositeType(Ty)) { 343 | TyName = SingleType2String(Ty); 344 | return GEP->getPointerOperand(); 345 | } 346 | else { 347 | TyName = ""; 348 | return NULL; 349 | } 350 | */ 351 | } 352 | 353 | // Case 2: LoadInst 354 | else if (LoadInst *LI = dyn_cast(v)) { 355 | TyName = ""; 356 | return NextLayerTypeExtraction(LI->getOperand(0)); 357 | } 358 | 359 | // Case 3: ConstantExpr 360 | else if (ConstantExpr *CE = dyn_cast(v)) { 361 | TyName = ""; 362 | return CE; 363 | } 364 | 365 | // Case 4: AllocaInst 366 | else if (AllocaInst *AI = dyn_cast(v)) { 367 | //if (AllocaInst *AI = dyn_cast(UI)) { 368 | TyName = ""; 369 | AIFlag = true; 370 | RecordAI = AI; 371 | return NULL; 372 | } 373 | 374 | // Case 5: Bitcast 375 | else if (CastInst *CI = dyn_cast(v)) { 376 | Type *SrcTy = CI->getSrcTy(); 377 | std::string STyName = SingleType2String(SrcTy); 378 | 379 | #ifdef DTnoSH 380 | TyName = ""; 381 | return NULL; 382 | #endif 383 | 384 | // bitcast {}* to some_type: Special handling 4 385 | if (STyName.length() == 0) { 386 | TyName = ""; 387 | return NextLayerTypeExtraction(CI->getOperand(0)); 388 | } 389 | else { 390 | TyName = ""; 391 | return NULL; 392 | } 393 | } 394 | 395 | else { 396 | TyName = ""; 397 | return NULL; 398 | } 399 | 400 | return 
v; 401 | } 402 | 403 | /* 404 | void CallGraphPass::CalculateVariantTypes(std::string TyStr) { 405 | std::string TyStr_copy = TyStr; 406 | std::replace(TyStr_copy.begin(), TyStr_copy.end(), '#', '|'); 407 | 408 | list TyList = MLTypeName2List(TyStr_copy); 409 | StrSet CumuSet; 410 | StrSet VarSet; 411 | StrSet NewCumuSet; 412 | 413 | std::string LayerTy = TyList.front(); 414 | TyList.pop_front(); 415 | CumuSet.insert(LayerTy); 416 | if (TyList.empty()) { 417 | VarSet.insert(LayerTy); 418 | } 419 | 420 | std::string CumuTy; 421 | 422 | while (!TyList.empty()) { 423 | LayerTy = TyList.front(); 424 | TyList.pop_front(); 425 | 426 | bool digit = true; 427 | for (auto c: LayerTy) { 428 | if (!isdigit(c)) { 429 | digit = false; 430 | break; 431 | } 432 | } 433 | 434 | for (StrSet::iterator it=CumuSet.begin(); it!=CumuSet.end(); it++) { 435 | std::string base = *it; 436 | if (LayerTy == "&") { 437 | VarSet.insert(base); 438 | } 439 | CumuTy = base + "|" + LayerTy; 440 | NewCumuSet.insert(CumuTy); 441 | if (TyList.empty()) { 442 | VarSet.insert(CumuTy); 443 | } 444 | if (LayerTy.substr(0,6) == "struct") { 445 | CumuTy = base + "|struct.anon"; 446 | NewCumuSet.insert(CumuTy); 447 | } 448 | if (digit == true) { 449 | CumuTy = base + "|?"; 450 | } 451 | else { 452 | CumuTy = base + "|&"; 453 | } 454 | VarSet.insert(CumuTy); 455 | } 456 | CumuSet = NewCumuSet; 457 | NewCumuSet.clear(); 458 | } 459 | 460 | VariantTypeMap[TyStr] = VarSet; 461 | return; 462 | }*/ 463 | 464 | void CallGraphPass::UpdateRelationshipMap(std::string CType, std::string FType) { 465 | if (!IsUnsupportedTypeStr(CType) && !IsUnsupportedTypeStr(FType)) { 466 | if (IsCompositeTypeStr(CType) && IsCompositeTypeStr(FType)) { 467 | TypeRelationshipMap[CType].insert(FType); 468 | } 469 | if (!IsCompositeTypeStr(CType) && !IsCompositeTypeStr(FType)) { 470 | TypeRelationshipMap[CType].insert(FType); 471 | } 472 | } 473 | 474 | return; 475 | } 476 | 477 | void CallGraphPass::UpdateMLTypeFuncMap(std::string type, 
Function* F) { 478 | MLTypeFuncMap[stringHash(type)].insert(F); 479 | ReferMap[stringHash(type)] = type; 480 | 481 | std::string CumuType; 482 | list typeList; 483 | typeList = MLTypeName2List(type); 484 | 485 | std::string FirstLayer = typeList.front(); 486 | typeList.pop_front(); 487 | 488 | if (!typeList.empty()) { // >=2 layers 489 | std::string SecondLayer = typeList.front(); 490 | FirstMap[FirstLayer].insert(SecondLayer); 491 | //errs() << "Cumu Layer: " << FirstLayer << " 2nd Layer: " << SecondLayer << "\n"; 492 | typeList.pop_front(); 493 | 494 | if (!typeList.empty()) { // >=3 layers 495 | CumuType = FirstLayer + "|" + SecondLayer; 496 | std::string ThirdLayer = typeList.front(); 497 | SecondMap[CumuType].insert(ThirdLayer); 498 | //errs() << "Cumu Layer: " << CumuType << " 3rd Layer: " << ThirdLayer << "\n"; 499 | typeList.pop_front(); 500 | 501 | if (!typeList.empty()) { // >=4 layers 502 | CumuType = CumuType + "|" + ThirdLayer; 503 | std::string FourthLayer = typeList.front(); 504 | ThirdMap[CumuType].insert(FourthLayer); 505 | //errs() << "Cumu Layer: " << CumuType << " 4th Layer: " << FourthLayer << "\n"; 506 | typeList.pop_front(); 507 | 508 | if (!typeList.empty()) { // >=5 layers 509 | CumuType = CumuType + "|" + FourthLayer; 510 | std::string FifthLayer = typeList.front(); 511 | FourthMap[CumuType].insert(FifthLayer); 512 | //errs() << "Cumu Layer: " << CumuType << " 5th Layer: " << FifthLayer << "\n"; 513 | typeList.pop_front(); 514 | 515 | if (!typeList.empty()) { // >=6 layers 516 | CumuType = CumuType + "|" + FifthLayer; 517 | std::string SixthLayer = typeList.front(); 518 | FifthMap[CumuType].insert(SixthLayer); 519 | //errs() << "Cumu Layer: " << CumuType << " 6th Layer: " << SixthLayer << "\n"; 520 | typeList.pop_front(); 521 | 522 | if (!typeList.empty()) { // >=7 layers 523 | CumuType = CumuType + "|" + SixthLayer; 524 | std::string SeventhLayer = typeList.front(); 525 | SixthMap[CumuType].insert(SeventhLayer); 526 | //errs() << "Cumu 
Layer: " << CumuType << " 7th Layer: " << SeventhLayer << "\n"; 527 | typeList.pop_front(); 528 | 529 | if (!typeList.empty()) { // Use & to represent all remaining layers 530 | CumuType = CumuType + "|" + SeventhLayer; 531 | SeventhMap[CumuType].insert("&"); 532 | //errs() << "Cumu Layer: " << CumuType << " 8th Layer: " << "&" << "\n"; 533 | } 534 | } 535 | } 536 | } 537 | } 538 | } 539 | } 540 | 541 | #ifdef DTweak 542 | typeList.clear(); 543 | typeList = MLTypeName2List(type); 544 | std::string LayTy; 545 | 546 | while (!typeList.empty()) { 547 | LayTy = typeList.front(); 548 | WMLTATypeFuncMap[stringHash(LayTy)].insert(F); 549 | 550 | // Confine F to matched types 551 | int pos = LayTy.find_last_of("#"); 552 | std::string LayTyName; 553 | std::string LayTyIdx; 554 | std::string MatchedType; 555 | 556 | if (pos > 0) { // Has index, not first layer type 557 | LayTyName = LayTy.substr(0,pos); 558 | LayTyIdx = LayTy.substr(pos); 559 | if (LayTy.substr(0,6) == "struct") { 560 | // Match with struct.anon#idx 561 | MatchedType = "struct.anon" + LayTyIdx; 562 | WMLTATypeFuncMap[stringHash(MatchedType)].insert(F); 563 | // Match with struct.anon#? 564 | MatchedType = "struct.anon#?"; 565 | WMLTATypeFuncMap[stringHash(MatchedType)].insert(F); 566 | } 567 | 568 | // Also store F under LayTyName#? so that unknown-index lookups match 
569 | MatchedType = LayTyName + "#?"; 570 | WMLTATypeFuncMap[stringHash(MatchedType)].insert(F); 571 | } 572 | 573 | typeList.pop_front(); 574 | } 575 | #endif 576 | 577 | return; 578 | } 579 | 580 | bool CallGraphPass::isClassType(std::string SrcTyName) { 581 | if (SrcTyName.find("(%class.") != string::npos) { 582 | return true; 583 | } 584 | return false; 585 | } 586 | 587 | std::string CallGraphPass::StripClassType(std::string classTy) { 588 | if (classTy[0] == '%') { 589 | classTy = classTy.substr(1, (classTy.size()-1)); 590 | } 591 | while (classTy.back() == '*') { 592 | classTy = classTy.substr(0, (classTy.size()-1)); 593 | } 594 | return classTy; 595 | } 596 | 597 | std::string CallGraphPass::GenClassTyName(std::string SrcTyName) { 598 | std::string substr = "%class."; 599 | int start = SrcTyName.find(substr); 600 | int end = SrcTyName.find(",", start); 601 | 602 | std::string classTy = SrcTyName.substr(start, (end-start)); 603 | classTy = StripClassType(classTy); 604 | std::string funcTy1 = SrcTyName.substr(0, start); 605 | std::string funcTy2 = SrcTyName.substr(end+2, (SrcTyName.size()-end-2)); 606 | std::string ClassTyName = funcTy1 + funcTy2 + "|" + classTy; 607 | 608 | return ClassTyName; 609 | } 610 | 611 | 612 | // ==================== Stage 1 ==================== 613 | void CallGraphPass::GVTypeFunctionRecord(GlobalVariable *GV, Function *F, Value *v, std::string FTyName) { 614 | //errs() << "GVTypeFunctionRecord" << "\n"; 615 | int offset; 616 | Value *PV; 617 | Type *vTy; 618 | std::string MLTypeName; 619 | std::string MLTypeName_backup; 620 | std::string STypeName; 621 | MLTypeName = FTyName; 622 | 623 | while (GVChildParentMap.find(v) != GVChildParentMap.end()) { 624 | PV = GVChildParentMap[v]; 625 | offset = GVChildParentOffsetMap[std::make_pair(v, PV)]; 626 | v = PV; 627 | vTy = v->getType(); 628 | STypeName = SingleType2String(vTy); 629 | if (STypeName != "NAMESET") { 630 | MLTypeName += "|" + STypeName + "#" + std::to_string(offset); 631 | 
632 | // Record this type in MLTypeFuncMap 633 | UpdateMLTypeFuncMap(MLTypeName, F); 634 | //errs() << "1 " << "Type: " << MLTypeName << " Target: " << F->getName().str() << "\n"; 635 | //errs() << "GV: " << GV->getName().str() << "\n"; 636 | //ReferMap[stringHash(MLTypeName)] = MLTypeName; 637 | 638 | // Reserve info. for iterative GV 639 | GVFuncMap[GV].insert(F); 640 | GVFuncTypeMap[make_pair(GV, F)] = MLTypeName; 641 | } 642 | else { 643 | MLTypeName_backup = MLTypeName; 644 | for (std::string TyName: TypeNameSet) { 645 | MLTypeName = MLTypeName_backup; 646 | MLTypeName += "|" + TyName + "#" + std::to_string(offset); 647 | 648 | // Record this type in MLTypeFuncMap 649 | UpdateMLTypeFuncMap(MLTypeName, F); 650 | //errs() << "2 " << "Type: " << MLTypeName << " Target: " << F->getName().str() << "\n"; 651 | //errs() << "GV: " << GV->getName().str() << "\n"; 652 | //ReferMap[stringHash(MLTypeName)] = MLTypeName; 653 | 654 | // Reserve info. for iterative GV 655 | GVFuncMap[GV].insert(F); 656 | GVFuncTypeMap[make_pair(GV, F)] = MLTypeName; 657 | } 658 | } 659 | } 660 | 661 | return; 662 | } 663 | 664 | 665 | void CallGraphPass::IterativeGlobalVariable(GlobalVariable *GVouter, GlobalVariable *GVinner, Value *v) { 666 | 667 | int offset; 668 | Value* PV; 669 | std::string MLTypeName; 670 | std::string MLTypeName_backup; 671 | std::string STypeName; 672 | 673 | // Share GVinner's function with GVouter 674 | FuncSet FS = GVFuncMap[GVinner]; 675 | if (FS.empty()) { 676 | return; 677 | } 678 | 679 | for (auto F:FS) { 680 | MLTypeName = GVFuncTypeMap[make_pair(GVinner, F)]; 681 | //errs() << "MLTypeName from GVinners: " << MLTypeName << "\n"; 682 | Type *vTy; 683 | Value *key = v; 684 | while (GVChildParentMap.find(key) != GVChildParentMap.end()) { 685 | PV = GVChildParentMap[key]; 686 | offset = GVChildParentOffsetMap[std::make_pair(key, PV)]; 687 | key = PV; 688 | vTy = key->getType(); 689 | STypeName = SingleType2String(vTy); 690 | if (STypeName != "NAMESET") { 691 | 
MLTypeName += "|" + STypeName + "#" + std::to_string(offset); 692 | 693 | // Record this type in MLTypeFuncMap 694 | UpdateMLTypeFuncMap(MLTypeName, F); 695 | 696 | // Reserve info. for iterative GV 697 | GVFuncMap[GVouter].insert(F); 698 | GVFuncTypeMap[make_pair(GVouter, F)] = MLTypeName; 699 | } 700 | else { 701 | MLTypeName_backup = MLTypeName; 702 | for (std::string TyName: TypeNameSet) { 703 | MLTypeName = MLTypeName_backup; 704 | MLTypeName += "|" + TyName + "#" + std::to_string(offset); 705 | 706 | // Record this type in MLTypeFuncMap 707 | UpdateMLTypeFuncMap(MLTypeName, F); 708 | 709 | // Reserve info. for iterative GV 710 | GVFuncMap[GVouter].insert(F); 711 | GVFuncTypeMap[make_pair(GVouter, F)] = MLTypeName; 712 | } 713 | } 714 | } 715 | } 716 | 717 | return; 718 | } 719 | 720 | 721 | void CallGraphPass::GlobalVariableAnalysis(GlobalVariable *GV, Constant *Ini){ 722 | //errs() << "GV: " << GV->getName().str() << "\n"; 723 | GVChildParentMap.clear(); 724 | GVChildParentOffsetMap.clear(); 725 | // Check if the Initializer is ConstantAggregate 726 | if (isa<ConstantAggregate>(Ini)){ 727 | list<User *> IniList; 728 | IniList.push_back(Ini); 729 | 730 | while (!IniList.empty()) { 731 | User *U = IniList.front(); 732 | IniList.pop_front(); 733 | Type *UTy = U->getType(); 734 | Value *V = U; 735 | std::string FTyName; 736 | 737 | if (IsCompositeType(UTy) && !isa(UTy)) { 738 | int offset = 0; 739 | for (auto oi = U->op_begin(), oe = U->op_end(); oi != oe; ++oi) { 740 | Value *ov = *oi; 741 | //errs() << "element: " << *ov << "\n"; 742 | GVChildParentMap[ov] = V; 743 | GVChildParentOffsetMap[std::make_pair(ov, V)] = offset; 744 | Type *OTy = ov->getType(); 745 | 746 | if (IsCompositeType(OTy) && !isa(OTy)) { 747 | //errs() << "OTy is composite type" << "\n"; 748 | User *ou = dyn_cast<User>(ov); 749 | IniList.push_back(ou); 750 | } 751 | else if (Function *F = dyn_cast<Function>(ov)) { 752 | //errs() << "ov is function" << "\n"; 753 | // Record F as a valid target of GV's multi-layer type 754 | FTyName 
= SingleType2String(ov->getType()); 755 | GVTypeFunctionRecord(GV, F, ov, FTyName); 756 | } 757 | else if (GlobalVariable *GVinner = dyn_cast(ov)) { 758 | //errs() << "ov is gv" << "\n"; 759 | IterativeGlobalVariable(GV, GVinner, ov); 760 | } 761 | else if (ConstantExpr *CE = dyn_cast(ov)) { 762 | Instruction *Inst = CE->getAsInstruction(); 763 | Value *operand = Inst->getOperand(0); 764 | if (GlobalVariable *GVinner = dyn_cast(operand)) { 765 | IterativeGlobalVariable(GV, GVinner, ov); 766 | } 767 | else if (CastInst *CI = dyn_cast(Inst)) { 768 | Value *op = CI->getOperand(0); 769 | //errs() << "op(0): " << *op << "\n"; 770 | Function *F = dyn_cast(ov->stripPointerCasts()); 771 | if (F != NULL) { 772 | //errs() << "bitcast function" << "\n"; 773 | Type *DestTy = CI->getDestTy(); 774 | Type *SrcTy = CI->getSrcTy(); 775 | std::string SrcTyName = SingleType2String(SrcTy); 776 | if (isClassType(SrcTyName)) { 777 | std::string ClassTyName = GenClassTyName(SrcTyName); 778 | UpdateMLTypeFuncMap(ClassTyName, F); 779 | } 780 | else { 781 | FTyName = SingleType2String(DestTy); 782 | GVTypeFunctionRecord(GV, F, ov, FTyName); 783 | } 784 | } 785 | } 786 | } 787 | else if (IsGeneralPointer(OTy) || OTy->isIntegerTy()) { 788 | // Do nothing 789 | } 790 | else if (isa(ov)) { 791 | // Do nothing 792 | } 793 | else { 794 | //errs() << "1 Unexpected global variable initialization: " << *ov << "\n"; 795 | } 796 | offset++; 797 | } 798 | } 799 | else if (Function *F = dyn_cast(V)) { 800 | FTyName = SingleType2String(V->getType()); 801 | GVTypeFunctionRecord(GV, F, V, FTyName); 802 | } 803 | else { 804 | //errs() << "2 Unexpected Global variable Initialization: " << *V << "\n"; 805 | } 806 | } 807 | } 808 | else if (Function *Func = dyn_cast(Ini)) { // Initializer is a function 809 | Type *GTy = Ini->getType(); 810 | std::string GTyName = SingleType2String(GTy); 811 | 812 | UpdateMLTypeFuncMap(GTyName, Func); 813 | } 814 | return; 815 | } 816 | 817 | 818 | // Unfold compound 
instruction and build an inst hierarchy 819 | // Embedded inst is a child of the compound inst 820 | void CallGraphPass::UnfoldCompoundInst(Instruction *I) { 821 | // Record inst hierarchy info. in InstHierarchy map: 822 | // Multi-level map: compound inst - index of operand - embedded inst 823 | unsigned operandNum = I->getNumOperands(); 824 | for (unsigned index = 0; index < operandNum; index++) { 825 | Value *operand = I->getOperand(index); 826 | if (operand->getType()->isPointerTy() && isa<ConstantExpr>(operand)) { 827 | // Unfold compound inst 828 | // Transform the ConstantExpr into an instruction 829 | ConstantExpr *CE = dyn_cast<ConstantExpr>(operand); 830 | Instruction * newInst = CE->getAsInstruction(); 831 | InstHierarchy[I][index] = newInst; 832 | UnfoldCompoundInst(newInst); 833 | } 834 | } 835 | 836 | return; 837 | } 838 | 839 | 840 | void CallGraphPass::MemCpyInstAnalysis(Instruction *I){ 841 | 842 | if (MemCpyInst *MCI = dyn_cast<MemCpyInst>(I)) { 843 | std::string DTypeName; 844 | std::string STypeName; 845 | Value *dest = MCI->getRawDest(); 846 | Value *src = MCI->getRawSource(); 847 | CastInst *destCI = nullptr; 848 | CastInst *srcCI = nullptr; 849 | 850 | // Extract SrcType and DestType from MemCpyInst 851 | if (isa<CastInst>(dest)) { 852 | destCI = dyn_cast<CastInst>(dest); 853 | } 854 | else if (ConstantExpr *destCE = dyn_cast<ConstantExpr>(dest)) { 855 | Instruction *destInst = destCE->getAsInstruction(); 856 | if (isa<CastInst>(destInst)) { 857 | destCI = dyn_cast<CastInst>(destInst); 858 | } 859 | } 860 | else {return;} 861 | 862 | if (isa<CastInst>(src)) { 863 | srcCI = dyn_cast<CastInst>(src); 864 | } 865 | else if (ConstantExpr *srcCE = dyn_cast<ConstantExpr>(src)) { 866 | Instruction *srcInst = srcCE->getAsInstruction(); 867 | if (isa<CastInst>(srcInst)) { 868 | srcCI = dyn_cast<CastInst>(srcInst); 869 | } 870 | } 871 | else {return;} 872 | if (!destCI || !srcCI) {return;} // A ConstantExpr operand that is not a cast leaves the pointer unset 873 | Type *destTy = destCI->getSrcTy(); 874 | Type *srcTy = srcCI->getSrcTy(); 875 | 876 | // Trace back to get src/dest type name 877 | if (isa(srcTy) && isa(destTy) 878 | && !IsGeneralPointer(srcTy) && !IsGeneralPointer(destTy)) { 879 | 880 | // Get first-layer 
type 881 | DTypeName = SingleType2String(destTy); 882 | STypeName = SingleType2String(srcTy); 883 | 884 | // Trace back to get next layer(s) 885 | DTypeName = GenerateMLTypeName(destCI->getOperand(0), DTypeName); 886 | STypeName = GenerateMLTypeName(srcCI->getOperand(0), STypeName); 887 | 888 | // STypeName shares targets with DTypeName 889 | UpdateRelationshipMap(DTypeName, STypeName); 890 | } 891 | } 892 | 893 | return; 894 | } 895 | 896 | void CallGraphPass::StoreInstAnalysis(StoreInst *SI) { 897 | 898 | Value *VO = SI->getValueOperand(); 899 | Value *PO = SI->getPointerOperand(); 900 | Type *VTy = VO->getType(); 901 | Type *PTy = PO->getType(); 902 | std::string VTypeName; 903 | std::string PTypeName; 904 | Instruction *EI; // Embedded inst 905 | 906 | if (PointerType *PPTy = dyn_cast(PTy)) { 907 | PTy = PPTy->getPointerElementType(); 908 | } 909 | 910 | // Collect unsupported types 911 | if (IsGeneralPointer(PTy) || isa(PTy)) { 912 | UnsupportedSet.insert(typeHash(PTy)); 913 | } 914 | if (IsGeneralPointer(VTy) || isa(VTy)) { 915 | UnsupportedSet.insert(typeHash(VTy)); 916 | } 917 | 918 | // Analyze PO to get PO's multi-layer type 919 | // PO is an embedded inst 920 | EI = InstHierarchy[SI][1]; 921 | if (EI) { 922 | if (CastInst *ECI = dyn_cast(EI)) { 923 | PTypeName = CastInstAnalysis(ECI); 924 | } 925 | else if (GetElementPtrInst *EGEPI = dyn_cast(EI)) { 926 | PTypeName = GEPInstAnalysis(EGEPI); 927 | } 928 | } 929 | // PO is not an embedded inst 930 | else { 931 | PTypeName = SingleType2String(PTy); 932 | PTypeName = GenerateMLTypeName(PO, PTypeName); 933 | } 934 | 935 | // Analyze VO and record useful info. 
936 | // VO is an embedded inst 937 | EI = InstHierarchy[SI][0]; 938 | if (EI) { 939 | if (CastInst *ECI = dyn_cast(EI)) { 940 | VTypeName = CastInstAnalysis(ECI); 941 | } 942 | else if (GetElementPtrInst *EGEPI = dyn_cast(EI)) { 943 | VTypeName = GEPInstAnalysis(EGEPI); 944 | } 945 | UpdateRelationshipMap(PTypeName, VTypeName); 946 | } 947 | // VO is not an embedded inst 948 | else { 949 | // VO is a function 950 | if (Function *F = dyn_cast(VO)) { 951 | if (PTypeName.length() != 0) { 952 | UpdateMLTypeFuncMap(PTypeName, F); 953 | } 954 | } 955 | 956 | // VO is a pointer 957 | else if (isa(VTy)) { 958 | VTypeName = SingleType2String(VTy); 959 | VTypeName = GenerateMLTypeName(VO, VTypeName); 960 | if (VTypeName != PTypeName) { 961 | UpdateRelationshipMap(PTypeName, VTypeName); 962 | } 963 | } 964 | 965 | // VO is a SelectInst 966 | else if (SelectInst *SeI = dyn_cast(VO)) { 967 | Value *TV = SeI->getTrueValue(); 968 | Value *FV = SeI->getFalseValue(); 969 | 970 | // Conservatively record functions in T/F branches as targets 971 | if (Function *F = dyn_cast(TV)) { 972 | if (PTypeName.length() != 0) { 973 | UpdateMLTypeFuncMap(PTypeName, F); 974 | //ReferMap[stringHash(PTypeName)] = PTypeName; 975 | } 976 | } 977 | if (Function *F = dyn_cast(FV)) { 978 | if (PTypeName.length() != 0) { 979 | UpdateMLTypeFuncMap(PTypeName, F); 980 | //ReferMap[stringHash(PTypeName)] = PTypeName; 981 | } 982 | } 983 | } 984 | 985 | // VO is a CastInst 986 | else if (CastInst *CastI = dyn_cast(VO)) { 987 | Type *CastSrc = CastI->getSrcTy(); 988 | VTypeName = SingleType2String(CastSrc); 989 | VTypeName = GenerateMLTypeName(CastI->getOperand(0), VTypeName); 990 | if (VTypeName != PTypeName) { 991 | UpdateRelationshipMap(PTypeName, VTypeName); 992 | } 993 | } 994 | 995 | // VO is a CallInst 996 | else if (CallInst *CaI = dyn_cast(VO)) { 997 | Function *CalledFunc = CaI->getCalledFunction(); 998 | if (CalledFunc) { 999 | if (PTypeName.length() != 0) { 1000 | UpdateMLTypeFuncMap(PTypeName, 
CalledFunc); 1001 | //ReferMap[stringHash(PTypeName)] = PTypeName; 1002 | } 1003 | } 1004 | } 1005 | 1006 | // VO is a global variable 1007 | else if (GlobalVariable *GV = dyn_cast(VO)) { 1008 | VTypeName = SingleType2String(VTy); 1009 | UpdateRelationshipMap(PTypeName, VTypeName); 1010 | } 1011 | } 1012 | 1013 | return; 1014 | } 1015 | 1016 | void CallGraphPass::SelectInstAnalysis(SelectInst *SI) { 1017 | Value *TV = SI->getTrueValue(); 1018 | Value *FV = SI->getFalseValue(); 1019 | 1020 | // Conservatively record functions in T/F branches as targets 1021 | if (Function *F = dyn_cast(TV)) { 1022 | Type *FTy = F->getType(); 1023 | std::string FTyName = SingleType2String(FTy); 1024 | FTyName += "|&"; 1025 | UpdateMLTypeFuncMap(FTyName, F); 1026 | } 1027 | if (Function *F = dyn_cast(FV)) { 1028 | Type *FTy = F->getType(); 1029 | std::string FTyName = SingleType2String(FTy); 1030 | FTyName += "|&"; 1031 | UpdateMLTypeFuncMap(FTyName, F); 1032 | } 1033 | return; 1034 | } 1035 | 1036 | // Analyze cast instruction and return dest type's multi-layer type name 1037 | std::string CallGraphPass::CastInstAnalysis(CastInst *CI) { 1038 | //errs() << "Cast Inst: " << *CI << "\n"; 1039 | 1040 | std::string STypeName; 1041 | std::string DTypeName; 1042 | Type *SrcTy = CI->getSrcTy(); 1043 | Type *DestTy = CI->getDestTy(); 1044 | 1045 | // Collect unsupported types 1046 | if (IsGeneralPointer(SrcTy) || isa(SrcTy)) { 1047 | UnsupportedSet.insert(typeHash(SrcTy)); 1048 | } 1049 | if (IsGeneralPointer(DestTy) || isa(DestTy)) { 1050 | UnsupportedSet.insert(typeHash(DestTy)); 1051 | } 1052 | 1053 | // Escape type check: 1054 | // If a composite type is cast from/to unsupported type, then it is escape type 1055 | if (IsUnsupportedType(SrcTy) && IsCompositeType(DestTy)) { 1056 | EscapingSet.insert(SingleType2String(DestTy)); 1057 | } 1058 | if (IsUnsupportedType(DestTy) && IsCompositeType(SrcTy)) { 1059 | EscapingSet.insert(SingleType2String(SrcTy)); 1060 | } 1061 | 1062 | // SrcTy is an 
embedded inst 1063 | if (Instruction *EI = InstHierarchy[CI][0]) { 1064 | if (GetElementPtrInst *EGEPI = dyn_cast(EI)) { 1065 | STypeName = GEPInstAnalysis(EGEPI); 1066 | } 1067 | } 1068 | // SrcTy is not an embedded inst 1069 | else { 1070 | STypeName = SingleType2String(SrcTy); 1071 | STypeName = GenerateMLTypeName(CI->getOperand(0), STypeName); 1072 | } 1073 | 1074 | // DestTy is an embedded inst 1075 | if (Instruction *EI = InstHierarchy[CI][1]) { 1076 | if (GetElementPtrInst *EGEPI = dyn_cast(EI)) { 1077 | DTypeName = GEPInstAnalysis(EGEPI); 1078 | } 1079 | } 1080 | // DestTy is not an embedded inst 1081 | else { 1082 | DTypeName = SingleType2String(DestTy); 1083 | unsigned operandNum = CI->getNumOperands(); 1084 | if (operandNum > 1) { 1085 | DTypeName = GenerateMLTypeName(CI->getOperand(1), DTypeName); 1086 | } 1087 | } 1088 | 1089 | // Record type relationship 1090 | if (DTypeName.length() == 0 && STypeName.length() !=0) { 1091 | UpdateRelationshipMap(DTypeName, STypeName); 1092 | } 1093 | else if (STypeName.length() == 0 && DTypeName.length() !=0) { 1094 | UpdateRelationshipMap(STypeName, DTypeName); 1095 | } 1096 | else if (STypeName.length() == 0 && DTypeName.length() ==0) { 1097 | // Do NOT record in the map 1098 | } 1099 | else { 1100 | UpdateRelationshipMap(STypeName, DTypeName); 1101 | UpdateRelationshipMap(DTypeName, STypeName); 1102 | } 1103 | 1104 | return DTypeName; 1105 | } 1106 | 1107 | 1108 | std::string CallGraphPass::FunctionCastAnalysis(Function *F, CastInst *CastI) { 1109 | Type *SrcTy = CastI->getSrcTy(); 1110 | Type *DestTy = CastI->getDestTy(); 1111 | std::string STypeName = SingleType2String(SrcTy); 1112 | std::string DTypeName = SingleType2String(DestTy); 1113 | 1114 | // Update maps 1115 | UpdateMLTypeFuncMap(STypeName, F); 1116 | UpdateMLTypeFuncMap(DTypeName, F); 1117 | UpdateRelationshipMap(STypeName, DTypeName); 1118 | UpdateRelationshipMap(DTypeName, STypeName); 1119 | 1120 | return DTypeName; 1121 | } 1122 | 1123 | 1124 | void 
CallGraphPass::CallInstAnalysis(CallInst *CallI) { 1125 | Value *V = CallI->getCalledOperand(); 1126 | std::string FTyName; 1127 | std::string MLTypeName; 1128 | Type *FTy; 1129 | 1130 | if (ConstantExpr *CE = dyn_cast(V)){ 1131 | // e.g., call i8* bitcast (i8*(...) @func to i8*(...))(...) 1132 | Instruction *Inst = CE->getAsInstruction(); 1133 | if (CastInst *CastI = dyn_cast(Inst)) { 1134 | Function *F = dyn_cast(V->stripPointerCasts()); 1135 | if (F != NULL) { 1136 | FunctionCastAnalysis(F, CastI); 1137 | } 1138 | } 1139 | 1140 | } 1141 | 1142 | // Initialization through the arg(s) of call inst 1143 | // e.g., call i32* @func (i8*(i32)* @target) 1144 | // Record target and i8*(i32)* in MLTypeFuncMap. 1145 | // Trace back to see if i8*(i32)* has outer-layer type. 1146 | // If it has, record in TypeRelationshipMap. 1147 | for (auto ai = CallI->arg_begin(), ae = CallI->arg_end(); ai != ae; ++ai) { 1148 | Value *arg = *ai; 1149 | Type *argType = arg->getType(); 1150 | // Trace back to get arg's multi-layer type 1151 | MLTypeName = SingleType2String(argType); 1152 | MLTypeName = GenerateMLTypeName(arg, MLTypeName); 1153 | 1154 | if (Function *Func = dyn_cast(arg)) { 1155 | // arg's first layer type 1156 | FTyName = SingleType2String(Func->getType()); 1157 | 1158 | if (FTyName != MLTypeName) { 1159 | UpdateMLTypeFuncMap(MLTypeName, Func); 1160 | } 1161 | else { 1162 | MLTypeName = FTyName + "|&"; // conservative 1163 | UpdateMLTypeFuncMap(MLTypeName, Func); 1164 | } 1165 | } 1166 | } 1167 | 1168 | if (DbgValueInst *dbgvI = dyn_cast(CallI)) { 1169 | Value *v = dbgvI->getValue(); 1170 | if (v) { 1171 | if (Function *dbgF = dyn_cast(v)) { 1172 | FTyName = SingleType2String(dbgF->getType()); 1173 | FTyName += "|&"; 1174 | UpdateMLTypeFuncMap(FTyName, dbgF); 1175 | } 1176 | } 1177 | } 1178 | 1179 | return; 1180 | } 1181 | 1182 | // Extract multi-layer type from GEPInst 1183 | // Return structName#offset 1184 | std::string CallGraphPass::GEPInstAnalysis(GetElementPtrInst 
*GEPI) { 1185 | // If the indexes end with 0, there are two equivalent types in this GEPInst 1186 | std::string MLTypeName; 1187 | std::string EqualTypeName; 1188 | std::string LayerTyName; 1189 | list<std::string> TyNameList; 1190 | list<std::string> TyNameList_backup; 1191 | Type *LayerTy = NULL; 1192 | unsigned opNum = GEPI->getNumOperands(); 1193 | bool equal = false; 1194 | 1195 | uint64_t Idx; 1196 | int offset = 0; 1197 | int offsetNum; 1198 | Type *SrcTy = GEPI->getSourceElementType(); 1199 | for (auto oi = GEPI->idx_begin(), oe = GEPI->idx_end(); oi != oe; ++oi) { 1200 | // Get Layer Type each operand points to 1201 | offset++; 1202 | SmallVector<Value *> Ops(GEPI->idx_begin(), GEPI->idx_begin() + offset); 1203 | LayerTy = GetElementPtrInst::getIndexedType(SrcTy, Ops); 1204 | LayerTyName = SingleType2String(LayerTy); 1205 | 1206 | // If an index other than the first one is the constant 0, 1207 | // there exists an equivalent type 1208 | if (offset > 1 && isa<ConstantInt>(GEPI->getOperand(offset)) && cast<ConstantInt>(GEPI->getOperand(offset))->isZero()) { 1209 | equal = true; 1210 | } 1211 | 1212 | // Add offset at the end of LayerTyName 1213 | if (offset < (opNum-1)) { 1214 | if (ConstantInt* CInt = dyn_cast<ConstantInt>(GEPI->getOperand(offset+1))) { 1215 | offsetNum = CInt->getSExtValue(); 1216 | LayerTyName += "#" + std::to_string(offsetNum); 1217 | } 1218 | else { 1219 | LayerTyName += "#?"; 1220 | } 1221 | } 1222 | TyNameList.push_back(LayerTyName); 1223 | } 1224 | TyNameList_backup = TyNameList; 1225 | 1226 | // Get the multi-layer type name in GEPInst 1227 | TyNameList_backup = TyNameList; 1228 | if (!TyNameList.empty()) { 1229 | MLTypeName = TyNameList.back(); 1230 | TyNameList.pop_back(); 1231 | while (!TyNameList.empty()){ 1232 | MLTypeName += "|" + TyNameList.back(); 1233 | TyNameList.pop_back(); 1234 | } 1235 | } 1236 | else { 1237 | MLTypeName = SingleType2String(SrcTy); 1238 | } 1239 | 1240 | // Record equivalent types in TypeRelationshipMap 1241 | if (equal == true) { 1242 | TyNameList_backup.pop_back(); 1243 | if (!TyNameList_backup.empty()) { 1244 | 
EqualTypeName = TyNameList_backup.back(); 1245 | TyNameList_backup.pop_back(); 1246 | while (!TyNameList_backup.empty()) { 1247 | EqualTypeName += "|" + TyNameList_backup.back(); 1248 | TyNameList_backup.pop_back(); 1249 | } 1250 | 1251 | if (MLTypeName != EqualTypeName) { 1252 | UpdateRelationshipMap(MLTypeName, EqualTypeName); 1253 | UpdateRelationshipMap(EqualTypeName, MLTypeName); 1254 | } 1255 | } 1256 | } 1257 | 1258 | return MLTypeName; 1259 | } 1260 | 1261 | 1262 | bool CallGraphPass::CollectInformation(Module *M) { 1263 | errs() << "Collecting information..." << "\n"; 1264 | 1265 | #ifdef DTnoSH 1266 | goto POS2; 1267 | #endif 1268 | 1269 | // Record named structures in StructNameMap for future use: Special handling 2 1270 | for (StructType* STy: M->getIdentifiedStructTypes()) { 1271 | if (STy->hasName() && !STy->isOpaque()) { 1272 | std::string STyID = GetStructIdentity(STy); 1273 | std::string STyName = STy->getName().str(); 1274 | STyName = StructNameTrim(STyName); 1275 | StructIDNameMap[STyID].insert(STyName); 1276 | } 1277 | } 1278 | 1279 | POS2: 1280 | // Deal with global variables 1281 | for (Module::global_iterator gi = M->global_begin(); gi != M->global_end(); ++gi) { 1282 | GlobalVariable* GV = &*gi; 1283 | if (!GV->hasInitializer()) { 1284 | continue; 1285 | } 1286 | Constant *Ini = GV->getInitializer(); 1287 | GlobalVariableAnalysis(GV, Ini); 1288 | } 1289 | 1290 | for (Module::global_iterator gi = M->global_begin(); gi != M->global_end(); ++gi) { 1291 | GlobalVariable* GV = &*gi; 1292 | if (!GV->hasInitializer()) { 1293 | continue; 1294 | } 1295 | Constant *Ini = GV->getInitializer(); 1296 | GlobalVariableAnalysis(GV, Ini); 1297 | } 1298 | 1299 | // Deal with instructions 1300 | for (Function &F : *M) { 1301 | 1302 | #ifdef DTnoSH 1303 | goto POS3; 1304 | #endif 1305 | 1306 | // Skip dead functions: Special handling 3 1307 | if (F.use_empty() && (F.getName().str() != "main")) { 1308 | continue; 1309 | } 1310 | 1311 | POS3: 1312 | // Record 
arguments in ArgSet, the type of arg is an incomplete type 1313 | std::set<Value *> ArgSet; 1314 | for (Function::arg_iterator ai = F.arg_begin(), ae = F.arg_end(); ai != ae; ++ai) { 1315 | Value *arg = ai; 1316 | ArgSet.insert(arg); 1317 | } 1318 | 1319 | if (F.isDeclaration()) 1320 | continue; 1321 | 1322 | // Collect address-taken functions. 1323 | if (F.hasAddressTaken()) { 1324 | Ctx->AddressTakenFuncs.insert(&F); 1325 | } 1326 | 1327 | for (inst_iterator i = inst_begin(F), e = inst_end(F); i != e; ++i) { 1328 | Instruction *I = &*i; 1329 | 1330 | #ifdef DTnoSH 1331 | goto POS1; 1332 | #endif 1333 | 1334 | // Unfold and analyze compound inst: Special handling 1 1335 | if (IsCompoundInst(I)) { 1336 | InstHierarchy.clear(); 1337 | UnfoldCompoundInst(I); 1338 | } 1339 | 1340 | POS1: 1341 | // MemCpyInst 1342 | if (isa<MemCpyInst>(I)) { 1343 | MemCpyInstAnalysis(I); 1344 | } 1345 | 1346 | // StoreInst 1347 | if (StoreInst *SI = dyn_cast<StoreInst>(I)) { 1348 | StoreInstAnalysis(SI); 1349 | 1350 | // Deal with incomplete type 1351 | // store %0, %2 1352 | // If %0 is an arg, %2 is an AllocaInst, record %2 in ArgAllocaSet 1353 | // If a CallInst refers to AllocaInst %2, its type is incomplete 1354 | Value *PO = SI->getPointerOperand(); 1355 | Value *VO = SI->getValueOperand(); 1356 | if (ArgSet.find(VO) != ArgSet.end() && isa<AllocaInst>(PO)) { 1357 | AllocaInst *AI = dyn_cast<AllocaInst>(PO); 1358 | ArgAllocaSet.insert(AI); 1359 | } 1360 | } 1361 | 1362 | if (SelectInst *SeI = dyn_cast<SelectInst>(I)) { 1363 | SelectInstAnalysis(SeI); 1364 | } 1365 | 1366 | // CastInst 1367 | if (CastInst *CastI = dyn_cast<CastInst>(I)) { 1368 | Type *SrcTy = CastI->getSrcTy(); 1369 | Type *DstTy = CastI->getDestTy(); 1370 | CastInstAnalysis(CastI); 1371 | } 1372 | 1373 | // GetElementPtrInst 1374 | if (GetElementPtrInst *GEPI = dyn_cast<GetElementPtrInst>(I)) { 1375 | GEPInstAnalysis(GEPI); 1376 | } 1377 | 1378 | // CallInst 1379 | if (CallInst *CallI = dyn_cast<CallInst>(I)) { 1380 | CallInstAnalysis(CallI); 1381 | } 1382 | } 1383 | } 1384 | 1385 | return false; 1386 | } 1387 | 1388 | 
// ==================== Stage 2 ==================== 1389 | 1390 | // Add all functions in FS2 to FS1 1391 | FuncSet CallGraphPass::FSMerge(FuncSet FS1, FuncSet FS2) { 1392 | for (auto F: FS2) { 1393 | FS1.insert(F); 1394 | } 1395 | return FS1; 1396 | } 1397 | 1398 | // Return the intersection of FS1 and FS2 1399 | FuncSet CallGraphPass::FSIntersect(FuncSet FS1, FuncSet FS2) { 1400 | FuncSet FS; 1401 | for (auto F: FS1) { 1402 | if (FS2.find(F) != FS2.end()) { 1403 | FS.insert(F); 1404 | } 1405 | } 1406 | return FS; 1407 | } 1408 | 1409 | StrSet CallGraphPass::TypeSetMerge(StrSet S1, StrSet S2) { 1410 | for (auto str: S2) { 1411 | S1.insert(str); 1412 | } 1413 | return S1; 1414 | } 1415 | 1416 | StrPairSet CallGraphPass::FindSubTypes(std::string MLTypeName) { 1417 | StrPairSet SubTypeSet; 1418 | 1419 | // First pair: "" & MLTypeName 1420 | SubTypeSet.insert(std::make_pair("",MLTypeName)); 1421 | 1422 | // Other pairs: divide from "|"'s position 1423 | int idx = 0; 1424 | std::string head; 1425 | std::string tail; 1426 | while((idx = MLTypeName.find("|", idx)) != string::npos) { 1427 | head = MLTypeName.substr(0, idx+1); 1428 | tail = MLTypeName.substr(idx+1, (MLTypeName.size()-idx-1)); 1429 | SubTypeSet.insert(std::make_pair(head, tail)); 1430 | idx++; 1431 | } 1432 | 1433 | return SubTypeSet; 1434 | } 1435 | 1436 | void CallGraphPass::PrintResults(CallInst *CI, FuncSet FS, std::string MLTypeName) { 1437 | 1438 | // Print Call site index 1439 | CSIdx++; 1440 | errs() << CSIdx << " "; 1441 | errs() << "Call Site "; 1442 | CI -> getDebugLoc().print(errs()); 1443 | errs() << "\n"; 1444 | errs() << "Call site type: " << MLTypeName << "\n"; 1445 | 1446 | if (FS.empty()){ 1447 | errs() << "No target." 
<< "\n"; 1448 | } 1449 | else { 1450 | vector<std::string> FuncNameVec; 1451 | while(!FS.empty()){ 1452 | llvm::Function *CurFunc = *FS.begin(); 1453 | FuncNameVec.push_back(CurFunc->getName().str()); 1454 | FS.erase(CurFunc); 1455 | } 1456 | std::sort(FuncNameVec.begin(), FuncNameVec.end()); 1457 | for (std::string s:FuncNameVec){ 1458 | errs() << s << "\n"; 1459 | } 1460 | } 1461 | 1462 | return; 1463 | } 1464 | 1465 | list<std::string> CallGraphPass::MLTypeName2List(std::string MLTypeName) { 1466 | 1467 | // Get number of layers 1468 | int LayerNum = 1; 1469 | int i = 0; 1470 | while ((i = MLTypeName.find("|", i)) != string::npos) { 1471 | LayerNum++; 1472 | i++; 1473 | } 1474 | 1475 | std::string LayerName; 1476 | list<std::string> LayerList; 1477 | i = 0; 1478 | int idx = 0; 1479 | int j = 0; 1480 | while ((i = MLTypeName.find("|", i)) != string::npos) { 1481 | LayerName = MLTypeName.substr(j, i-j); 1482 | LayerList.push_back(LayerName); 1483 | idx++; 1484 | j = i+1; 1485 | i++; 1486 | } 1487 | LayerName = MLTypeName.substr(j, (MLTypeName.length()-j+1)); 1488 | LayerList.push_back(LayerName); 1489 | 1490 | return LayerList; 1491 | } 1492 | 1493 | 1494 | bool CallGraphPass::LayerMatch(std::string s1, std::string s2) { 1495 | // s1 from call site's type 1496 | // s2 from candidate's type (candidates are in MLTypeFuncMap) 1497 | 1498 | if (s1 == s2) { 1499 | return true; 1500 | } 1501 | else if (s1.substr(0,1) == "&" || s2.substr(0,1) == "&") { 1502 | return true; 1503 | } 1504 | else if (s1.substr(0,11) == "struct.anon") { // anonymous struct 1505 | if (s2.substr(0,6) != "struct") {return false;} 1506 | int pos1 = s1.find_last_of("#"); 1507 | int pos2 = s2.find_last_of("#"); 1508 | std::string s1Index = s1.substr(pos1); 1509 | std::string s2Index = s2.substr(pos2); 1510 | if (s1Index == s2Index) { // index is the same 1511 | return true; 1512 | } 1513 | else {return false;} 1514 | } 1515 | else if (s2.substr(0,11) == "struct.anon") { 1516 | if (s1.substr(0,6) != "struct") {return false;} 1517 | int 
pos1 = s1.find_last_of("#"); 1518 | int pos2 = s2.find_last_of("#"); 1519 | std::string s1Index = s1.substr(pos1); 1520 | std::string s2Index = s2.substr(pos2); 1521 | if (s1Index == s2Index) { // index is the same 1522 | return true; 1523 | } 1524 | else {return false;} 1525 | } 1526 | else if (s1.substr(s1.length()-1,1) == "?" || s2.substr(s2.length()-1,1) == "?" ){ // Unknown index 1527 | int pos1 = s1.find_last_of("#"); 1528 | int pos2 = s2.find_last_of("#"); 1529 | std::string s1Name = s1.substr(0,pos1); 1530 | std::string s2Name = s2.substr(0,pos2); 1531 | if (s1Name == s2Name) { 1532 | return true; 1533 | } 1534 | else {return false;} 1535 | } 1536 | else {return false;} 1537 | } 1538 | 1539 | 1540 | bool CallGraphPass::IsValidType(std::string ft) { 1541 | list ftList = MLTypeName2List(ft); 1542 | 1543 | // Ignore types with more than 7 layers 1544 | if (ftList.size() > 7) { 1545 | return false; 1546 | } 1547 | 1548 | // First layer must be function pointer type 1549 | std::string first = ftList.front(); 1550 | if (first.substr(first.length()-2, 2) != ")*") { 1551 | return false; 1552 | } 1553 | 1554 | // Other layers must be composite type or fuzzy type if exist 1555 | // The adjacent two layers should not be identical 1556 | ftList.pop_front(); 1557 | std::string other, afterPop; 1558 | std::size_t id1, id2; 1559 | while (!ftList.empty()) { 1560 | other = ftList.front(); 1561 | if (other == "&") { 1562 | ftList.pop_front(); 1563 | if (ftList.empty()) { 1564 | return true; 1565 | } 1566 | else { 1567 | return false; 1568 | } 1569 | } 1570 | else if (other.substr(0, 6) == "struct" || 1571 | other.substr(0, 5) == "array" || 1572 | other.substr(0, 6) == "vector") { 1573 | ftList.pop_front(); 1574 | if (!ftList.empty()) { 1575 | afterPop= ftList.front(); 1576 | id1 = other.find("#"); 1577 | id2 = afterPop.find("#"); 1578 | if (id1 != std::string::npos && id2 !=std::string::npos && 1579 | other.substr(0,id1) == afterPop.substr(0,id2)) { 1580 | return false; 1581 
| } 1582 | } 1583 | } 1584 | else { 1585 | return false; 1586 | } 1587 | } 1588 | return true; 1589 | } 1590 | 1591 | StrSet CallGraphPass::AddFuzzyTypeAndCopySet(StrSet CumuSet) { 1592 | StrSet MatchedSet; 1593 | std::string FuzzyType; 1594 | for (std::string t: CumuSet) { 1595 | if (t.substr(t.length()-1,1) == "&") { 1596 | MatchedSet.insert(t); 1597 | } 1598 | else { 1599 | FuzzyType = t + "|&"; 1600 | MatchedSet.insert(t); 1601 | MatchedSet.insert(FuzzyType); 1602 | } 1603 | } 1604 | 1605 | return MatchedSet; 1606 | } 1607 | 1608 | StrSet CallGraphPass::LookupTypeRecordMap(std::string t, int map) { 1609 | StrSet Stmp; 1610 | 1611 | if (map == 1) { 1612 | Stmp = FirstMap[t]; 1613 | } 1614 | else if (map == 2) { 1615 | Stmp = SecondMap[t]; 1616 | } 1617 | else if (map == 3) { 1618 | Stmp = ThirdMap[t]; 1619 | } 1620 | else if (map == 4) { 1621 | Stmp = FourthMap[t]; 1622 | } 1623 | else if (map == 5) { 1624 | Stmp = FifthMap[t]; 1625 | } 1626 | else if (map == 6) { 1627 | Stmp = SixthMap[t]; 1628 | } 1629 | else if (map == 7) { 1630 | Stmp = SeventhMap[t]; 1631 | } 1632 | 1633 | return Stmp; 1634 | } 1635 | 1636 | StrSet CallGraphPass::UpdateCumuTySet(StrSet CumuTySet, int map, std::string LayerType) { 1637 | StrSet NewSet; 1638 | StrSet LayerTySet; 1639 | std::string CumuType; 1640 | 1641 | for (std::string t: CumuTySet) { 1642 | if (t.substr(t.length()-1,1) == "&") { 1643 | NewSet.insert(t); 1644 | } 1645 | LayerTySet = LookupTypeRecordMap(t, map); 1646 | for (std::string e: LayerTySet) { 1647 | if (LayerMatch(e, LayerType)) { 1648 | CumuType = t + "|" + e; 1649 | NewSet.insert(CumuType); 1650 | } 1651 | } 1652 | } 1653 | 1654 | return NewSet; 1655 | } 1656 | 1657 | StrSet CallGraphPass::CoverAll(StrSet CumuTySet, int map) { 1658 | StrSet MatchedTySet; 1659 | StrSet LayerTySet; 1660 | StrSet NewCumuTySet; 1661 | std::string CumuType; 1662 | 1663 | while (map <= 7) { 1664 | for (std::string t: CumuTySet) { 1665 | LayerTySet = LookupTypeRecordMap(t, map); 1666 | 
for (std::string l: LayerTySet) { 1667 | CumuType = t + "|" + l; 1668 | NewCumuTySet.insert(CumuType); 1669 | MatchedTySet.insert(CumuType); 1670 | } 1671 | } 1672 | if (NewCumuTySet.empty()) { 1673 | break; 1674 | } 1675 | CumuTySet = NewCumuTySet; 1676 | NewCumuTySet.clear(); 1677 | map++; 1678 | } 1679 | 1680 | return MatchedTySet; 1681 | } 1682 | 1683 | void CallGraphPass::printSet(StrSet Set) { 1684 | for (std::string t: Set) { 1685 | errs() << t << "\n"; 1686 | } 1687 | return; 1688 | } 1689 | 1690 | void CallGraphPass::ExhaustiveSearch4FriendTypes(std::string Search4Type) { 1691 | // Find friend types step 1: 1692 | // List all sub-types using form: head-body-tail. 1693 | // SplitArray[i][0]: head 1694 | // SplitArray[i][1]: body 1695 | // SplitArray[i][2]: tail 1696 | // Body can be replaced by friend types. 1697 | //errs() << "Looking for sub-types..." << "\n"; 1698 | list MLTyList; 1699 | MLTyList = MLTypeName2List(Search4Type); 1700 | LayerNumSet.insert(MLTyList.size()); 1701 | LayerNumArray[MLTyList.size()-1]++; 1702 | 1703 | int LayerNum = MLTyList.size(); 1704 | std::string MLTyArray[LayerNum]; 1705 | for (int layer = 0; layer < LayerNum; layer++) { 1706 | MLTyArray[layer] = MLTyList.front(); 1707 | MLTyList.pop_front(); 1708 | } 1709 | 1710 | int ArraySize = (1 + LayerNum) * LayerNum / 2; 1711 | std::string SplitArray[ArraySize][3]; 1712 | std::string head, body, tail; 1713 | 1714 | int i = 0; 1715 | for (int LayerSpan = 0; LayerSpan < LayerNum; LayerSpan++) { 1716 | for (int BodyIdx = 0; BodyIdx < LayerNum-LayerSpan; BodyIdx++) { 1717 | head = ""; 1718 | body = ""; 1719 | tail = ""; 1720 | 1721 | if (BodyIdx != 0) { 1722 | for (int j = 0; j < BodyIdx; j++) { 1723 | head += MLTyArray[j] + "|"; 1724 | } 1725 | } 1726 | SplitArray[i][0] = head; 1727 | 1728 | int BodyEnd = BodyIdx + LayerSpan; 1729 | for (int k = BodyIdx; k <= BodyEnd; k++) { 1730 | body += MLTyArray[k] + "|"; 1731 | } 1732 | body.pop_back(); 1733 | SplitArray[i][1] = body; 1734 | 1735 
| int TailIdx = BodyIdx+LayerSpan+1; 1736 | if (TailIdx < LayerNum) { 1737 | for (int l = TailIdx; l < LayerNum; l++) { 1738 | tail += "|" + MLTyArray[l]; 1739 | } 1740 | } 1741 | SplitArray[i][2] = tail; 1742 | 1743 | i++; 1744 | } 1745 | } 1746 | 1747 | // Find friend types step 2: 1748 | // Find friend types for each sub-type (body) 1749 | // Concatenate sub-type's friend types with sub-type's head and tail 1750 | // to generate friend types for the entire multi-layer type 1751 | StrSet TS; // Search4Type's friend type set 1752 | StrSet bodyTS; 1753 | StrSet SFSet; // Sub-type's friend type set 1754 | std::string FriendType; 1755 | bool first; 1756 | for (int Aidx = 0; Aidx < ArraySize; Aidx++) { 1757 | head = SplitArray[Aidx][0]; 1758 | body = SplitArray[Aidx][1]; // Look for body's friend types 1759 | tail = SplitArray[Aidx][2]; 1760 | bodyTS.clear(); 1761 | 1762 | // Remove the idx of the first-layer in body 1763 | list bodyList; 1764 | bodyList = MLTypeName2List(body); 1765 | std::string bodyFst = bodyList.front(); 1766 | std::string bodyIdx; 1767 | std::string bodyRaw; 1768 | 1769 | if ((i = bodyFst.find("#")) != string::npos) { 1770 | bodyIdx = bodyFst.substr(i); 1771 | bodyRaw = body; 1772 | bodyRaw.erase(i, bodyIdx.length()); 1773 | } 1774 | else { 1775 | bodyIdx = ""; 1776 | bodyRaw = body; 1777 | } 1778 | 1779 | if (NewTypeRelationshipMap.find(bodyRaw) != NewTypeRelationshipMap.end()) { 1780 | SFSet = NewTypeRelationshipMap[bodyRaw]; 1781 | } 1782 | else { 1783 | if (Aidx == 0) { 1784 | first = true; 1785 | } 1786 | else { 1787 | first = false; 1788 | } 1789 | SFSet = UpgradeTypeRelationshipMap(bodyRaw, first); 1790 | } 1791 | 1792 | if (SFSet.empty()) { // current body has no friend type 1793 | FriendType = head + body + tail; 1794 | bodyTS.insert(FriendType); 1795 | } 1796 | else { 1797 | for (StrSet::iterator SFit = SFSet.begin(); SFit != SFSet.end(); SFit++) { 1798 | // If the friend type is a fuzzy type, do not put back idx 1799 | if (*SFit == 
"&") { 1800 | FriendType = head + "&" + tail; 1801 | bodyTS.clear(); 1802 | bodyTS.insert(FriendType); 1803 | break; 1804 | } 1805 | 1806 | // Put back the idx of the first-layer in body 1807 | bodyList.clear(); 1808 | bodyList = MLTypeName2List(*SFit); 1809 | bodyFst = bodyList.front() + bodyIdx; 1810 | bodyList.pop_front(); 1811 | std::string newBody = bodyFst; 1812 | while (!bodyList.empty()) { 1813 | newBody += "|" + bodyList.front(); 1814 | bodyList.pop_front(); 1815 | } 1816 | 1817 | // Generate friend type for the entire multi-layer type 1818 | FriendType = head + newBody + tail; 1819 | bodyTS.insert(FriendType); 1820 | } 1821 | } 1822 | TS.insert(bodyTS.begin(), bodyTS.end()); 1823 | } 1824 | 1825 | // Trim TS, remove invalid types 1826 | std::set TS_copy; 1827 | for (std::string ft: TS) { 1828 | if (IsValidType(ft)) { 1829 | TS_copy.insert(ft); 1830 | } 1831 | } 1832 | TS = TS_copy; 1833 | 1834 | // Remove "&" at the end to generate corresponding non-fuzzy types 1835 | std::string clearType; 1836 | for (std::string ft: TS) { 1837 | if (ft.substr(ft.length()-1,1) == "&") { 1838 | clearType = ft.substr(0, ft.length()-2); 1839 | TS_copy.insert(clearType); 1840 | } 1841 | } 1842 | TS.insert(TS_copy.begin(), TS_copy.end()); 1843 | 1844 | // Update FriendTyMap 1845 | if (FriendTyMap.find(Search4Type) == FriendTyMap.end()) { 1846 | FriendTyMap[Search4Type] = TS; 1847 | } 1848 | else { 1849 | FriendTyMap[Search4Type].insert(TS.begin(), TS.end()); 1850 | } 1851 | 1852 | return; 1853 | } 1854 | 1855 | void CallGraphPass::FindCalleesWithSMLTA(CallInst *CI) { 1856 | 1857 | std::string MLTypeName; // Call site's type 1858 | FuncSet FS; 1859 | 1860 | // Get Caller's multi-layer type 1861 | Value *CV = CI->getCalledOperand(); 1862 | Type *LayerTy = CV->getType(); 1863 | MLTypeName = SingleType2String(LayerTy); 1864 | 1865 | if (isClassType(MLTypeName)) { 1866 | MLTypeName = GenClassTyName(MLTypeName); 1867 | // Deal with virtual functions 1868 | auto ai = 
CI->arg_begin(); 1869 | Value *arg0 = *ai; 1870 | if (LoadInst *LI = dyn_cast<LoadInst>(arg0)) { 1871 | Value *LIop = LI->getOperand(0); 1872 | if (DerivedClassMap.find(LIop) != DerivedClassMap.end()) { 1873 | std::string classTyName = DerivedClassMap[LIop]; 1874 | int index = MLTypeName.find("|"); 1875 | MLTypeName = MLTypeName.substr(0, index+1) + classTyName; 1876 | } 1877 | } 1878 | FS = MLTypeFuncMap[stringHash(MLTypeName)]; 1879 | 1880 | } 1881 | else { 1882 | MLTypeName = GenerateMLTypeName(CV, MLTypeName); 1883 | 1884 | // If the called value can be traced back to an AllocaInst in ArgAllocaSet, 1885 | // this call site has an incomplete type; mark it with "&" 1886 | if (AIFlag == true) { 1887 | AIFlag = false; // turn off AIFlag 1888 | if (ArgAllocaSet.find(RecordAI) != ArgAllocaSet.end()) { 1889 | MLTypeName += "|&"; 1890 | } 1891 | } 1892 | } 1893 | 1894 | if (TargetLookupMap.find(stringHash(MLTypeName)) != TargetLookupMap.end()) { 1895 | FS = TargetLookupMap[stringHash(MLTypeName)]; 1896 | } 1897 | else { 1898 | // Initialize FS, it will be enlarged later 1899 | FS = MLTypeFuncMap[stringHash(MLTypeName)]; 1900 | ExhaustiveSearch4FriendTypes(MLTypeName); 1901 | StrSet FTySet; 1902 | StrSet FriendTySetRound1; 1903 | StrSet FriendTySetRound2; 1904 | StrSet FriendTySetRound3; 1905 | 1906 | //FriendTySetRound1 = FriendTyMap[MLTypeName]; 1907 | //for (auto f: FriendTySetRound1) { 1908 | // ExhaustiveSearch4FriendTypes(f); 1909 | // FriendTySetRound2.insert(FriendTyMap[f].begin(), FriendTyMap[f].end()); 1910 | //FriendTySetRound2 = FriendTyMap[f]; 1911 | //FriendTyMap[MLTypeName].insert(FriendTySetRound2.begin(), FriendTySetRound2.end()); 1912 | //} 1913 | //FriendTyMap[MLTypeName].insert(FriendTySetRound2.begin(), FriendTySetRound2.end()); 1914 | 1915 | FTySet = FriendTyMap[MLTypeName]; 1916 | //errs() << "friend type set size: " << FTySet.size() << "\n"; 1917 | 1918 | // Search for matched types for each element in FTySet 1919 | StrSet MatchedTySet; // Types match with 
CheckType 1920 | StrSet AllMatchedTySet; // Types match with all elements in FTySet 1921 | StrSet CumuTySet; 1922 | for (std::set::iterator TSit = FTySet.begin(); TSit != FTySet.end(); TSit++) { 1923 | std::string CheckType = *TSit; 1924 | list CheckTyList; 1925 | CheckTyList = MLTypeName2List(CheckType); 1926 | //errs() << "checktype: " << CheckType << "\n"; 1927 | 1928 | // Initialize the sets 1929 | MatchedTySet.clear(); 1930 | CumuTySet.clear(); 1931 | 1932 | std::string FirstLayer = CheckTyList.front(); 1933 | CheckTyList.pop_front(); 1934 | CumuTySet.insert(FirstLayer); 1935 | 1936 | #ifdef DTnocache 1937 | goto NOCACHE; 1938 | #endif 1939 | 1940 | // Lookup the cache for matched types 1941 | MatchedTySet = MatchedTyMap[stringHash(CheckType)]; 1942 | if (!MatchedTySet.empty()) { 1943 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 1944 | continue; 1945 | } 1946 | 1947 | NOCACHE: 1948 | if (CheckTyList.empty()) { // Single-layer type 1949 | //errs() << "single-layer " << "\n"; 1950 | MatchedTySet = AddFuzzyTypeAndCopySet(CumuTySet); 1951 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 1952 | } 1953 | else { // >= 2 layers 1954 | std::string SecondLayer = CheckTyList.front(); 1955 | if (SecondLayer == "&") { 1956 | MatchedTySet = CoverAll(CumuTySet, 1); 1957 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 1958 | MatchedTyMap[stringHash(CheckType)] = MatchedTySet; 1959 | continue; 1960 | } 1961 | CheckTyList.pop_front(); 1962 | CumuTySet = UpdateCumuTySet(CumuTySet, 1, SecondLayer); 1963 | 1964 | if (CheckTyList.empty()) { // Two-layer type 1965 | MatchedTySet = AddFuzzyTypeAndCopySet(CumuTySet); 1966 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 1967 | } 1968 | else { // >= 3 layers 1969 | std::string ThirdLayer = CheckTyList.front(); 1970 | if (ThirdLayer == "&") { 1971 | MatchedTySet = CoverAll(CumuTySet, 2); 1972 | AllMatchedTySet.insert(MatchedTySet.begin(), 
MatchedTySet.end()); 1973 | MatchedTyMap[stringHash(CheckType)] = MatchedTySet; 1974 | continue; 1975 | } 1976 | CheckTyList.pop_front(); 1977 | CumuTySet = UpdateCumuTySet(CumuTySet, 2, ThirdLayer); 1978 | 1979 | if (CheckTyList.empty()) { // Three-layer type 1980 | MatchedTySet = AddFuzzyTypeAndCopySet(CumuTySet); 1981 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 1982 | } 1983 | else { // >= 4 layers 1984 | std::string FourthLayer = CheckTyList.front(); 1985 | if (FourthLayer == "&") { 1986 | MatchedTySet = CoverAll(CumuTySet, 3); 1987 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 1988 | MatchedTyMap[stringHash(CheckType)] = MatchedTySet; 1989 | continue; 1990 | } 1991 | CheckTyList.pop_front(); 1992 | CumuTySet = UpdateCumuTySet(CumuTySet, 3, FourthLayer); 1993 | 1994 | if (CheckTyList.empty()) { // Four-layer type 1995 | MatchedTySet = AddFuzzyTypeAndCopySet(CumuTySet); 1996 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 1997 | } 1998 | else { // >= 5 layers 1999 | std::string FifthLayer = CheckTyList.front(); 2000 | if (FifthLayer == "&") { 2001 | MatchedTySet = CoverAll(CumuTySet, 4); 2002 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 2003 | MatchedTyMap[stringHash(CheckType)] = MatchedTySet; 2004 | continue; 2005 | } 2006 | CheckTyList.pop_front(); 2007 | CumuTySet = UpdateCumuTySet(CumuTySet, 4, FifthLayer); 2008 | 2009 | if (CheckTyList.empty()) { // Five-layer type 2010 | MatchedTySet = AddFuzzyTypeAndCopySet(CumuTySet); 2011 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 2012 | } 2013 | else { // >= 6 layers 2014 | std::string SixthLayer = CheckTyList.front(); 2015 | if (SixthLayer == "&") { 2016 | MatchedTySet = CoverAll(CumuTySet, 5); 2017 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 2018 | MatchedTyMap[stringHash(CheckType)] = MatchedTySet; 2019 | continue; 2020 | } 2021 | CheckTyList.pop_front(); 2022 | CumuTySet = 
UpdateCumuTySet(CumuTySet, 5, SixthLayer); 2023 | 2024 | if (CheckTyList.empty()) { // Six-layer type 2025 | MatchedTySet = AddFuzzyTypeAndCopySet(CumuTySet); 2026 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 2027 | } 2028 | else { // >= 7 layers 2029 | std::string SeventhLayer = CheckTyList.front(); 2030 | if (SeventhLayer == "&") { 2031 | MatchedTySet = CoverAll(CumuTySet, 6); 2032 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 2033 | MatchedTyMap[stringHash(CheckType)] = MatchedTySet; 2034 | continue; 2035 | } 2036 | CheckTyList.pop_front(); 2037 | CumuTySet = UpdateCumuTySet(CumuTySet, 6, SeventhLayer); 2038 | 2039 | if (CheckTyList.empty()) { // Seven-layer type 2040 | MatchedTySet = AddFuzzyTypeAndCopySet(CumuTySet); 2041 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 2042 | } 2043 | else { // > 7 layers, not in scope 2044 | std::string OuterLayer = "&"; 2045 | CheckTyList.pop_front(); 2046 | CumuTySet = UpdateCumuTySet(CumuTySet, 7, OuterLayer); 2047 | MatchedTySet = AddFuzzyTypeAndCopySet(CumuTySet); 2048 | AllMatchedTySet.insert(MatchedTySet.begin(), MatchedTySet.end()); 2049 | } 2050 | } 2051 | } 2052 | } 2053 | } 2054 | } 2055 | } 2056 | MatchedTyMap[stringHash(CheckType)] = MatchedTySet; 2057 | } 2058 | 2059 | 2060 | // Lookup MLTypeFuncMap 2061 | //errs() << "all matched types" << "\n"; 2062 | FuncSet FStmp; 2063 | for (std::string t: AllMatchedTySet) { 2064 | //errs() << t << "\n"; 2065 | FStmp = MLTypeFuncMap[stringHash(t)]; 2066 | FS = FSMerge(FS, FStmp); 2067 | FStmp.clear(); 2068 | } 2069 | 2070 | 2071 | #ifdef DTweak 2072 | FS.clear(); 2073 | 2074 | // Traverse all matched types 2075 | for (std::string mt: AllMatchedTySet) { 2076 | FuncSet FS1, FS2; 2077 | std::list typeList; 2078 | typeList = MLTypeName2List(mt); 2079 | 2080 | std::string LayTy = typeList.front(); 2081 | typeList.pop_front(); 2082 | FS1 = WMLTATypeFuncMap[stringHash(LayTy)]; 2083 | 2084 | while (!typeList.empty()) 
{ 2085 | LayTy = typeList.front(); 2086 | typeList.pop_front(); 2087 | FS2 = WMLTATypeFuncMap[stringHash(LayTy)]; 2088 | FS1 = FSIntersect(FS1, FS2); 2089 | } 2090 | 2091 | FS.insert(FS1.begin(), FS1.end()); 2092 | } 2093 | 2094 | /* 2095 | // Traverse all friend types 2096 | for (std::string t: FTySet) { 2097 | FuncSet FS1, FS2, FS3; 2098 | std::list<std::string> typeList; 2099 | typeList = MLTypeName2List(t); 2100 | 2101 | // Get function set of the 1st-layer 2102 | std::string LayTy = typeList.front(); 2103 | typeList.pop_front(); 2104 | FS1 = WMLTATypeFuncMap[stringHash(LayTy)]; 2105 | 2106 | 2107 | // Get function set of the other layers 2108 | while (!typeList.empty()) { 2109 | LayTy = typeList.front(); 2110 | typeList.pop_front(); 2111 | 2112 | if (LayTy == "&") {continue;} 2113 | FS2 = WMLTATypeFuncMap[stringHash(LayTy)]; 2114 | FS1 = FSIntersect(FS1, FS2); 2115 | } 2116 | 2117 | // Merge function sets of all matched types 2118 | FS.insert(FS1.begin(), FS1.end()); 2119 | } 2120 | */ 2121 | #endif 2122 | 2123 | // Remove the functions that are not address-taken 2124 | FuncSet ATFS; 2125 | for (auto F: FS) { 2126 | if (F->hasAddressTaken()) { 2127 | ATFS.insert(F); 2128 | } 2129 | } 2130 | FS = ATFS; 2131 | 2132 | // Record in TargetLookupMap 2133 | TargetLookupMap[stringHash(MLTypeName)] = FS; 2134 | } 2135 | 2136 | 2137 | // Statistics: bucket call sites by floor(log2(number of targets)) 2138 | errs() << "\n"; 2139 | PrintResults(CI, FS, MLTypeName); 2140 | //errs() << FS.size() << "\n"; 2141 | 2142 | Ctx->NumIndirectCallTargets += FS.size(); 2143 | 2144 | if (FS.size() == 0) { 2145 | Ctx->NoTargetCalls++; 2146 | } 2147 | else if (FS.size() >= 1 && FS.size() < 2 ) { 2148 | Ctx->ZerotTargetCalls++; 2149 | } 2150 | else if (FS.size() >= 2 && FS.size() < 4 ) { 2151 | Ctx->OnetTargetCalls++; 2152 | } 2153 | else if (FS.size() >= 4 && FS.size() < 8 ) { 2154 | Ctx->TwotTargetCalls++; 2155 | } 2156 | else if (FS.size() >= 8 && FS.size() < 16 ) { 2157 | Ctx->ThreetTargetCalls++; 2158 | } 2159 | else if (FS.size() >= 16 && 
FS.size() < 32 ) { 2160 | Ctx->FourtTargetCalls++; 2161 | } 2162 | else if (FS.size() >= 32 && FS.size() < 64 ) { 2163 | Ctx->FivetTargetCalls++; 2164 | } 2165 | else if (FS.size() >= 64 && FS.size() < 128 ) { 2166 | Ctx->SixtTargetCalls++; 2167 | } 2168 | else if (FS.size() >= 128 && FS.size() < 256 ) { 2169 | Ctx->SeventTargetCalls++; 2170 | } 2171 | else if (FS.size() >= 256 ) { 2172 | Ctx->EighttTargetCalls++; 2173 | } 2174 | 2175 | 2176 | return; 2177 | } 2178 | 2179 | void CallGraphPass::PrintMaps() { 2180 | 2181 | errs() << "========== MLTypeFuncMap ==========" << "\n"; 2182 | DenseMap::iterator mapiter; 2183 | mapiter = MLTypeFuncMap.begin(); 2184 | while (mapiter != MLTypeFuncMap.end()) { 2185 | errs() << "type: " << ReferMap[mapiter->first] << "\n"; 2186 | FuncSet FS = mapiter->second; 2187 | for (auto F: FS) { 2188 | errs() << "target: " << F->getName() << "\n"; 2189 | } 2190 | mapiter++; 2191 | } 2192 | 2193 | errs() << "========== WMLTypeFuncMap ==========" << "\n"; 2194 | DenseMap::iterator wmapiter; 2195 | wmapiter = WMLTATypeFuncMap.begin(); 2196 | while (wmapiter != WMLTATypeFuncMap.end()) { 2197 | errs() << "type: " << wmapiter->first << "\n"; 2198 | FuncSet FS = wmapiter->second; 2199 | for (auto F: FS) { 2200 | errs() << "target: " << F->getName() << "\n"; 2201 | } 2202 | wmapiter++; 2203 | } 2204 | 2205 | errs() << "========== TypeRelationshipMap ==========" << "\n"; 2206 | std::map::iterator iter; 2207 | iter = TypeRelationshipMap.begin(); 2208 | while (iter != TypeRelationshipMap.end()) { 2209 | errs() << "type: " << iter->first << "\n"; 2210 | StrSet ss = iter->second; 2211 | for (auto str: ss) { 2212 | errs() << "friend: " << str << "\n"; 2213 | } 2214 | iter++; 2215 | } 2216 | 2217 | errs() << "========== EscapingSet ==========" << "\n"; 2218 | for (auto s: EscapingSet) { 2219 | errs() << "Escaping type: " << s << "\n"; 2220 | } 2221 | 2222 | return; 2223 | } 2224 | 2225 | 2226 | // Remove cycles between types in TypeRelationshipMap 2227 | 
// Store the organized type-relationships in NewTypeRelationshipMap 2228 | StrSet CallGraphPass::UpgradeTypeRelationshipMap(std::string key, bool first) { 2229 | StrSet S1, S2, S3; 2230 | int S1_origSize; 2231 | int S1_newSize; 2232 | 2233 | S1_origSize = 0; 2234 | S1_newSize = 0; 2235 | 2236 | // Find friend types for the current key 2237 | S2.insert(key); 2238 | int loop = 0; 2239 | while(!S2.empty()) { 2240 | loop++; 2241 | for (StrSet::iterator it2 = S2.begin(); it2 != S2.end(); it2++) { 2242 | std::string S2Ele = *it2; 2243 | StrSet S2Friend = TypeRelationshipMap[S2Ele]; 2244 | for (std::string ft: S2Friend) { 2245 | if (S2Ele.substr(0,6) == "struct" 2246 | || S2Ele.substr(0,5) == "array" 2247 | || S2Ele.substr(0,6) == "vector") { 2248 | if (ft.substr(0,6) == "struct" 2249 | || ft.substr(0,5) == "array" 2250 | || ft.substr(0,6) == "vector" 2251 | || ft.substr(0,5) == "union") { 2252 | S3.insert(ft); 2253 | } 2254 | } 2255 | else { 2256 | if (!IsUnsupportedTypeStr(ft)) { 2257 | S3.insert(ft); 2258 | } 2259 | } 2260 | } 2261 | } 2262 | 2263 | S1_origSize = S1.size(); 2264 | S1 = TypeSetMerge(S1, S2); 2265 | S1_newSize = S1.size(); 2266 | 2267 | if (loop > 1) { 2268 | for (auto e: S1) { 2269 | if (IsEscapingType(e)) { 2270 | return S1; 2271 | } 2272 | } 2273 | } 2274 | 2275 | if (S1_origSize == S1_newSize) { 2276 | // All friends of this key are in S1 2277 | break; 2278 | } 2279 | else { 2280 | // Continue to find new friend types (clear(), not empty(), resets a set) 2281 | S2.clear(); 2282 | S2 = S3; 2283 | S3.clear(); 2284 | } 2285 | } 2286 | 2287 | // Store to NewTypeRelationshipMap 2288 | NewTypeRelationshipMap[key] = S1; 2289 | 2290 | 2291 | return S1; 2292 | } 2293 | 2294 | bool CallGraphPass::IdentifyTargets(Module *M) { 2295 | 2296 | //PrintMaps(); 2297 | //errs() << "size: " << MLTypeFuncMap.size() << "\n"; 2298 | 2299 | errs() << "Identify indirect call targets with SMLTA..." 
<< "\n"; 2300 | for (Module::iterator f = M->begin(), fe = M->end(); f != fe; ++f) { 2301 | Function *F = &*f; 2302 | 2303 | // Skip dead functions 2304 | if (F->use_empty() && (F->getName().str() != "main")) { 2305 | continue; 2306 | } 2307 | 2308 | DerivedClassMap.clear(); 2309 | for (inst_iterator i = inst_begin(F), e = inst_end(F); i != e; ++i) { 2310 | if (StoreInst *SI = dyn_cast<StoreInst>(&*i)) { 2311 | Value *VO = SI->getValueOperand(); 2312 | Value *PO = SI->getPointerOperand(); 2313 | if (CastInst *CastI = dyn_cast<CastInst>(VO)) { 2314 | Type *SrcTy = CastI->getSrcTy(); 2315 | std::string SrcTyName = SingleType2String(SrcTy); 2316 | if (SrcTyName.substr(0,6) == "class.") { 2317 | SrcTyName = StripClassType(SrcTyName); 2318 | DerivedClassMap[PO] = SrcTyName; 2319 | } 2320 | } 2321 | } 2322 | if (CallInst *CI = dyn_cast<CallInst>(&*i)) { 2323 | // Indirect call 2324 | if (CI->isIndirectCall()) { 2325 | FindCalleesWithSMLTA(CI); 2326 | 2327 | // Record indirect calls 2328 | Ctx->IndirectCallInsts.push_back(CI); 2329 | } 2330 | // Direct call 2331 | else { 2332 | // Not a goal of this work 2333 | } 2334 | } 2335 | } 2336 | } 2337 | 2338 | //for (auto it=LayerNumSet.cbegin(); it!=LayerNumSet.cend(); it++) { 2339 | // errs() << *it << " "; 2340 | //} 2341 | //errs() << "LayerNumArray: " << "\n"; 2342 | //for (int i=0; i<12; i++) { 2343 | // errs() << LayerNumArray[i] << " "; 2344 | //} 2345 | 2346 | return false; 2347 | } 2348 | -------------------------------------------------------------------------------- /src/lib/CallGraph.h: -------------------------------------------------------------------------------- 1 | #ifndef _CALL_GRAPH_H 2 | #define _CALL_GRAPH_H 3 | 4 | #include "Analyzer.h" 5 | #include "Common.h" 6 | #include 7 | 8 | using namespace llvm; 9 | 10 | class CallGraphPass : public IterativeModulePass { 11 | 12 | private: 13 | const DataLayout *DL; 14 | // char * or void * 15 | Type *Int8PtrTy; 16 | // long integer type 17 | Type *IntPtrTy; 18 | 19 | // ================Data 
Structures================ 20 | // Multi-layer mapping 21 | static std::map FirstMap; 22 | static std::map SecondMap; 23 | static std::map ThirdMap; 24 | static std::map FourthMap; 25 | static std::map FifthMap; 26 | static std::map SixthMap; 27 | static std::map SeventhMap; 28 | 29 | // WMLTA Type-Func Map 30 | static DenseMap WMLTATypeFuncMap; 31 | 32 | // Type-Func Map 33 | static DenseMap MLTypeFuncMap; 34 | 35 | // A cache for quick lookup 36 | static DenseMap TargetLookupMap; 37 | 38 | // Hash-Type Map 39 | static std::map ReferMap; 40 | 41 | // StructID-StructName, help anonymous structs find names 42 | static std::map StructIDNameMap; 43 | 44 | // Extract complete multi-layer type for global variables 45 | static std::map GVChildParentMap; 46 | static std::map, int> GVChildParentOffsetMap; 47 | static std::map GVFuncMap; 48 | static std::map, std::string> GVFuncTypeMap; 49 | 50 | // Help check if an argument is an input 51 | static std::set ArgAllocaSet; 52 | 53 | // Deal with compound instructions 54 | static std::map> InstHierarchy; 55 | 56 | // Type-Type Map, record friend types 57 | static std::map TypeRelationshipMap; 58 | 59 | // Organized Type-Type Map: a cache for quick search of friend types 60 | static std::map NewTypeRelationshipMap; 61 | 62 | // A cache for quick search of variant types 63 | //static std::map VariantTypeMap; 64 | 65 | // A cache for quick search of matched types 66 | static std::map MatchedTyMap; 67 | 68 | // A cache for quick search of a multi-layer type's friend types 69 | static std::map FriendTyMap; 70 | 71 | // Record derived classes for virtual functions 72 | static std::map DerivedClassMap; 73 | 74 | // Record escaping types 75 | static std::set EscapingSet; 76 | static std::set UnsupportedSet; 77 | 78 | // For evaluation use 79 | static std::set ManyTargetType; 80 | 81 | 82 | // ==================Functions==================== 83 | void IterativeGlobalVariable(GlobalVariable *GVouter, GlobalVariable *GVinner, Value *v); 
84 | void GVTypeFunctionRecord(GlobalVariable *GV, Function *F, Value *v, std::string FTyName); 85 | Value *NextLayerTypeExtraction(Value *v); 86 | bool isClassType(std::string SrcTyName); 87 | std::string StripClassType(std::string classTy); 88 | std::string GenClassTyName(std::string SrcTyName); 89 | void GlobalVariableAnalysis(GlobalVariable *GV, Constant *Ini); 90 | void UnfoldCompoundInst(Instruction *I); 91 | void CompoundInstAnalysis(Instruction *I); 92 | void MemCpyInstAnalysis(Instruction *MCI); 93 | void StoreInstAnalysis(StoreInst *SI); 94 | void SelectInstAnalysis(SelectInst *SI); 95 | std::string CastInstAnalysis(CastInst *CastI); 96 | std::string FunctionCastAnalysis(Function *F, CastInst *CastI); 97 | void CallInstAnalysis(CallInst *CallI); 98 | std::string GEPInstAnalysis(GetElementPtrInst *GEPI); 99 | std::string SingleType2String (Type *Ty); 100 | bool IsCompositeType(Type *Ty); 101 | bool IsCompositeTypeStr(std::string TyStr); 102 | bool IsCompoundInst(Instruction *I); 103 | bool IsGeneralPointer(Type *Ty); 104 | bool IsUnsupportedType(Type *Ty); 105 | bool IsUnsupportedTypeStr(std::string TyStr); 106 | bool IsEscapingType(std::string TyName); 107 | std::string GetStructIdentity(StructType* STy); 108 | bool HasSubString(std::string str, std::string substr); 109 | std::string GenerateMLTypeName(Value *VO, std::string MLTypeName); 110 | std::string StructNameTrim(std::string sName); 111 | std::size_t FindEndOfStruct(std::string structstr); 112 | std::string FirstLayerTrim(std::string fName); 113 | void UpdateRelationshipMap(std::string CType, std::string FType); 114 | void UpdateMLTypeFuncMap(std::string type, Function* F); 115 | void CalculateVariantTypes(std::string TyStr); 116 | 117 | void FindCalleesWithSMLTA(CallInst *CI); 118 | void ExhaustiveSearch4FriendTypes(std::string Search4Type); 119 | FuncSet FSMerge(FuncSet FS1, FuncSet FS2); 120 | FuncSet FSIntersect(FuncSet FS1, FuncSet FS2); 121 | StrSet TypeSetMerge(StrSet S1, StrSet S2); 122 | 
StrPairSet FindSubTypes(std::string MLTypeName); 123 | void PrintResults(CallInst *CI, FuncSet FS, std::string MLTypeName); 124 | list MLTypeName2List(std::string MLTypeName); 125 | void PrintMaps(); 126 | StrSet UpgradeTypeRelationshipMap(std::string key, bool first); 127 | bool LayerMatch(std::string s1, std::string s2); 128 | bool MLTypeMatch(list CList, list GList); 129 | bool IsValidType(std::string ft); 130 | StrSet AddFuzzyTypeAndCopySet(StrSet CumuSet); 131 | StrSet LookupTypeRecordMap(std::string t, int map); 132 | StrSet UpdateCumuTySet(StrSet CumuTySet, int map, std::string LayerType); 133 | StrSet CoverAll(StrSet CumuTySet, int map); 134 | void printSet(StrSet Set); 135 | 136 | public: 137 | CallGraphPass(GlobalContext *Ctx_) 138 | : IterativeModulePass(Ctx_, "CallGraph") { } 139 | 140 | virtual bool CollectInformation(Module *M); 141 | //virtual bool doFinalization(llvm::Module *); 142 | virtual bool IdentifyTargets(llvm::Module *M); 143 | 144 | }; 145 | 146 | #endif 147 | -------------------------------------------------------------------------------- /src/lib/Common.cc: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include "Common.h" 8 | //#include "Config.h" 9 | 10 | #define LINUX_SOURCE_DIR1 "/home/kjlu/projects/kernel-analysis/compile-kernel/code/srcs/linux-stable-4.20.0" 11 | #define LINUX_SOURCE_ANDROID "/home/kjlu/projects/kernels/compile-linux/code/srcs/linux-android-4.9" 12 | #define LINUX_SOURCE_FREEBSD "/home/kjlu/projects/kernel-analysis/compile-kernel/code/srcs/freebsd-12" 13 | #define LINUX_SOURCE_DIR_4_19_1 "/home/pakki001/research/linux" 14 | #define LINUX_SOURCE_DIR_4_19_2 "/home/aditya/linux-src" 15 | #define LINUX_SOURCE_DIR_4_20_0 "/home/qiushi/Desktop/Sec-Check/kernel-analysis/compile-kernel/code/srcs/linux-stable-4.20.0" 16 | 17 | 18 | string getFileName(DILocation *Loc, DISubprogram *SP) { 19 | string FN; 20 | if 
(Loc) 21 | FN= Loc->getFilename().str(); 22 | else if (SP) 23 | FN= SP->getFilename().str(); 24 | else 25 | return ""; 26 | 27 | char *user = getlogin(); 28 | const char *filename = FN.c_str(); 29 | filename = strchr(filename, '/') + 1; 30 | filename = strchr(filename, '/'); 31 | int idx = filename - FN.c_str(); 32 | if (strstr(user, "kjlu")) { 33 | //if (FN.find("linux-stable") != std::string::npos) 34 | //FN.replace(0, idx, LINUX_SOURCE_FREEBSD); 35 | FN.replace(0, idx, LINUX_SOURCE_DIR1); 36 | //else 37 | // FN.replace(0, 54, LINUX_SOURCE_DIR1); 38 | } else if (strstr(user, "pakki001")) { 39 | FN.replace(0, idx, LINUX_SOURCE_DIR_4_19_1); 40 | } else if (strstr(user, "aditya")) { 41 | FN.replace(0, idx, LINUX_SOURCE_DIR_4_19_2); 42 | } else if (strstr(user, "qiushi")) { 43 | FN.replace(0, idx, LINUX_SOURCE_DIR_4_20_0); 44 | } else { 45 | OP << "== Warning: please specify the path of linux source."; 46 | } 47 | return FN; 48 | } 49 | 50 | /// Check if the value is a constant. 51 | bool isConstant(Value *V) { 52 | // Invalid input. 53 | if (!V) 54 | return false; 55 | 56 | // The value is a constant. 
  Constant *Ct = dyn_cast<Constant>(V);
  if (Ct)
    return true;

  return false;
}

/// Get the source code line
string getSourceLine(string fn_str, unsigned lineno) {
  std::ifstream sourcefile(fn_str);
  string line;
  sourcefile.seekg(ios::beg);

  for(int n = 0; n < lineno - 1; ++n){
    sourcefile.ignore(std::numeric_limits<std::streamsize>::max(), '\n');
  }
  getline(sourcefile, line);

  return line;
}

string getSourceFuncName(Instruction *I) {

  DILocation *Loc = getSourceLocation(I);
  if (!Loc)
    return "";
  unsigned lineno = Loc->getLine();
  std::string fn_str = getFileName(Loc);
  string line = getSourceLine(fn_str, lineno);

  while(line[0] == ' ' || line[0] == '\t')
    line.erase(line.begin());
  line = line.substr(0, line.find('('));
  return line;
}

string extractMacro(string line, Instruction *I) {
  string macro, word;
  std::regex caps("[^\\(][_A-Z][_A-Z0-9]+[\\);,]+");

  if (CallInst *CI = dyn_cast<CallInst>(I)) {
    caps = "[-]?[_A-Z][_A-Z0-9]+[\\);,]+";

  } else {
    std::size_t lhs = -1;
    stringstream iss(line.substr(lhs+1));
    smatch match;

    while (iss >> word) {
      if (regex_search(word, match, caps)) {
        macro = word;
        return macro;
      }
    }
  }

  return macro;
}

/*
/// Get called function name of V.
StringRef getCalledFuncName(Instruction *I) {

  Value *V;
  if (CallInst *CI = dyn_cast<CallInst>(I))
    V = CI->getCalledOperand();
  else if (InvokeInst *II = dyn_cast<InvokeInst>(I))
    V = II->getCalledValue();
  assert(V);

  InlineAsm *IA = dyn_cast<InlineAsm>(V);
  if (IA)
    return StringRef(IA->getAsmString());

  User *UV = dyn_cast<User>(V);
  if (UV) {
    if (UV->getNumOperands() > 0) {
      Value *VUV = UV->getOperand(0);
      return VUV->getName();
    }
  }

  return V->getName();
}*/

DILocation *getSourceLocation(Instruction *I) {
  if (!I)
    return NULL;

  MDNode *N = I->getMetadata("dbg");
  if (!N)
    return NULL;

  DILocation *Loc = dyn_cast<DILocation>(N);
  if (!Loc || Loc->getLine() < 1)
    return NULL;

  return Loc;
}

/// Print out source code information to facilitate manual analyses.
void printSourceCodeInfo(Value *V) {
  Instruction *I = dyn_cast<Instruction>(V);
  if (!I)
    return;

  DILocation *Loc = getSourceLocation(I);
  if (!Loc)
    return;

  unsigned LineNo = Loc->getLine();
  std::string FN = getFileName(Loc);
  string line = getSourceLine(FN, LineNo);
  FN = Loc->getFilename().str();
  const char *filename = FN.c_str();
  filename = strchr(filename, '/') + 1;
  filename = strchr(filename, '/') + 1;
  int idx = filename - FN.c_str();

  while(line[0] == ' ' || line[0] == '\t')
    line.erase(line.begin());
  OP << " ["
    << "\033[34m" << "Code" << "\033[0m" << "] "
    << FN.replace(0, idx, "")
    << " +" << LineNo << ": "
    << "\033[35m" << line << "\033[0m" <<'\n';
}

void printSourceCodeInfo(Function *F) {

  DISubprogram *SP = F->getSubprogram();

  if (SP) {
    std::string FN = getFileName(NULL, SP);
    string line = getSourceLine(FN, SP->getLine());
    while(line[0] == ' ' ||
        line[0] == '\t')
      line.erase(line.begin());

    OP << " ["
      << "\033[34m" << "Code" << "\033[0m" << "] "
      << SP->getFilename()
      << " +" << SP->getLine() << ": "
      << "\033[35m" << line << "\033[0m" <<'\n';
  }
}

string getMacroInfo(Value *V) {

  Instruction *I = dyn_cast<Instruction>(V);
  if (!I) return "";

  DILocation *Loc = getSourceLocation(I);
  if (!Loc) return "";

  unsigned LineNo = Loc->getLine();
  std::string FN = getFileName(Loc);
  string line = getSourceLine(FN, LineNo);
  FN = Loc->getFilename().str();
  const char *filename = FN.c_str();
  filename = strchr(filename, '/') + 1;
  filename = strchr(filename, '/') + 1;
  int idx = filename - FN.c_str();

  while(line[0] == ' ' || line[0] == '\t')
    line.erase(line.begin());

  string macro = extractMacro(line, I);

  // clean up the ending
  unsigned length = 0;
  for (auto it = macro.begin(), e = macro.end(); it != e; ++it)
    if (*it == ')' || *it == ';' || *it == ',') {
      macro = macro.substr(0, length);
      break;
    } else {
      ++length;
    }

  return macro;
}


/// Get source code information of this value
void getSourceCodeInfo(Value *V, string &file,
    unsigned &line) {
  file = "";
  line = 0;

  auto I = dyn_cast<Instruction>(V);
  if (!I)
    return;

  MDNode *N = I->getMetadata("dbg");
  if (!N)
    return;

  DILocation *Loc = dyn_cast<DILocation>(N);
  if (!Loc || Loc->getLine() < 1)
    return;

  file = Loc->getFilename().str();
  line = Loc->getLine();
}

Argument *getArgByNo(Function *F, int8_t ArgNo) {

  if (ArgNo >= F->arg_size())
    return NULL;

  int8_t idx = 0;
  Function::arg_iterator ai = F->arg_begin();
  while (idx != ArgNo) {
    ++ai;
    ++idx;
  }
  return ai;
}

//#define HASH_SOURCE_INFO
size_t funcHash(Function *F, bool withName) {

  hash<string> str_hash;
  string output;

#ifdef HASH_SOURCE_INFO
  DISubprogram *SP = F->getSubprogram();

  if (SP) {
    output = SP->getFilename();
    output = output + to_string(uint_hash(SP->getLine()));
  }
  else {
#endif
    string sig;
    raw_string_ostream rso(sig);
    Type *FTy = F->getFunctionType();
    FTy->print(rso);
    output = rso.str();

    if (withName)
      output += F->getName();
#ifdef HASH_SOURCE_INFO
  }
#endif
  string::iterator end_pos = remove(output.begin(),
      output.end(), ' ');
  output.erase(end_pos, output.end());

  return str_hash(output);
}

/*
size_t callHash(CallInst *CI) {

  CallSite CS(CI);
  Function *CF = CI->getCalledFunction();

  if (CF)
    return funcHash(CF);
  else {
    hash<string> str_hash;
    string sig;
    raw_string_ostream rso(sig);
    Type *FTy = CS.getFunctionType();
    FTy->print(rso);

    string strip_str = rso.str();
    string::iterator end_pos = remove(strip_str.begin(),
        strip_str.end(), ' ');
    strip_str.erase(end_pos, strip_str.end());
    return str_hash(strip_str);
  }
}*/

size_t valueHash(Value *v) {
  hash<string> str_hash;
  string str;
  raw_string_ostream rso(str);
  v->print(rso);
  string vstr = rso.str();
  return str_hash(vstr);
}

size_t typeHash(Type *Ty) {
  hash<string> str_hash;
  string sig;

  raw_string_ostream rso(sig);
  Ty->print(rso);
  string ty_str = rso.str();
  string::iterator end_pos = remove(ty_str.begin(), ty_str.end(), ' ');
  ty_str.erase(end_pos, ty_str.end());

  return str_hash(ty_str);
}

size_t stringHash(std::string str){
  std::hash<std::string> str_hash;
  return str_hash(str);
}

size_t hashIdxHash(size_t Hs, int Idx) {
  hash<string> str_hash;
  return Hs + str_hash(to_string(Idx));
}

size_t typeIdxHash(Type *Ty, int Idx) {
  return hashIdxHash(typeHash(Ty), Idx);
}


--------------------------------------------------------------------------------
/src/lib/Common.h:
--------------------------------------------------------------------------------
#ifndef _COMMON_H_
#define _COMMON_H_

#include
#include
#include
#include
#include
#include

#include
#include
#include
#include

using namespace llvm;
using namespace std;

#define LOG(lv, stmt) \
  do { \
    if (VerboseLevel >= lv) \
      errs() << stmt; \
  } while(0)


#define OP llvm::errs()

#define WARN(stmt) LOG(1, "\n[WARN] " << stmt);

#define ERR(stmt) \
  do { \
    errs() << "ERROR (" << __FUNCTION__ << "@" << __LINE__ << ")"; \
    errs() << ": " << stmt; \
    exit(-1); \
  } while(0)

/// Different colors for output
#define KNRM "\x1B[0m" /* Normal */
#define KRED "\x1B[31m" /* Red */
#define KGRN "\x1B[32m" /* Green */
#define KYEL "\x1B[33m" /* Yellow */
#define KBLU "\x1B[34m" /* Blue */
#define KMAG "\x1B[35m" /* Magenta */
#define KCYN "\x1B[36m" /* Cyan */
#define KWHT "\x1B[37m" /* White */

extern cl::opt VerboseLevel;

//
// Common functions
//

string getFileName(DILocation *Loc,
    DISubprogram *SP=NULL);

bool isConstant(Value *V);

string getSourceLine(string fn_str, unsigned lineno);

string getSourceFuncName(Instruction *I);

StringRef getCalledFuncName(Instruction *I);

string extractMacro(string, Instruction* I);

DILocation *getSourceLocation(Instruction *I);

void printSourceCodeInfo(Value *V);
void printSourceCodeInfo(Function *F);
string getMacroInfo(Value *V);
void getSourceCodeInfo(Value *V, string &file,
    unsigned &line);

Argument *getArgByNo(Function *F, int8_t ArgNo);

size_t funcHash(Function *F, bool withName = true);
size_t callHash(CallInst *CI);
size_t valueHash(Value *v);
size_t typeHash(Type *Ty);
size_t stringHash(std::string str);
size_t typeIdxHash(Type *Ty, int Idx = -1);
size_t hashIdxHash(size_t Hs, int Idx = -1);

//
// Common data structures
//
class ModuleOracle {
public:
  ModuleOracle(Module &m) :
    dl(m.getDataLayout()),
    tli(TargetLibraryInfoImpl(Triple(Twine(m.getTargetTriple()))))
  {}

  ~ModuleOracle() {}

  // Getter
  const DataLayout &getDataLayout() {
    return dl;
  }

  TargetLibraryInfo &getTargetLibraryInfo() {
    return tli;
  }

  // Data layout
  uint64_t getBits() {
    return Bits;
  }

  uint64_t getPointerWidth() {
    return dl.getPointerSizeInBits();
  }

  uint64_t getPointerSize() {
    return dl.getPointerSize();
  }

  uint64_t getTypeSize(Type *ty) {
    return dl.getTypeAllocSize(ty);
  }

  uint64_t getTypeWidth(Type *ty) {
    return dl.getTypeSizeInBits(ty);
  }

  uint64_t getTypeOffset(Type *type, unsigned idx) {
    assert(isa<StructType>(type));
    return dl.getStructLayout(cast<StructType>(type))
      ->getElementOffset(idx);
  }

  bool isReintPointerType(Type *ty) {
    return (ty->isPointerTy() ||
        (ty->isIntegerTy() &&
         ty->getIntegerBitWidth() == getPointerWidth()));
  }

protected:
  // Info provide
  const DataLayout &dl;
  TargetLibraryInfo tli;

  // Consts
  const uint64_t Bits = 8;
};

class Helper {
public:
  // LLVM value
  static string getValueName(Value *v) {
    if (!v->hasName()) {
      return
        to_string(reinterpret_cast<uintptr_t>(v));
    } else {
      return v->getName().str();
    }
  }

  static string getValueType(Value *v) {
    if (Instruction *inst = dyn_cast<Instruction>(v)) {
      return string(inst->getOpcodeName());
    } else {
      return string("value " + to_string(v->getValueID()));
    }
  }

  static string getValueRepr(Value *v) {
    string str;
    raw_string_ostream stm(str);

    v->print(stm);
    stm.flush();

    return str;
  }

  // Z3 expr
  static string getExprType(Z3_context ctxt, Z3_ast ast) {
    return string(Z3_sort_to_string(ctxt, Z3_get_sort(ctxt, ast)));
  }

  static string getExprRepr(Z3_context ctxt, Z3_ast ast) {
    return string(Z3_ast_to_string(ctxt, ast));
  }

  // String conversion
  static void convertDotInName(string &name) {
    replace(name.begin(), name.end(), '.', '_');
  }
};

class Dumper {
public:
  Dumper() {}
  ~Dumper() {}

  // LLVM value
  void valueName(Value *val) {
    errs() << Helper::getValueName(val) << "\n";
  }

  void typedValue(Value *val) {
    errs() << "[" << Helper::getValueType(val) << "]"
      << Helper::getValueRepr(val)
      << "\n";
  }

  // Z3 expr
  void typedExpr(Z3_context ctxt, Z3_ast ast) {
    errs() << "[" << Helper::getExprType(ctxt, ast) << "]"
      << Helper::getExprRepr(ctxt, ast)
      << "\n";
  }
};

extern Dumper DUMP;

#endif

--------------------------------------------------------------------------------