├── README.md ├── modules ├── email_blacklist.txt ├── email_regexblacklist.txt ├── exe_blacklist.txt ├── exe_regexblacklist.txt ├── jshtml_blacklist.txt ├── jshtml_regexblacklist.txt ├── office_blacklist.txt ├── office_regexblacklist.txt ├── pdf_blacklist.txt ├── pdf_regexblacklist.txt ├── pefile.py ├── unknown_blacklist.txt └── unknown_regexblacklist.txt └── yaraGenerator.py /README.md: -------------------------------------------------------------------------------- 1 | ### Information 2 | This project is a tool for quick, simple, and effective yara rule creation to isolate malware families and other malicious objects of interest. It is an experiment, and thus far I've had good success with it. It is a work in progress and I welcome forks and feedback! 3 | 4 | To use this tool, gather a few files from the malware family (or, for non-executables, files containing the attribute of interest) you wish to profile. The more the better: three to four samples seems to be effective for malware, but isolating exploits in carrier documents often takes many more. Please note that this tool will only be as precise as you are in choosing what you are looking for. Visit http://yaragenerator.com for a web application version of this tool. 5 | 6 | ### Version and Updates 7 | 0.6.1 - Added logic for parsing and prioritizing strings/emails/headers from emails (samples must be submitted as .eml files for the python library to parse them properly). Added per-filetype string prioritization logic (i.e. include all email addresses and IPs common across emails before random words from email bodies). Due to targeted parsing, effective signatures can be built from a single email. The boolean condition for email rules is also "all of them" to allow for future variance in delivery methods. 8 | 9 | 0.6 - Refactored all of the code to allow for selectable filetype of samples (-f).
This allows for entirely different signature generation for PDFs vs EXEs vs emails. In addition to disparate execution paths, each filetype has its own string blacklist and regex blacklist to exclude unwanted things such as your gateway, usernames, @yourco.com, etc. (Note: no custom per-filetype code exists for anything beyond executables at this point, but the framework is now there.) 10 | 11 | 0.5 - Added regexes in modules/regexblacklist.txt which will remove matches from potential strings included in yara rules; also added 30K strings to the blacklist. Lowered the hit requirement from 100% to 95% to allow more true positives from slight variants (e.g. a change of embedded C2 or UA). 12 | 13 | 0.4 - Added pefile (http://code.google.com/p/pefile/) to extract and remove imports and functions from yara rules; added blacklist.txt to remove unwanted strings. 14 | 15 | 0.3 - Added support for tags and Unicode wide strings (automatically adds the "wide" modifier). 16 | 17 | 0.2 - Updated CLI and error handling, removed hidden files, and ignored subdirectories. 18 | 19 | 0.1 - Released; supports regular string extraction. 20 | 21 | ### ToDo 22 | [+] Allow for scanning of benign/baseline files to automatically populate blacklists for various filetypes 23 | 24 | [+] Create custom execution paths leveraging open-source tools for various filetypes (i.e. email/pdf/office docs, etc.) 25 | 26 | [+] Continue to improve the fidelity and flexibility of the algorithms and underlying methodologies used to generate signatures 27 | 28 | 29 | ### Example 30 | 31 | Usage is as follows, with an example of a basic search hitting all of 32 | the switches below: 33 | ``` 34 | 35 | usage: yaraGenerator.py [-h] -r RULENAME -f FILETYPE [-a AUTHOR] [-d DESCRIPTION] [-t TAGS] InputDirectory 36 | 37 | YaraGenerator 38 | 39 | positional arguments: 40 | InputDirectory Path To Files To Create Yara Rule From 41 | 42 | optional arguments: 43 | -h , --help show this help message and exit 44 | -r , --RuleName Enter A Rule/Alert Name (No Spaces + Must Start
with Letter) 45 | -a , --Author Enter Author Name 46 | -d , --Description Provide a useful description of the Yara Rule 47 | -t , --Tags Apply Tags to Yara Rule For Easy Reference (AlphaNumeric) 48 | -v , --Verbose Print Finished Rule To Standard Out 49 | -f , --FileType Select Sample Set FileType choices are: unknown, exe, 50 | pdf, email, office, js-html 51 | ``` 52 | 53 | The per-filetype blacklist files (e.g. exe_blacklist.txt) in the /modules directory allow entry of one string per line; these strings will never appear in a rule generated by YaraGenerator. 54 | 55 | The per-filetype regexblacklist files (e.g. exe_regexblacklist.txt) in the /modules directory allow entry of one regular expression per line. *Remember to use ^ and $ for the beginning and end of a string if you wish to match exactly.* YaraGenerator will disqualify any string which hits on any regex in the list from inclusion in a Yara rule. 56 | 57 | Example for a specific family of APT1 malware: 58 | 59 | ``` 60 | python yaraGenerator.py ../greencat/ -r Win_Trojan_APT1_GreenCat -a "Chris Clark" -d "APT Trojan Comment Panda" -t "APT" -f "exe" 61 | 62 | [+] Generating Yara Rule Win_Trojan_APT1_GreenCat from files located in: ../greencat/ 63 | 64 | [+] Yara Rule Generated: Win_Trojan_APT1_GreenCat.yar 65 | 66 | [+] Files Examined: ['871cc547feb9dbec0285321068e392b8', '6570163cd34454b3d1476c134d44b9d9', '57e79f7df13c0cb01910d0c688fcd296'] 67 | [+] Author Credited: Chris Clark 68 | [+] Rule Description: APT Trojan Comment Panda 69 | [+] Rule Tags: APT 70 | 71 | [+] YaraGenerator (C) 2013 Chris@xenosec.org https://github.com/Xen0ph0n/YaraGenerator 72 | ``` 73 | Resulting Yara Rule: 74 | ``` 75 | rule Win_Trojan_APT1_GreenCat : APT 76 | { 77 | meta: 78 | author = "Chris Clark" 79 | date = "2013-06-04" 80 | description = "APT Trojan Comment Panda" 81 | hash0 = "57e79f7df13c0cb01910d0c688fcd296" 82 | hash1 = "871cc547feb9dbec0285321068e392b8" 83 | hash2 = "6570163cd34454b3d1476c134d44b9d9" 84 | sample_filetype = "exe" 85 | yaragenerator = 
"https://github.com/Xen0ph0n/YaraGenerator" 86 | strings: 87 | $string0 = "Ramdisk" 88 | $string1 = "Cache-Control:max-age" 89 | $string2 = "YYSSSSS" 90 | $string3 = "\\cmd.exe" 91 | $string4 = "Translation" wide 92 | $string5 = "CD-ROM" 93 | $string6 = "Mozilla/5.0" 94 | $string7 = "Volume on this computer:" 95 | $string8 = "pidrun" 96 | $string9 = "3@YAXPAX@Z" 97 | $string10 = "SMAgent.exe" wide 98 | $string11 = "Shell started successfully" 99 | $string12 = "Content-Length: %d" 100 | $string13 = "t4j SV3" 101 | $string14 = "Program started" 102 | $string15 = "Started already," 103 | $string16 = "SoundMAX service agent" wide 104 | condition: 105 | 16 of them 106 | } 107 | 108 | 109 | ``` 110 | ### Results 111 | 112 | GreenCat Rule: 113 | 114 | ``` 115 | 100% Hits on Test Samples: 116 | 117 | $ yara -rg Win_Trojan_APT1_GreenCat.yar ../greencat/ 118 | Win_Trojan_APT1_GreenCat [APT] ../greencat//8bf5a9e8d5bc1f44133c3f118fe8ca1701d9665a72b3893f509367905feb0a00 119 | Win_Trojan_APT1_GreenCat [APT] ../greencat//c196cac319e5c55e8169b6ed6930a10359b3db322abe8f00ed8cb83cf0888d3b 120 | Win_Trojan_APT1_GreenCat [APT] ../greencat//c23039cf2f859e659e59ec362277321fbcdac680e6d9bc93fc03c8971333c25e 121 | 122 | 100% True Positives on other samples in the APT1 cadre which were detected as GreenCat by other Yara rules: 123 | 124 | $ yara -r Win_Trojan_APT1_GreenCat.yar . Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//1877a5d2f9c415109a8ac323f43be1dc10c546a72ab7207a96c6e6e71a132956 125 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//20ed6218575155517f19d4ce46a9addbf49dcadb8f5d7bd93efdccfe1925c7d0 126 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//4144820d9b31c4d3c54025a4368b32f727077c3ec253753360349a783846747f 127 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//4487b345f63d20c6b91eec8ee86c307911b1f2c3e29f337aa96a4a238bf2e87c 128 | Win_Trojan_APT1_GreenCat [APT] 
../../MalwareSamples/APT1Malware//8bf5a9e8d5bc1f44133c3f118fe8ca1701d9665a72b3893f509367905feb0a00 129 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//c196cac319e5c55e8169b6ed6930a10359b3db322abe8f00ed8cb83cf0888d3b 130 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//c23039cf2f859e659e59ec362277321fbcdac680e6d9bc93fc03c8971333c25e 131 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//f76dd93b10fc173eaf901ff1fb00ff8a9e1f31e3bd86e00ff773b244b54292c5 132 | 133 | 100% True Negatives on clean files: 134 | 135 | $ yara -r Win_Trojan_APT1_GreenCat.yar ../../CleanFiles/ 136 | 137 | ``` 138 | 139 | ### Author & License 140 | 141 | YaraGenerator is copyrighted by Chris Clark 2013. Contact me at Chris@xenosec.org. 142 | 143 | YaraGenerator is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. 144 | YaraGenerator is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. 145 | 146 | You should have received a copy of the GNU General Public License along with YaraGenerator. If not, see http://www.gnu.org/licenses/. 
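
The string-selection behavior described in the notes above — keep strings common to at least 95% of the samples (the 0.5 change), then disqualify anything that appears in the per-filetype string blacklist or hits a blacklisted regex — can be sketched roughly in Python. This is a minimal sketch, not YaraGenerator's actual code: the function names are invented here, and the use of `re.search` is an assumption (it would explain why `^` and `$` anchors are needed for exact matches).

```python
import re

def load_regex_blacklist(path):
    # One regular expression per line; blank lines are skipped.
    with open(path) as fh:
        return [re.compile(line.strip()) for line in fh if line.strip()]

def common_strings(per_sample_strings, threshold=0.95):
    # Keep strings present in at least `threshold` of the samples,
    # mirroring the 95% hit requirement from the 0.5 release notes.
    counts = {}
    for strings in per_sample_strings:
        for s in set(strings):
            counts[s] = counts.get(s, 0) + 1
    needed = threshold * len(per_sample_strings)
    return [s for s, n in counts.items() if n >= needed]

def filter_blacklisted(candidates, patterns):
    # Disqualify any candidate string that hits on any blacklisted regex.
    return [s for s in candidates if not any(p.search(s) for p in patterns)]
```

For example, with the stock exe_regexblacklist.txt (which contains only the placeholder `^thisisaplaceholder$`), `filter_blacklisted(["thisisaplaceholder", "pidrun"], load_regex_blacklist("modules/exe_regexblacklist.txt"))` would keep only `"pidrun"`.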
147 | 148 | -------------------------------------------------------------------------------- /modules/email_blacklist.txt: -------------------------------------------------------------------------------- 1 | undisclosed-recipients:; -------------------------------------------------------------------------------- /modules/email_regexblacklist.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/email_regexblacklist.txt -------------------------------------------------------------------------------- /modules/exe_regexblacklist.txt: -------------------------------------------------------------------------------- 1 | ^thisisaplaceholder$ -------------------------------------------------------------------------------- /modules/jshtml_blacklist.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/jshtml_blacklist.txt -------------------------------------------------------------------------------- /modules/jshtml_regexblacklist.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/jshtml_regexblacklist.txt -------------------------------------------------------------------------------- /modules/office_blacklist.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/office_blacklist.txt -------------------------------------------------------------------------------- /modules/office_regexblacklist.txt: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/office_regexblacklist.txt -------------------------------------------------------------------------------- /modules/pdf_blacklist.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/pdf_blacklist.txt -------------------------------------------------------------------------------- /modules/pdf_regexblacklist.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/pdf_regexblacklist.txt -------------------------------------------------------------------------------- /modules/pefile.py: -------------------------------------------------------------------------------- 1 | # -*- coding: Latin-1 -*- 2 | """pefile, Portable Executable reader module 3 | 4 | 5 | All the PE file basic structures are available with their default names 6 | as attributes of the instance returned. 7 | 8 | Processed elements such as the import table are made available with lowercase 9 | names, to differentiate them from the upper case basic structure names. 10 | 11 | pefile has been tested against the limits of valid PE headers, that is, malware. 12 | Lots of packed malware attempt to abuse the format way beyond its standard use. 13 | To the best of my knowledge most of the abuses are handled gracefully. 14 | 15 | Copyright (c) 2005-2011 Ero Carrera 16 | 17 | All rights reserved. 18 | 19 | For detailed copyright information see the file COPYING in 20 | the root of the distribution archive. 
21 | """ 22 | 23 | __revision__ = "$LastChangedRevision: 114 $" 24 | __author__ = 'Ero Carrera' 25 | __version__ = '1.2.10-%d' % int( __revision__[21:-2] ) 26 | __contact__ = 'ero.carrera@gmail.com' 27 | 28 | 29 | import os 30 | import struct 31 | import time 32 | import math 33 | import re 34 | import exceptions 35 | import string 36 | import array 37 | import mmap 38 | 39 | sha1, sha256, sha512, md5 = None, None, None, None 40 | 41 | try: 42 | import hashlib 43 | sha1 = hashlib.sha1 44 | sha256 = hashlib.sha256 45 | sha512 = hashlib.sha512 46 | md5 = hashlib.md5 47 | except ImportError: 48 | try: 49 | import sha 50 | sha1 = sha.new 51 | except ImportError: 52 | pass 53 | try: 54 | import md5 55 | md5 = md5.new 56 | except ImportError: 57 | pass 58 | 59 | try: 60 | enumerate 61 | except NameError: 62 | def enumerate(iter): 63 | L = list(iter) 64 | return zip(range(0, len(L)), L) 65 | 66 | 67 | fast_load = False 68 | 69 | # This will set a maximum length of a string to be retrieved from the file. 70 | # It's there to prevent loading massive amounts of data from memory mapped 71 | # files. Strings longer than 1MB should be rather rare. 
72 | MAX_STRING_LENGTH = 0x100000 # 2^20 73 | 74 | IMAGE_DOS_SIGNATURE = 0x5A4D 75 | IMAGE_DOSZM_SIGNATURE = 0x4D5A 76 | IMAGE_NE_SIGNATURE = 0x454E 77 | IMAGE_LE_SIGNATURE = 0x454C 78 | IMAGE_LX_SIGNATURE = 0x584C 79 | 80 | IMAGE_NT_SIGNATURE = 0x00004550 81 | IMAGE_NUMBEROF_DIRECTORY_ENTRIES= 16 82 | IMAGE_ORDINAL_FLAG = 0x80000000L 83 | IMAGE_ORDINAL_FLAG64 = 0x8000000000000000L 84 | OPTIONAL_HEADER_MAGIC_PE = 0x10b 85 | OPTIONAL_HEADER_MAGIC_PE_PLUS = 0x20b 86 | 87 | 88 | directory_entry_types = [ 89 | ('IMAGE_DIRECTORY_ENTRY_EXPORT', 0), 90 | ('IMAGE_DIRECTORY_ENTRY_IMPORT', 1), 91 | ('IMAGE_DIRECTORY_ENTRY_RESOURCE', 2), 92 | ('IMAGE_DIRECTORY_ENTRY_EXCEPTION', 3), 93 | ('IMAGE_DIRECTORY_ENTRY_SECURITY', 4), 94 | ('IMAGE_DIRECTORY_ENTRY_BASERELOC', 5), 95 | ('IMAGE_DIRECTORY_ENTRY_DEBUG', 6), 96 | ('IMAGE_DIRECTORY_ENTRY_COPYRIGHT', 7), 97 | ('IMAGE_DIRECTORY_ENTRY_GLOBALPTR', 8), 98 | ('IMAGE_DIRECTORY_ENTRY_TLS', 9), 99 | ('IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG', 10), 100 | ('IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT', 11), 101 | ('IMAGE_DIRECTORY_ENTRY_IAT', 12), 102 | ('IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT', 13), 103 | ('IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR',14), 104 | ('IMAGE_DIRECTORY_ENTRY_RESERVED', 15) ] 105 | 106 | DIRECTORY_ENTRY = dict([(e[1], e[0]) for e in directory_entry_types]+directory_entry_types) 107 | 108 | 109 | image_characteristics = [ 110 | ('IMAGE_FILE_RELOCS_STRIPPED', 0x0001), 111 | ('IMAGE_FILE_EXECUTABLE_IMAGE', 0x0002), 112 | ('IMAGE_FILE_LINE_NUMS_STRIPPED', 0x0004), 113 | ('IMAGE_FILE_LOCAL_SYMS_STRIPPED', 0x0008), 114 | ('IMAGE_FILE_AGGRESIVE_WS_TRIM', 0x0010), 115 | ('IMAGE_FILE_LARGE_ADDRESS_AWARE', 0x0020), 116 | ('IMAGE_FILE_16BIT_MACHINE', 0x0040), 117 | ('IMAGE_FILE_BYTES_REVERSED_LO', 0x0080), 118 | ('IMAGE_FILE_32BIT_MACHINE', 0x0100), 119 | ('IMAGE_FILE_DEBUG_STRIPPED', 0x0200), 120 | ('IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP', 0x0400), 121 | ('IMAGE_FILE_NET_RUN_FROM_SWAP', 0x0800), 122 | ('IMAGE_FILE_SYSTEM', 0x1000), 123 | 
('IMAGE_FILE_DLL', 0x2000), 124 | ('IMAGE_FILE_UP_SYSTEM_ONLY', 0x4000), 125 | ('IMAGE_FILE_BYTES_REVERSED_HI', 0x8000) ] 126 | 127 | IMAGE_CHARACTERISTICS = dict([(e[1], e[0]) for e in 128 | image_characteristics]+image_characteristics) 129 | 130 | 131 | section_characteristics = [ 132 | ('IMAGE_SCN_CNT_CODE', 0x00000020), 133 | ('IMAGE_SCN_CNT_INITIALIZED_DATA', 0x00000040), 134 | ('IMAGE_SCN_CNT_UNINITIALIZED_DATA', 0x00000080), 135 | ('IMAGE_SCN_LNK_OTHER', 0x00000100), 136 | ('IMAGE_SCN_LNK_INFO', 0x00000200), 137 | ('IMAGE_SCN_LNK_REMOVE', 0x00000800), 138 | ('IMAGE_SCN_LNK_COMDAT', 0x00001000), 139 | ('IMAGE_SCN_MEM_FARDATA', 0x00008000), 140 | ('IMAGE_SCN_MEM_PURGEABLE', 0x00020000), 141 | ('IMAGE_SCN_MEM_16BIT', 0x00020000), 142 | ('IMAGE_SCN_MEM_LOCKED', 0x00040000), 143 | ('IMAGE_SCN_MEM_PRELOAD', 0x00080000), 144 | ('IMAGE_SCN_ALIGN_1BYTES', 0x00100000), 145 | ('IMAGE_SCN_ALIGN_2BYTES', 0x00200000), 146 | ('IMAGE_SCN_ALIGN_4BYTES', 0x00300000), 147 | ('IMAGE_SCN_ALIGN_8BYTES', 0x00400000), 148 | ('IMAGE_SCN_ALIGN_16BYTES', 0x00500000), 149 | ('IMAGE_SCN_ALIGN_32BYTES', 0x00600000), 150 | ('IMAGE_SCN_ALIGN_64BYTES', 0x00700000), 151 | ('IMAGE_SCN_ALIGN_128BYTES', 0x00800000), 152 | ('IMAGE_SCN_ALIGN_256BYTES', 0x00900000), 153 | ('IMAGE_SCN_ALIGN_512BYTES', 0x00A00000), 154 | ('IMAGE_SCN_ALIGN_1024BYTES', 0x00B00000), 155 | ('IMAGE_SCN_ALIGN_2048BYTES', 0x00C00000), 156 | ('IMAGE_SCN_ALIGN_4096BYTES', 0x00D00000), 157 | ('IMAGE_SCN_ALIGN_8192BYTES', 0x00E00000), 158 | ('IMAGE_SCN_ALIGN_MASK', 0x00F00000), 159 | ('IMAGE_SCN_LNK_NRELOC_OVFL', 0x01000000), 160 | ('IMAGE_SCN_MEM_DISCARDABLE', 0x02000000), 161 | ('IMAGE_SCN_MEM_NOT_CACHED', 0x04000000), 162 | ('IMAGE_SCN_MEM_NOT_PAGED', 0x08000000), 163 | ('IMAGE_SCN_MEM_SHARED', 0x10000000), 164 | ('IMAGE_SCN_MEM_EXECUTE', 0x20000000), 165 | ('IMAGE_SCN_MEM_READ', 0x40000000), 166 | ('IMAGE_SCN_MEM_WRITE', 0x80000000L) ] 167 | 168 | SECTION_CHARACTERISTICS = dict([(e[1], e[0]) for e in 169 | 
section_characteristics]+section_characteristics) 170 | 171 | 172 | debug_types = [ 173 | ('IMAGE_DEBUG_TYPE_UNKNOWN', 0), 174 | ('IMAGE_DEBUG_TYPE_COFF', 1), 175 | ('IMAGE_DEBUG_TYPE_CODEVIEW', 2), 176 | ('IMAGE_DEBUG_TYPE_FPO', 3), 177 | ('IMAGE_DEBUG_TYPE_MISC', 4), 178 | ('IMAGE_DEBUG_TYPE_EXCEPTION', 5), 179 | ('IMAGE_DEBUG_TYPE_FIXUP', 6), 180 | ('IMAGE_DEBUG_TYPE_OMAP_TO_SRC', 7), 181 | ('IMAGE_DEBUG_TYPE_OMAP_FROM_SRC', 8), 182 | ('IMAGE_DEBUG_TYPE_BORLAND', 9), 183 | ('IMAGE_DEBUG_TYPE_RESERVED10', 10) ] 184 | 185 | DEBUG_TYPE = dict([(e[1], e[0]) for e in debug_types]+debug_types) 186 | 187 | 188 | subsystem_types = [ 189 | ('IMAGE_SUBSYSTEM_UNKNOWN', 0), 190 | ('IMAGE_SUBSYSTEM_NATIVE', 1), 191 | ('IMAGE_SUBSYSTEM_WINDOWS_GUI', 2), 192 | ('IMAGE_SUBSYSTEM_WINDOWS_CUI', 3), 193 | ('IMAGE_SUBSYSTEM_OS2_CUI', 5), 194 | ('IMAGE_SUBSYSTEM_POSIX_CUI', 7), 195 | ('IMAGE_SUBSYSTEM_WINDOWS_CE_GUI', 9), 196 | ('IMAGE_SUBSYSTEM_EFI_APPLICATION', 10), 197 | ('IMAGE_SUBSYSTEM_EFI_BOOT_SERVICE_DRIVER', 11), 198 | ('IMAGE_SUBSYSTEM_EFI_RUNTIME_DRIVER', 12), 199 | ('IMAGE_SUBSYSTEM_EFI_ROM', 13), 200 | ('IMAGE_SUBSYSTEM_XBOX', 14)] 201 | 202 | SUBSYSTEM_TYPE = dict([(e[1], e[0]) for e in subsystem_types]+subsystem_types) 203 | 204 | 205 | machine_types = [ 206 | ('IMAGE_FILE_MACHINE_UNKNOWN', 0), 207 | ('IMAGE_FILE_MACHINE_AM33', 0x1d3), 208 | ('IMAGE_FILE_MACHINE_AMD64', 0x8664), 209 | ('IMAGE_FILE_MACHINE_ARM', 0x1c0), 210 | ('IMAGE_FILE_MACHINE_EBC', 0xebc), 211 | ('IMAGE_FILE_MACHINE_I386', 0x14c), 212 | ('IMAGE_FILE_MACHINE_IA64', 0x200), 213 | ('IMAGE_FILE_MACHINE_MR32', 0x9041), 214 | ('IMAGE_FILE_MACHINE_MIPS16', 0x266), 215 | ('IMAGE_FILE_MACHINE_MIPSFPU', 0x366), 216 | ('IMAGE_FILE_MACHINE_MIPSFPU16',0x466), 217 | ('IMAGE_FILE_MACHINE_POWERPC', 0x1f0), 218 | ('IMAGE_FILE_MACHINE_POWERPCFP',0x1f1), 219 | ('IMAGE_FILE_MACHINE_R4000', 0x166), 220 | ('IMAGE_FILE_MACHINE_SH3', 0x1a2), 221 | ('IMAGE_FILE_MACHINE_SH3DSP', 0x1a3), 222 | ('IMAGE_FILE_MACHINE_SH4', 
0x1a6), 223 | ('IMAGE_FILE_MACHINE_SH5', 0x1a8), 224 | ('IMAGE_FILE_MACHINE_THUMB', 0x1c2), 225 | ('IMAGE_FILE_MACHINE_WCEMIPSV2',0x169), 226 | ] 227 | 228 | MACHINE_TYPE = dict([(e[1], e[0]) for e in machine_types]+machine_types) 229 | 230 | 231 | relocation_types = [ 232 | ('IMAGE_REL_BASED_ABSOLUTE', 0), 233 | ('IMAGE_REL_BASED_HIGH', 1), 234 | ('IMAGE_REL_BASED_LOW', 2), 235 | ('IMAGE_REL_BASED_HIGHLOW', 3), 236 | ('IMAGE_REL_BASED_HIGHADJ', 4), 237 | ('IMAGE_REL_BASED_MIPS_JMPADDR', 5), 238 | ('IMAGE_REL_BASED_SECTION', 6), 239 | ('IMAGE_REL_BASED_REL', 7), 240 | ('IMAGE_REL_BASED_MIPS_JMPADDR16', 9), 241 | ('IMAGE_REL_BASED_IA64_IMM64', 9), 242 | ('IMAGE_REL_BASED_DIR64', 10), 243 | ('IMAGE_REL_BASED_HIGH3ADJ', 11) ] 244 | 245 | RELOCATION_TYPE = dict([(e[1], e[0]) for e in relocation_types]+relocation_types) 246 | 247 | 248 | dll_characteristics = [ 249 | ('IMAGE_DLL_CHARACTERISTICS_RESERVED_0x0001', 0x0001), 250 | ('IMAGE_DLL_CHARACTERISTICS_RESERVED_0x0002', 0x0002), 251 | ('IMAGE_DLL_CHARACTERISTICS_RESERVED_0x0004', 0x0004), 252 | ('IMAGE_DLL_CHARACTERISTICS_RESERVED_0x0008', 0x0008), 253 | ('IMAGE_DLL_CHARACTERISTICS_DYNAMIC_BASE', 0x0040), 254 | ('IMAGE_DLL_CHARACTERISTICS_FORCE_INTEGRITY', 0x0080), 255 | ('IMAGE_DLL_CHARACTERISTICS_NX_COMPAT', 0x0100), 256 | ('IMAGE_DLL_CHARACTERISTICS_NO_ISOLATION', 0x0200), 257 | ('IMAGE_DLL_CHARACTERISTICS_NO_SEH', 0x0400), 258 | ('IMAGE_DLL_CHARACTERISTICS_NO_BIND', 0x0800), 259 | ('IMAGE_DLL_CHARACTERISTICS_RESERVED_0x1000', 0x1000), 260 | ('IMAGE_DLL_CHARACTERISTICS_WDM_DRIVER', 0x2000), 261 | ('IMAGE_DLL_CHARACTERISTICS_TERMINAL_SERVER_AWARE', 0x8000) ] 262 | 263 | DLL_CHARACTERISTICS = dict([(e[1], e[0]) for e in dll_characteristics]+dll_characteristics) 264 | 265 | 266 | # Resource types 267 | resource_type = [ 268 | ('RT_CURSOR', 1), 269 | ('RT_BITMAP', 2), 270 | ('RT_ICON', 3), 271 | ('RT_MENU', 4), 272 | ('RT_DIALOG', 5), 273 | ('RT_STRING', 6), 274 | ('RT_FONTDIR', 7), 275 | ('RT_FONT', 8), 276 | 
('RT_ACCELERATOR', 9), 277 | ('RT_RCDATA', 10), 278 | ('RT_MESSAGETABLE', 11), 279 | ('RT_GROUP_CURSOR', 12), 280 | ('RT_GROUP_ICON', 14), 281 | ('RT_VERSION', 16), 282 | ('RT_DLGINCLUDE', 17), 283 | ('RT_PLUGPLAY', 19), 284 | ('RT_VXD', 20), 285 | ('RT_ANICURSOR', 21), 286 | ('RT_ANIICON', 22), 287 | ('RT_HTML', 23), 288 | ('RT_MANIFEST', 24) ] 289 | 290 | RESOURCE_TYPE = dict([(e[1], e[0]) for e in resource_type]+resource_type) 291 | 292 | 293 | # Language definitions 294 | lang = [ 295 | ('LANG_NEUTRAL', 0x00), 296 | ('LANG_INVARIANT', 0x7f), 297 | ('LANG_AFRIKAANS', 0x36), 298 | ('LANG_ALBANIAN', 0x1c), 299 | ('LANG_ARABIC', 0x01), 300 | ('LANG_ARMENIAN', 0x2b), 301 | ('LANG_ASSAMESE', 0x4d), 302 | ('LANG_AZERI', 0x2c), 303 | ('LANG_BASQUE', 0x2d), 304 | ('LANG_BELARUSIAN', 0x23), 305 | ('LANG_BENGALI', 0x45), 306 | ('LANG_BULGARIAN', 0x02), 307 | ('LANG_CATALAN', 0x03), 308 | ('LANG_CHINESE', 0x04), 309 | ('LANG_CROATIAN', 0x1a), 310 | ('LANG_CZECH', 0x05), 311 | ('LANG_DANISH', 0x06), 312 | ('LANG_DIVEHI', 0x65), 313 | ('LANG_DUTCH', 0x13), 314 | ('LANG_ENGLISH', 0x09), 315 | ('LANG_ESTONIAN', 0x25), 316 | ('LANG_FAEROESE', 0x38), 317 | ('LANG_FARSI', 0x29), 318 | ('LANG_FINNISH', 0x0b), 319 | ('LANG_FRENCH', 0x0c), 320 | ('LANG_GALICIAN', 0x56), 321 | ('LANG_GEORGIAN', 0x37), 322 | ('LANG_GERMAN', 0x07), 323 | ('LANG_GREEK', 0x08), 324 | ('LANG_GUJARATI', 0x47), 325 | ('LANG_HEBREW', 0x0d), 326 | ('LANG_HINDI', 0x39), 327 | ('LANG_HUNGARIAN', 0x0e), 328 | ('LANG_ICELANDIC', 0x0f), 329 | ('LANG_INDONESIAN', 0x21), 330 | ('LANG_ITALIAN', 0x10), 331 | ('LANG_JAPANESE', 0x11), 332 | ('LANG_KANNADA', 0x4b), 333 | ('LANG_KASHMIRI', 0x60), 334 | ('LANG_KAZAK', 0x3f), 335 | ('LANG_KONKANI', 0x57), 336 | ('LANG_KOREAN', 0x12), 337 | ('LANG_KYRGYZ', 0x40), 338 | ('LANG_LATVIAN', 0x26), 339 | ('LANG_LITHUANIAN', 0x27), 340 | ('LANG_MACEDONIAN', 0x2f), 341 | ('LANG_MALAY', 0x3e), 342 | ('LANG_MALAYALAM', 0x4c), 343 | ('LANG_MANIPURI', 0x58), 344 | ('LANG_MARATHI', 
0x4e), 345 | ('LANG_MONGOLIAN', 0x50), 346 | ('LANG_NEPALI', 0x61), 347 | ('LANG_NORWEGIAN', 0x14), 348 | ('LANG_ORIYA', 0x48), 349 | ('LANG_POLISH', 0x15), 350 | ('LANG_PORTUGUESE', 0x16), 351 | ('LANG_PUNJABI', 0x46), 352 | ('LANG_ROMANIAN', 0x18), 353 | ('LANG_RUSSIAN', 0x19), 354 | ('LANG_SANSKRIT', 0x4f), 355 | ('LANG_SERBIAN', 0x1a), 356 | ('LANG_SINDHI', 0x59), 357 | ('LANG_SLOVAK', 0x1b), 358 | ('LANG_SLOVENIAN', 0x24), 359 | ('LANG_SPANISH', 0x0a), 360 | ('LANG_SWAHILI', 0x41), 361 | ('LANG_SWEDISH', 0x1d), 362 | ('LANG_SYRIAC', 0x5a), 363 | ('LANG_TAMIL', 0x49), 364 | ('LANG_TATAR', 0x44), 365 | ('LANG_TELUGU', 0x4a), 366 | ('LANG_THAI', 0x1e), 367 | ('LANG_TURKISH', 0x1f), 368 | ('LANG_UKRAINIAN', 0x22), 369 | ('LANG_URDU', 0x20), 370 | ('LANG_UZBEK', 0x43), 371 | ('LANG_VIETNAMESE', 0x2a), 372 | ('LANG_GAELIC', 0x3c), 373 | ('LANG_MALTESE', 0x3a), 374 | ('LANG_MAORI', 0x28), 375 | ('LANG_RHAETO_ROMANCE',0x17), 376 | ('LANG_SAAMI', 0x3b), 377 | ('LANG_SORBIAN', 0x2e), 378 | ('LANG_SUTU', 0x30), 379 | ('LANG_TSONGA', 0x31), 380 | ('LANG_TSWANA', 0x32), 381 | ('LANG_VENDA', 0x33), 382 | ('LANG_XHOSA', 0x34), 383 | ('LANG_ZULU', 0x35), 384 | ('LANG_ESPERANTO', 0x8f), 385 | ('LANG_WALON', 0x90), 386 | ('LANG_CORNISH', 0x91), 387 | ('LANG_WELSH', 0x92), 388 | ('LANG_BRETON', 0x93) ] 389 | 390 | LANG = dict(lang+[(e[1], e[0]) for e in lang]) 391 | 392 | 393 | # Sublanguage definitions 394 | sublang = [ 395 | ('SUBLANG_NEUTRAL', 0x00), 396 | ('SUBLANG_DEFAULT', 0x01), 397 | ('SUBLANG_SYS_DEFAULT', 0x02), 398 | ('SUBLANG_ARABIC_SAUDI_ARABIA', 0x01), 399 | ('SUBLANG_ARABIC_IRAQ', 0x02), 400 | ('SUBLANG_ARABIC_EGYPT', 0x03), 401 | ('SUBLANG_ARABIC_LIBYA', 0x04), 402 | ('SUBLANG_ARABIC_ALGERIA', 0x05), 403 | ('SUBLANG_ARABIC_MOROCCO', 0x06), 404 | ('SUBLANG_ARABIC_TUNISIA', 0x07), 405 | ('SUBLANG_ARABIC_OMAN', 0x08), 406 | ('SUBLANG_ARABIC_YEMEN', 0x09), 407 | ('SUBLANG_ARABIC_SYRIA', 0x0a), 408 | ('SUBLANG_ARABIC_JORDAN', 0x0b), 409 | ('SUBLANG_ARABIC_LEBANON', 
0x0c), 410 | ('SUBLANG_ARABIC_KUWAIT', 0x0d), 411 | ('SUBLANG_ARABIC_UAE', 0x0e), 412 | ('SUBLANG_ARABIC_BAHRAIN', 0x0f), 413 | ('SUBLANG_ARABIC_QATAR', 0x10), 414 | ('SUBLANG_AZERI_LATIN', 0x01), 415 | ('SUBLANG_AZERI_CYRILLIC', 0x02), 416 | ('SUBLANG_CHINESE_TRADITIONAL', 0x01), 417 | ('SUBLANG_CHINESE_SIMPLIFIED', 0x02), 418 | ('SUBLANG_CHINESE_HONGKONG', 0x03), 419 | ('SUBLANG_CHINESE_SINGAPORE', 0x04), 420 | ('SUBLANG_CHINESE_MACAU', 0x05), 421 | ('SUBLANG_DUTCH', 0x01), 422 | ('SUBLANG_DUTCH_BELGIAN', 0x02), 423 | ('SUBLANG_ENGLISH_US', 0x01), 424 | ('SUBLANG_ENGLISH_UK', 0x02), 425 | ('SUBLANG_ENGLISH_AUS', 0x03), 426 | ('SUBLANG_ENGLISH_CAN', 0x04), 427 | ('SUBLANG_ENGLISH_NZ', 0x05), 428 | ('SUBLANG_ENGLISH_EIRE', 0x06), 429 | ('SUBLANG_ENGLISH_SOUTH_AFRICA', 0x07), 430 | ('SUBLANG_ENGLISH_JAMAICA', 0x08), 431 | ('SUBLANG_ENGLISH_CARIBBEAN', 0x09), 432 | ('SUBLANG_ENGLISH_BELIZE', 0x0a), 433 | ('SUBLANG_ENGLISH_TRINIDAD', 0x0b), 434 | ('SUBLANG_ENGLISH_ZIMBABWE', 0x0c), 435 | ('SUBLANG_ENGLISH_PHILIPPINES', 0x0d), 436 | ('SUBLANG_FRENCH', 0x01), 437 | ('SUBLANG_FRENCH_BELGIAN', 0x02), 438 | ('SUBLANG_FRENCH_CANADIAN', 0x03), 439 | ('SUBLANG_FRENCH_SWISS', 0x04), 440 | ('SUBLANG_FRENCH_LUXEMBOURG', 0x05), 441 | ('SUBLANG_FRENCH_MONACO', 0x06), 442 | ('SUBLANG_GERMAN', 0x01), 443 | ('SUBLANG_GERMAN_SWISS', 0x02), 444 | ('SUBLANG_GERMAN_AUSTRIAN', 0x03), 445 | ('SUBLANG_GERMAN_LUXEMBOURG', 0x04), 446 | ('SUBLANG_GERMAN_LIECHTENSTEIN', 0x05), 447 | ('SUBLANG_ITALIAN', 0x01), 448 | ('SUBLANG_ITALIAN_SWISS', 0x02), 449 | ('SUBLANG_KASHMIRI_SASIA', 0x02), 450 | ('SUBLANG_KASHMIRI_INDIA', 0x02), 451 | ('SUBLANG_KOREAN', 0x01), 452 | ('SUBLANG_LITHUANIAN', 0x01), 453 | ('SUBLANG_MALAY_MALAYSIA', 0x01), 454 | ('SUBLANG_MALAY_BRUNEI_DARUSSALAM', 0x02), 455 | ('SUBLANG_NEPALI_INDIA', 0x02), 456 | ('SUBLANG_NORWEGIAN_BOKMAL', 0x01), 457 | ('SUBLANG_NORWEGIAN_NYNORSK', 0x02), 458 | ('SUBLANG_PORTUGUESE', 0x02), 459 | ('SUBLANG_PORTUGUESE_BRAZILIAN', 0x01), 460 | 
('SUBLANG_SERBIAN_LATIN', 0x02), 461 | ('SUBLANG_SERBIAN_CYRILLIC', 0x03), 462 | ('SUBLANG_SPANISH', 0x01), 463 | ('SUBLANG_SPANISH_MEXICAN', 0x02), 464 | ('SUBLANG_SPANISH_MODERN', 0x03), 465 | ('SUBLANG_SPANISH_GUATEMALA', 0x04), 466 | ('SUBLANG_SPANISH_COSTA_RICA', 0x05), 467 | ('SUBLANG_SPANISH_PANAMA', 0x06), 468 | ('SUBLANG_SPANISH_DOMINICAN_REPUBLIC', 0x07), 469 | ('SUBLANG_SPANISH_VENEZUELA', 0x08), 470 | ('SUBLANG_SPANISH_COLOMBIA', 0x09), 471 | ('SUBLANG_SPANISH_PERU', 0x0a), 472 | ('SUBLANG_SPANISH_ARGENTINA', 0x0b), 473 | ('SUBLANG_SPANISH_ECUADOR', 0x0c), 474 | ('SUBLANG_SPANISH_CHILE', 0x0d), 475 | ('SUBLANG_SPANISH_URUGUAY', 0x0e), 476 | ('SUBLANG_SPANISH_PARAGUAY', 0x0f), 477 | ('SUBLANG_SPANISH_BOLIVIA', 0x10), 478 | ('SUBLANG_SPANISH_EL_SALVADOR', 0x11), 479 | ('SUBLANG_SPANISH_HONDURAS', 0x12), 480 | ('SUBLANG_SPANISH_NICARAGUA', 0x13), 481 | ('SUBLANG_SPANISH_PUERTO_RICO', 0x14), 482 | ('SUBLANG_SWEDISH', 0x01), 483 | ('SUBLANG_SWEDISH_FINLAND', 0x02), 484 | ('SUBLANG_URDU_PAKISTAN', 0x01), 485 | ('SUBLANG_URDU_INDIA', 0x02), 486 | ('SUBLANG_UZBEK_LATIN', 0x01), 487 | ('SUBLANG_UZBEK_CYRILLIC', 0x02), 488 | ('SUBLANG_DUTCH_SURINAM', 0x03), 489 | ('SUBLANG_ROMANIAN', 0x01), 490 | ('SUBLANG_ROMANIAN_MOLDAVIA', 0x02), 491 | ('SUBLANG_RUSSIAN', 0x01), 492 | ('SUBLANG_RUSSIAN_MOLDAVIA', 0x02), 493 | ('SUBLANG_CROATIAN', 0x01), 494 | ('SUBLANG_LITHUANIAN_CLASSIC', 0x02), 495 | ('SUBLANG_GAELIC', 0x01), 496 | ('SUBLANG_GAELIC_SCOTTISH', 0x02), 497 | ('SUBLANG_GAELIC_MANX', 0x03) ] 498 | 499 | SUBLANG = dict(sublang+[(e[1], e[0]) for e in sublang]) 500 | 501 | # Initialize the dictionary with all the name->value pairs 502 | SUBLANG = dict( sublang ) 503 | # Now add all the value->name information, handling duplicates appropriately 504 | for sublang_name, sublang_value in sublang: 505 | if SUBLANG.has_key( sublang_value ): 506 | SUBLANG[ sublang_value ].append( sublang_name ) 507 | else: 508 | SUBLANG[ sublang_value ] = [ sublang_name ] 509 | 510 | # 
Resolve a sublang name given the main lang name 511 | # 512 | def get_sublang_name_for_lang( lang_value, sublang_value ): 513 | lang_name = LANG.get(lang_value, '*unknown*') 514 | for sublang_name in SUBLANG.get(sublang_value, list()): 515 | # if the main language is a substring of sublang's name, then 516 | # return that 517 | if lang_name in sublang_name: 518 | return sublang_name 519 | # otherwise return the first sublang name 520 | return SUBLANG.get(sublang_value, ['*unknown*'])[0] 521 | 522 | 523 | # Ange Albertini's code to process resources' strings 524 | # 525 | def parse_strings(data, counter, l): 526 | i = 0 527 | error_count = 0 528 | while i < len(data): 529 | 530 | data_slice = data[i:i + 2] 531 | if len(data_slice) < 2: 532 | break 533 | 534 | len_ = struct.unpack("<h", data_slice)[0] 535 | i += 2 536 | if len_ != 0 and 0 <= len_*2 <= len(data): 537 | try: 538 | l[counter] = data[i: i + len_ * 2].decode('utf-16') 539 | except UnicodeDecodeError: 540 | error_count += 1 541 | pass 542 | if error_count >= 3: 543 | break 544 | i += len_ * 2 545 | counter += 1 546 | 547 | 548 | def retrieve_flags(flag_dict, flag_filter): 549 | """Read the flags from a dictionary and return them in a usable form. 550 | 551 | Will return a list of (flag, value) for all flags in "flag_dict" 552 | matching the filter "flag_filter". 553 | """ 554 | 555 | return [(f[0], f[1]) for f in flag_dict.items() if 556 | isinstance(f[0], str) and f[0].startswith(flag_filter)] 557 | 558 | 559 | def set_flags(obj, flag_field, flags): 560 | """Will process the flags and set attributes in the object accordingly. 561 | 562 | The object "obj" will gain attributes named after the flags provided in 563 | "flags" and valued True/False, matching the results of applying each 564 | flag value from "flags" to flag_field. 
565 | """ 566 | 567 | for flag in flags: 568 | if flag[1] & flag_field: 569 | #setattr(obj, flag[0], True) 570 | obj.__dict__[flag[0]] = True 571 | else: 572 | #setattr(obj, flag[0], False) 573 | obj.__dict__[flag[0]] = False 574 | 575 | 576 | def power_of_two(val): 577 | return val != 0 and (val & (val-1)) == 0 578 | 579 | 580 | FILE_ALIGNEMNT_HARDCODED_VALUE = 0x200 581 | FileAlignment_Warning = False # We only want to print the warning once 582 | SectionAlignment_Warning = False # We only want to print the warning once 583 | 584 | 585 | 586 | class UnicodeStringWrapperPostProcessor: 587 | """This class attempts to help the process of identifying strings 588 | that might be plain Unicode or Pascal. A list of strings will be 589 | wrapped on it with the hope the overlappings will help make the 590 | decision about their type.""" 591 | 592 | def __init__(self, pe, rva_ptr): 593 | self.pe = pe 594 | self.rva_ptr = rva_ptr 595 | self.string = None 596 | 597 | 598 | def get_rva(self): 599 | """Get the RVA of the string.""" 600 | 601 | return self.rva_ptr 602 | 603 | 604 | def __str__(self): 605 | """Return the escaped ASCII representation of the string.""" 606 | 607 | def convert_char(char): 608 | if char in string.printable: 609 | return char 610 | else: 611 | return r'\x%02x' % ord(char) 612 | 613 | if self.string: 614 | return ''.join([convert_char(c) for c in self.string]) 615 | 616 | return '' 617 | 618 | 619 | def invalidate(self): 620 | """Make this instance None, to express it's no known string type.""" 621 | 622 | self = None 623 | 624 | 625 | def render_pascal_16(self): 626 | 627 | self.string = self.pe.get_string_u_at_rva( 628 | self.rva_ptr+2, 629 | max_length=self.__get_pascal_16_length()) 630 | 631 | 632 | def ask_pascal_16(self, next_rva_ptr): 633 | """The next RVA is taken to be the one immediately following this one. 
634 | 
635 |         Such RVA could indicate the natural end of the string and will be checked
636 |         with the possible length contained in the first word.
637 |         """
638 | 
639 |         length = self.__get_pascal_16_length()
640 | 
641 |         if length == (next_rva_ptr - (self.rva_ptr+2)) / 2:
642 |             self.length = length
643 |             return True
644 | 
645 |         return False
646 | 
647 | 
648 |     def __get_pascal_16_length(self):
649 | 
650 |         return self.__get_word_value_at_rva(self.rva_ptr)
651 | 
652 | 
653 |     def __get_word_value_at_rva(self, rva):
654 | 
655 |         try:
656 |             data = self.pe.get_data(self.rva_ptr, 2)
657 |         except PEFormatError, e:
658 |             return False
659 | 
660 |         if len(data)<2:
661 |             return False
662 | 
663 |         return struct.unpack('<H', data)[0]
864 |         if len(data) > self.__format_length__:
865 |             data = data[:self.__format_length__]
866 | 
867 |         # OC Patch:
868 |         # Some malware have incorrect header lengths.
869 |         # Fail gracefully if this occurs
870 |         # Buggy malware: a29b0118af8b7408444df81701ad5a7f
871 |         #
872 |         elif len(data) < self.__format_length__:
873 |             raise PEFormatError('Data length less than expected header length.')
874 | 
875 | 
876 |         if data.count(chr(0)) == len(data):
877 |             self.__all_zeroes__ = True
878 | 
879 |         self.__unpacked_data_elms__ = struct.unpack(self.__format__, data)
880 |         for i in xrange(len(self.__unpacked_data_elms__)):
881 |             for key in self.__keys__[i]:
882 |                 #self.values[key] = self.__unpacked_data_elms__[i]
883 |                 setattr(self, key, self.__unpacked_data_elms__[i])
884 | 
885 | 
886 |     def __pack__(self):
887 | 
888 |         new_values = []
889 | 
890 |         for i in xrange(len(self.__unpacked_data_elms__)):
891 | 
892 |             for key in self.__keys__[i]:
893 |                 new_val = getattr(self, key)
894 |                 old_val = self.__unpacked_data_elms__[i]
895 | 
896 |                 # In the case of Unions, when the first changed value
897 |                 # is picked the loop is exited
898 |                 if new_val != old_val:
899 |                     break
900 | 
901 |             new_values.append(new_val)
902 | 
903 |         return struct.pack(self.__format__, *new_values)
904 | 
905 | 
906 |     def __str__(self):
907 |         return
'\n'.join( self.dump() ) 908 | 909 | def __repr__(self): 910 | return '' % (' '.join( [' '.join(s.split()) for s in self.dump()] )) 911 | 912 | 913 | def dump(self, indentation=0): 914 | """Returns a string representation of the structure.""" 915 | 916 | dump = [] 917 | 918 | dump.append('[%s]' % self.name) 919 | 920 | # Refer to the __set_format__ method for an explanation 921 | # of the following construct. 922 | for keys in self.__keys__: 923 | for key in keys: 924 | 925 | val = getattr(self, key) 926 | if isinstance(val, int) or isinstance(val, long): 927 | val_str = '0x%-8X' % (val) 928 | if key == 'TimeDateStamp' or key == 'dwTimeStamp': 929 | try: 930 | val_str += ' [%s UTC]' % time.asctime(time.gmtime(val)) 931 | except exceptions.ValueError, e: 932 | val_str += ' [INVALID TIME]' 933 | else: 934 | val_str = ''.join(filter(lambda c:c != '\0', str(val))) 935 | 936 | dump.append('0x%-8X 0x%-3X %-30s %s' % ( 937 | self.__field_offsets__[key] + self.__file_offset__, 938 | self.__field_offsets__[key], key+':', val_str)) 939 | 940 | return dump 941 | 942 | 943 | 944 | class SectionStructure(Structure): 945 | """Convenience section handling class.""" 946 | 947 | def __init__(self, *argl, **argd): 948 | if 'pe' in argd: 949 | self.pe = argd['pe'] 950 | del argd['pe'] 951 | 952 | Structure.__init__(self, *argl, **argd) 953 | 954 | def get_data(self, start=None, length=None): 955 | """Get data chunk from a section. 956 | 957 | Allows to query data from the section by passing the 958 | addresses where the PE file would be loaded by default. 959 | It is then possible to retrieve code and data by its real 960 | addresses as it would be if loaded. 
961 | """ 962 | 963 | PointerToRawData_adj = self.pe.adjust_FileAlignment( self.PointerToRawData, 964 | self.pe.OPTIONAL_HEADER.FileAlignment ) 965 | VirtualAddress_adj = self.pe.adjust_SectionAlignment( self.VirtualAddress, 966 | self.pe.OPTIONAL_HEADER.SectionAlignment, self.pe.OPTIONAL_HEADER.FileAlignment ) 967 | 968 | if start is None: 969 | offset = PointerToRawData_adj 970 | else: 971 | offset = ( start - VirtualAddress_adj ) + PointerToRawData_adj 972 | 973 | if length is not None: 974 | end = offset + length 975 | else: 976 | end = offset + self.SizeOfRawData 977 | 978 | # PointerToRawData is not adjusted here as we might want to read any possible extra bytes 979 | # that might get cut off by aligning the start (and hence cutting something off the end) 980 | # 981 | if end > self.PointerToRawData + self.SizeOfRawData: 982 | end = self.PointerToRawData + self.SizeOfRawData 983 | 984 | return self.pe.__data__[offset:end] 985 | 986 | 987 | def __setattr__(self, name, val): 988 | 989 | if name == 'Characteristics': 990 | section_flags = retrieve_flags(SECTION_CHARACTERISTICS, 'IMAGE_SCN_') 991 | 992 | # Set the section's flags according the the Characteristics member 993 | set_flags(self, val, section_flags) 994 | 995 | elif 'IMAGE_SCN_' in name and hasattr(self, name): 996 | if val: 997 | self.__dict__['Characteristics'] |= SECTION_CHARACTERISTICS[name] 998 | else: 999 | self.__dict__['Characteristics'] ^= SECTION_CHARACTERISTICS[name] 1000 | 1001 | self.__dict__[name] = val 1002 | 1003 | 1004 | def get_rva_from_offset(self, offset): 1005 | return offset - self.pe.adjust_FileAlignment( self.PointerToRawData, 1006 | self.pe.OPTIONAL_HEADER.FileAlignment ) + self.pe.adjust_SectionAlignment( self.VirtualAddress, 1007 | self.pe.OPTIONAL_HEADER.SectionAlignment, self.pe.OPTIONAL_HEADER.FileAlignment ) 1008 | 1009 | 1010 | def get_offset_from_rva(self, rva): 1011 | return (rva - 1012 | self.pe.adjust_SectionAlignment( 1013 | self.VirtualAddress, 1014 | 
self.pe.OPTIONAL_HEADER.SectionAlignment, 1015 | self.pe.OPTIONAL_HEADER.FileAlignment ) 1016 | ) + self.pe.adjust_FileAlignment( 1017 | self.PointerToRawData, 1018 | self.pe.OPTIONAL_HEADER.FileAlignment ) 1019 | 1020 | 1021 | def contains_offset(self, offset): 1022 | """Check whether the section contains the file offset provided.""" 1023 | 1024 | if self.PointerToRawData is None: 1025 | # bss and other sections containing only uninitialized data must have 0 1026 | # and do not take space in the file 1027 | return False 1028 | return ( self.pe.adjust_FileAlignment( self.PointerToRawData, 1029 | self.pe.OPTIONAL_HEADER.FileAlignment ) <= 1030 | offset < 1031 | self.pe.adjust_FileAlignment( self.PointerToRawData, 1032 | self.pe.OPTIONAL_HEADER.FileAlignment ) + 1033 | self.SizeOfRawData ) 1034 | 1035 | 1036 | def contains_rva(self, rva): 1037 | """Check whether the section contains the address provided.""" 1038 | 1039 | # Check if the SizeOfRawData is realistic. If it's bigger than the size of 1040 | # the whole PE file minus the start address of the section it could be 1041 | # either truncated or the SizeOfRawData contain a misleading value. 1042 | # In either of those cases we take the VirtualSize 1043 | # 1044 | if len(self.pe.__data__) - self.pe.adjust_FileAlignment( self.PointerToRawData, 1045 | self.pe.OPTIONAL_HEADER.FileAlignment ) < self.SizeOfRawData: 1046 | # PECOFF documentation v8 says: 1047 | # VirtualSize: The total size of the section when loaded into memory. 1048 | # If this value is greater than SizeOfRawData, the section is zero-padded. 1049 | # This field is valid only for executable images and should be set to zero 1050 | # for object files. 
1051 | # 1052 | size = self.Misc_VirtualSize 1053 | else: 1054 | size = max(self.SizeOfRawData, self.Misc_VirtualSize) 1055 | 1056 | VirtualAddress_adj = self.pe.adjust_SectionAlignment( self.VirtualAddress, 1057 | self.pe.OPTIONAL_HEADER.SectionAlignment, self.pe.OPTIONAL_HEADER.FileAlignment ) 1058 | 1059 | return VirtualAddress_adj <= rva < VirtualAddress_adj + size 1060 | 1061 | 1062 | def contains(self, rva): 1063 | #print "DEPRECATION WARNING: you should use contains_rva() instead of contains()" 1064 | return self.contains_rva(rva) 1065 | 1066 | 1067 | #def set_data(self, data): 1068 | # """Set the data belonging to the section.""" 1069 | # 1070 | # self.data = data 1071 | 1072 | 1073 | def get_entropy(self): 1074 | """Calculate and return the entropy for the section.""" 1075 | 1076 | return self.entropy_H( self.get_data() ) 1077 | 1078 | 1079 | def get_hash_sha1(self): 1080 | """Get the SHA-1 hex-digest of the section's data.""" 1081 | 1082 | if sha1 is not None: 1083 | return sha1( self.get_data() ).hexdigest() 1084 | 1085 | 1086 | def get_hash_sha256(self): 1087 | """Get the SHA-256 hex-digest of the section's data.""" 1088 | 1089 | if sha256 is not None: 1090 | return sha256( self.get_data() ).hexdigest() 1091 | 1092 | 1093 | def get_hash_sha512(self): 1094 | """Get the SHA-512 hex-digest of the section's data.""" 1095 | 1096 | if sha512 is not None: 1097 | return sha512( self.get_data() ).hexdigest() 1098 | 1099 | 1100 | def get_hash_md5(self): 1101 | """Get the MD5 hex-digest of the section's data.""" 1102 | 1103 | if md5 is not None: 1104 | return md5( self.get_data() ).hexdigest() 1105 | 1106 | 1107 | def entropy_H(self, data): 1108 | """Calculate the entropy of a chunk of data.""" 1109 | 1110 | if len(data) == 0: 1111 | return 0.0 1112 | 1113 | occurences = array.array('L', [0]*256) 1114 | 1115 | for x in data: 1116 | occurences[ord(x)] += 1 1117 | 1118 | entropy = 0 1119 | for x in occurences: 1120 | if x: 1121 | p_x = float(x) / len(data) 1122 | 
entropy -= p_x*math.log(p_x, 2) 1123 | 1124 | return entropy 1125 | 1126 | 1127 | 1128 | class DataContainer: 1129 | """Generic data container.""" 1130 | 1131 | def __init__(self, **args): 1132 | for key, value in args.items(): 1133 | setattr(self, key, value) 1134 | 1135 | 1136 | 1137 | class ImportDescData(DataContainer): 1138 | """Holds import descriptor information. 1139 | 1140 | dll: name of the imported DLL 1141 | imports: list of imported symbols (ImportData instances) 1142 | struct: IMAGE_IMPORT_DESCRIPTOR structure 1143 | """ 1144 | 1145 | class ImportData(DataContainer): 1146 | """Holds imported symbol's information. 1147 | 1148 | ordinal: Ordinal of the symbol 1149 | name: Name of the symbol 1150 | bound: If the symbol is bound, this contains 1151 | the address. 1152 | """ 1153 | 1154 | 1155 | def __setattr__(self, name, val): 1156 | 1157 | # If the instance doesn't yet have an ordinal attribute 1158 | # it's not fully initialized so can't do any of the 1159 | # following 1160 | # 1161 | if hasattr(self, 'ordinal') and hasattr(self, 'bound') and hasattr(self, 'name'): 1162 | 1163 | if name == 'ordinal': 1164 | 1165 | if self.pe.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE: 1166 | ordinal_flag = IMAGE_ORDINAL_FLAG 1167 | elif self.pe.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS: 1168 | ordinal_flag = IMAGE_ORDINAL_FLAG64 1169 | 1170 | # Set the ordinal and flag the entry as imporing by ordinal 1171 | self.struct_table.Ordinal = ordinal_flag | (val & 0xffff) 1172 | self.struct_table.AddressOfData = self.struct_table.Ordinal 1173 | self.struct_table.Function = self.struct_table.Ordinal 1174 | self.struct_table.ForwarderString = self.struct_table.Ordinal 1175 | elif name == 'bound': 1176 | if self.struct_iat is not None: 1177 | self.struct_iat.AddressOfData = val 1178 | self.struct_iat.AddressOfData = self.struct_iat.AddressOfData 1179 | self.struct_iat.Function = self.struct_iat.AddressOfData 1180 | self.struct_iat.ForwarderString = self.struct_iat.AddressOfData 1181 | 
elif name == 'address': 1182 | self.struct_table.AddressOfData = val 1183 | self.struct_table.Ordinal = self.struct_table.AddressOfData 1184 | self.struct_table.Function = self.struct_table.AddressOfData 1185 | self.struct_table.ForwarderString = self.struct_table.AddressOfData 1186 | elif name == 'name': 1187 | # Make sure we reset the entry in case the import had been set to import by ordinal 1188 | if self.name_offset: 1189 | 1190 | name_rva = self.pe.get_rva_from_offset( self.name_offset ) 1191 | self.pe.set_dword_at_offset( self.ordinal_offset, (0<<31) | name_rva ) 1192 | 1193 | # Complain if the length of the new name is longer than the existing one 1194 | if len(val) > len(self.name): 1195 | #raise Exception('The export name provided is longer than the existing one.') 1196 | pass 1197 | self.pe.set_bytes_at_offset( self.name_offset, val ) 1198 | 1199 | self.__dict__[name] = val 1200 | 1201 | 1202 | class ExportDirData(DataContainer): 1203 | """Holds export directory information. 1204 | 1205 | struct: IMAGE_EXPORT_DIRECTORY structure 1206 | symbols: list of exported symbols (ExportData instances) 1207 | """ 1208 | 1209 | class ExportData(DataContainer): 1210 | """Holds exported symbols' information. 1211 | 1212 | ordinal: ordinal of the symbol 1213 | address: address of the symbol 1214 | name: name of the symbol (None if the symbol is 1215 | exported by ordinal only) 1216 | forwarder: if the symbol is forwarded it will 1217 | contain the name of the target symbol, 1218 | None otherwise. 
1219 | """ 1220 | 1221 | def __setattr__(self, name, val): 1222 | 1223 | # If the instance doesn't yet have an ordinal attribute 1224 | # it's not fully initialized so can't do any of the 1225 | # following 1226 | # 1227 | if hasattr(self, 'ordinal') and hasattr(self, 'address') and hasattr(self, 'forwarder') and hasattr(self, 'name'): 1228 | 1229 | if name == 'ordinal': 1230 | self.pe.set_word_at_offset( self.ordinal_offset, val ) 1231 | elif name == 'address': 1232 | self.pe.set_dword_at_offset( self.address_offset, val ) 1233 | elif name == 'name': 1234 | # Complain if the length of the new name is longer than the existing one 1235 | if len(val) > len(self.name): 1236 | #raise Exception('The export name provided is longer than the existing one.') 1237 | pass 1238 | self.pe.set_bytes_at_offset( self.name_offset, val ) 1239 | elif name == 'forwarder': 1240 | # Complain if the length of the new name is longer than the existing one 1241 | if len(val) > len(self.forwarder): 1242 | #raise Exception('The forwarder name provided is longer than the existing one.') 1243 | pass 1244 | self.pe.set_bytes_at_offset( self.forwarder_offset, val ) 1245 | 1246 | self.__dict__[name] = val 1247 | 1248 | 1249 | class ResourceDirData(DataContainer): 1250 | """Holds resource directory information. 1251 | 1252 | struct: IMAGE_RESOURCE_DIRECTORY structure 1253 | entries: list of entries (ResourceDirEntryData instances) 1254 | """ 1255 | 1256 | class ResourceDirEntryData(DataContainer): 1257 | """Holds resource directory entry data. 1258 | 1259 | struct: IMAGE_RESOURCE_DIRECTORY_ENTRY structure 1260 | name: If the resource is identified by name this 1261 | attribute will contain the name string. None 1262 | otherwise. If identified by id, the id is 1263 | available at 'struct.Id' 1264 | id: the id, also in struct.Id 1265 | directory: If this entry has a lower level directory 1266 | this attribute will point to the 1267 | ResourceDirData instance representing it. 
1268 |     data: If this entry has no further lower directories
1269 |         and points to the actual resource data, this
1270 |         attribute will reference the corresponding
1271 |         ResourceDataEntryData instance.
1272 |     (Either of the 'directory' or 'data' attribute will exist,
1273 |     but not both.)
1274 |     """
1275 | 
1276 | class ResourceDataEntryData(DataContainer):
1277 |     """Holds resource data entry information.
1278 | 
1279 |     struct: IMAGE_RESOURCE_DATA_ENTRY structure
1280 |     lang: Primary language ID
1281 |     sublang: Sublanguage ID
1282 |     """
1283 | 
1284 | class DebugData(DataContainer):
1285 |     """Holds debug information.
1286 | 
1287 |     struct: IMAGE_DEBUG_DIRECTORY structure
1288 |     """
1289 | 
1290 | class BaseRelocationData(DataContainer):
1291 |     """Holds base relocation information.
1292 | 
1293 |     struct: IMAGE_BASE_RELOCATION structure
1294 |     entries: list of relocation data (RelocationData instances)
1295 |     """
1296 | 
1297 | class RelocationData(DataContainer):
1298 |     """Holds relocation information.
1299 | 
1300 |     type: Type of relocation
1301 |         The type string can be obtained by
1302 |         RELOCATION_TYPE[type]
1303 |     rva: RVA of the relocation
1304 |     """
1305 |     def __setattr__(self, name, val):
1306 | 
1307 |         # If the instance doesn't yet have a struct attribute
1308 |         # it's not fully initialized so can't do any of the
1309 |         # following
1310 |         #
1311 |         if hasattr(self, 'struct'):
1312 |             # Get the word containing the type and data
1313 |             #
1314 |             word = self.struct.Data
1315 | 
1316 |             if name == 'type':
1317 |                 word = (val << 12) | (word & 0xfff)
1318 |             elif name == 'rva':
1319 |                 offset = val-self.base_rva
1320 |                 if offset < 0:
1321 |                     offset = 0
1322 |                 word = ( word & 0xf000) | ( offset & 0xfff)
1323 | 
1324 |             # Store the modified data
1325 |             #
1326 |             self.struct.Data = word
1327 | 
1328 |         self.__dict__[name] = val
1329 | 
1330 | class TlsData(DataContainer):
1331 |     """Holds TLS information.
1332 | 
1333 |     struct: IMAGE_TLS_DIRECTORY structure
1334 |     """
1335 | 
1336 | class BoundImportDescData(DataContainer):
1337 |     """Holds bound import descriptor data.
1338 | 
1339 |     This directory entry provides information on the
1340 |     DLLs this PE file has been bound to (if bound at all).
1341 |     The structure will contain the name and timestamp of the
1342 |     DLL at the time of binding so that the loader can know
1343 |     whether it differs from the one currently present in the
1344 |     system and must, therefore, re-bind the PE's imports.
1345 | 
1346 |     struct: IMAGE_BOUND_IMPORT_DESCRIPTOR structure
1347 |     name: DLL name
1348 |     entries: list of entries (BoundImportRefData instances)
1349 |         the entries will exist if this DLL has forwarded
1350 |         symbols. If so, the destination DLL will have an
1351 |         entry in this list.
1352 |     """
1353 | 
1354 | class LoadConfigData(DataContainer):
1355 |     """Holds Load Config data.
1356 | 
1357 |     struct: IMAGE_LOAD_CONFIG_DIRECTORY structure
1358 |     name: dll name
1359 |     """
1360 | 
1361 | class BoundImportRefData(DataContainer):
1362 |     """Holds bound import forwarder reference data.
1363 | 
1364 |     Contains the same information as the bound descriptor but
1365 |     for forwarded DLLs, if any.
1366 | 
1367 |     struct: IMAGE_BOUND_FORWARDER_REF structure
1368 |     name: dll name
1369 |     """
1370 | 
1371 | 
1372 | # Valid FAT32 8.3 short filename characters according to:
1373 | # http://en.wikipedia.org/wiki/8.3_filename
1374 | # This will help decide whether DLL ASCII names are likely
1375 | # to be valid or otherwise corrupted data
1376 | #
1377 | # The filename length is not checked because the DLL's filename
1378 | # can be longer than the 8.3
1379 | allowed_filename = string.lowercase + string.uppercase + string.digits + "!#$%&'()-@^_`{}~+,.;=[]" + ''.join( [chr(i) for i in range(128, 256)] )
1380 | def is_valid_dos_filename(s):
1381 |     if s is None or not isinstance(s, str):
1382 |         return False
1383 |     for c in s:
1384 |         if c not in allowed_filename:
1385 |             return False
1386 |     return True
1387 | 
1388 | 
1389 | # Check if an imported name uses the valid accepted characters expected in mangled
1390 | # function names. If the symbol's characters don't fall within this charset
1391 | # we will assume the name is invalid
1392 | #
1393 | allowed_function_name = string.lowercase + string.uppercase + string.digits + '_?@$()'
1394 | def is_valid_function_name(s):
1395 |     if s is None or not isinstance(s, str):
1396 |         return False
1397 |     for c in s:
1398 |         if c not in allowed_function_name:
1399 |             return False
1400 |     return True
1401 | 
1402 | 
1403 | 
1404 | class PE:
1405 |     """A Portable Executable representation.
1406 | 
1407 |     This class provides access to most of the information in a PE file.
1408 | 
1409 |     It expects to be supplied the name of the file to load or PE data
1410 |     to process and an optional argument 'fast_load' (False by default)
1411 |     which controls whether to load all the directories information,
1412 |     which can be quite time consuming.
1413 | 
1414 |     pe = pefile.PE('module.dll')
1415 |     pe = pefile.PE(name='module.dll')
1416 | 
1417 |     would load 'module.dll' and process it.
If the data is already
1418 |     available in a buffer, the same can be achieved with:
1419 | 
1420 |     pe = pefile.PE(data=module_dll_data)
1421 | 
1422 |     A default for "fast_load" can be set in the module itself, for
1423 |     instance with "pefile.fast_load = True". All subsequent
1424 |     instances will then skip loading the whole PE
1425 |     structure. The "full_load" method can be used to parse
1426 |     the missing data at a later stage.
1427 | 
1428 |     Basic headers information will be available in the attributes:
1429 | 
1430 |     DOS_HEADER
1431 |     NT_HEADERS
1432 |     FILE_HEADER
1433 |     OPTIONAL_HEADER
1434 | 
1435 |     All of them will contain among their attributes the members of the
1436 |     corresponding structures as defined in WINNT.H
1437 | 
1438 |     The raw data corresponding to the header (from the beginning of the
1439 |     file up to the start of the first section) will be available in the
1440 |     instance's attribute 'header' as a string.
1441 | 
1442 |     The sections will be available as a list in the 'sections' attribute.
1443 |     Each entry will contain as attributes all the structure's members.
1444 | 
1445 |     Directory entries will be available as attributes (if they exist):
1446 |     (no other entries are processed at this point)
1447 | 
1448 |     DIRECTORY_ENTRY_IMPORT (list of ImportDescData instances)
1449 |     DIRECTORY_ENTRY_EXPORT (ExportDirData instance)
1450 |     DIRECTORY_ENTRY_RESOURCE (ResourceDirData instance)
1451 |     DIRECTORY_ENTRY_DEBUG (list of DebugData instances)
1452 |     DIRECTORY_ENTRY_BASERELOC (list of BaseRelocationData instances)
1453 |     DIRECTORY_ENTRY_TLS
1454 |     DIRECTORY_ENTRY_BOUND_IMPORT (list of BoundImportDescData instances)
1455 | 
1456 |     The following dictionary attributes provide ways of mapping different
1457 |     constants.
They will accept the numeric value and return the string 1458 | representation and the opposite, feed in the string and get the 1459 | numeric constant: 1460 | 1461 | DIRECTORY_ENTRY 1462 | IMAGE_CHARACTERISTICS 1463 | SECTION_CHARACTERISTICS 1464 | DEBUG_TYPE 1465 | SUBSYSTEM_TYPE 1466 | MACHINE_TYPE 1467 | RELOCATION_TYPE 1468 | RESOURCE_TYPE 1469 | LANG 1470 | SUBLANG 1471 | """ 1472 | 1473 | # 1474 | # Format specifications for PE structures. 1475 | # 1476 | 1477 | __IMAGE_DOS_HEADER_format__ = ('IMAGE_DOS_HEADER', 1478 | ('H,e_magic', 'H,e_cblp', 'H,e_cp', 1479 | 'H,e_crlc', 'H,e_cparhdr', 'H,e_minalloc', 1480 | 'H,e_maxalloc', 'H,e_ss', 'H,e_sp', 'H,e_csum', 1481 | 'H,e_ip', 'H,e_cs', 'H,e_lfarlc', 'H,e_ovno', '8s,e_res', 1482 | 'H,e_oemid', 'H,e_oeminfo', '20s,e_res2', 1483 | 'I,e_lfanew')) 1484 | 1485 | __IMAGE_FILE_HEADER_format__ = ('IMAGE_FILE_HEADER', 1486 | ('H,Machine', 'H,NumberOfSections', 1487 | 'I,TimeDateStamp', 'I,PointerToSymbolTable', 1488 | 'I,NumberOfSymbols', 'H,SizeOfOptionalHeader', 1489 | 'H,Characteristics')) 1490 | 1491 | __IMAGE_DATA_DIRECTORY_format__ = ('IMAGE_DATA_DIRECTORY', 1492 | ('I,VirtualAddress', 'I,Size')) 1493 | 1494 | 1495 | __IMAGE_OPTIONAL_HEADER_format__ = ('IMAGE_OPTIONAL_HEADER', 1496 | ('H,Magic', 'B,MajorLinkerVersion', 1497 | 'B,MinorLinkerVersion', 'I,SizeOfCode', 1498 | 'I,SizeOfInitializedData', 'I,SizeOfUninitializedData', 1499 | 'I,AddressOfEntryPoint', 'I,BaseOfCode', 'I,BaseOfData', 1500 | 'I,ImageBase', 'I,SectionAlignment', 'I,FileAlignment', 1501 | 'H,MajorOperatingSystemVersion', 'H,MinorOperatingSystemVersion', 1502 | 'H,MajorImageVersion', 'H,MinorImageVersion', 1503 | 'H,MajorSubsystemVersion', 'H,MinorSubsystemVersion', 1504 | 'I,Reserved1', 'I,SizeOfImage', 'I,SizeOfHeaders', 1505 | 'I,CheckSum', 'H,Subsystem', 'H,DllCharacteristics', 1506 | 'I,SizeOfStackReserve', 'I,SizeOfStackCommit', 1507 | 'I,SizeOfHeapReserve', 'I,SizeOfHeapCommit', 1508 | 'I,LoaderFlags', 'I,NumberOfRvaAndSizes' )) 1509 | 
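The format tuples above follow a compact convention: the first element names the structure, and each field string pairs a struct format character with one or more comma-separated attribute names (several names after one character denote union members that share the same unpacked value, as in IMAGE_SECTION_HEADER's Misc field). A minimal sketch of how such a tuple can be expanded into a struct format string and named values; the `unpack_format` helper and sample buffer are illustrative only, not pefile's actual Structure class:

```python
import struct

# Expand a pefile-style format tuple into named attribute values.
# Union fields (several names after one format character) all receive
# the same unpacked value.
def unpack_format(format_tuple, data):
    name, fields = format_tuple
    struct_fmt = '<'              # PE structures are little-endian
    keys = []
    for field in fields:
        parts = field.split(',')
        struct_fmt += parts[0]    # e.g. 'I', 'H', '8s'
        keys.append(parts[1:])    # one or more attribute names
    size = struct.calcsize(struct_fmt)
    values = struct.unpack(struct_fmt, data[:size])
    attrs = {}
    for names, value in zip(keys, values):
        for n in names:
            attrs[n] = value
    return name, attrs

# Example: the two-field IMAGE_DATA_DIRECTORY layout defined above.
IMAGE_DATA_DIRECTORY = ('IMAGE_DATA_DIRECTORY', ('I,VirtualAddress', 'I,Size'))
name, attrs = unpack_format(IMAGE_DATA_DIRECTORY,
                            struct.pack('<II', 0x2000, 0x128))
```

The real class additionally records per-field file offsets and supports repacking; this sketch only shows the format-string expansion.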
1510 | 1511 | __IMAGE_OPTIONAL_HEADER64_format__ = ('IMAGE_OPTIONAL_HEADER64', 1512 | ('H,Magic', 'B,MajorLinkerVersion', 1513 | 'B,MinorLinkerVersion', 'I,SizeOfCode', 1514 | 'I,SizeOfInitializedData', 'I,SizeOfUninitializedData', 1515 | 'I,AddressOfEntryPoint', 'I,BaseOfCode', 1516 | 'Q,ImageBase', 'I,SectionAlignment', 'I,FileAlignment', 1517 | 'H,MajorOperatingSystemVersion', 'H,MinorOperatingSystemVersion', 1518 | 'H,MajorImageVersion', 'H,MinorImageVersion', 1519 | 'H,MajorSubsystemVersion', 'H,MinorSubsystemVersion', 1520 | 'I,Reserved1', 'I,SizeOfImage', 'I,SizeOfHeaders', 1521 | 'I,CheckSum', 'H,Subsystem', 'H,DllCharacteristics', 1522 | 'Q,SizeOfStackReserve', 'Q,SizeOfStackCommit', 1523 | 'Q,SizeOfHeapReserve', 'Q,SizeOfHeapCommit', 1524 | 'I,LoaderFlags', 'I,NumberOfRvaAndSizes' )) 1525 | 1526 | 1527 | __IMAGE_NT_HEADERS_format__ = ('IMAGE_NT_HEADERS', ('I,Signature',)) 1528 | 1529 | __IMAGE_SECTION_HEADER_format__ = ('IMAGE_SECTION_HEADER', 1530 | ('8s,Name', 'I,Misc,Misc_PhysicalAddress,Misc_VirtualSize', 1531 | 'I,VirtualAddress', 'I,SizeOfRawData', 'I,PointerToRawData', 1532 | 'I,PointerToRelocations', 'I,PointerToLinenumbers', 1533 | 'H,NumberOfRelocations', 'H,NumberOfLinenumbers', 1534 | 'I,Characteristics')) 1535 | 1536 | __IMAGE_DELAY_IMPORT_DESCRIPTOR_format__ = ('IMAGE_DELAY_IMPORT_DESCRIPTOR', 1537 | ('I,grAttrs', 'I,szName', 'I,phmod', 'I,pIAT', 'I,pINT', 1538 | 'I,pBoundIAT', 'I,pUnloadIAT', 'I,dwTimeStamp')) 1539 | 1540 | __IMAGE_IMPORT_DESCRIPTOR_format__ = ('IMAGE_IMPORT_DESCRIPTOR', 1541 | ('I,OriginalFirstThunk,Characteristics', 1542 | 'I,TimeDateStamp', 'I,ForwarderChain', 'I,Name', 'I,FirstThunk')) 1543 | 1544 | __IMAGE_EXPORT_DIRECTORY_format__ = ('IMAGE_EXPORT_DIRECTORY', 1545 | ('I,Characteristics', 1546 | 'I,TimeDateStamp', 'H,MajorVersion', 'H,MinorVersion', 'I,Name', 1547 | 'I,Base', 'I,NumberOfFunctions', 'I,NumberOfNames', 1548 | 'I,AddressOfFunctions', 'I,AddressOfNames', 'I,AddressOfNameOrdinals')) 1549 | 1550 | 
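Further below, PE.__parse__ validates these headers by checking the DOS header's e_magic ('MZ') and following e_lfanew to the 'PE\0\0' signature. A standalone sketch of those checks; `find_nt_headers` is a hypothetical helper, and the constants mirror the module's IMAGE_DOS_SIGNATURE and IMAGE_NT_SIGNATURE values:

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D   # 'MZ'
IMAGE_NT_SIGNATURE = 0x4550    # 'PE\0\0'

def find_nt_headers(data):
    # The DOS header occupies the first 64 bytes of the file.
    if len(data) < 64:
        raise ValueError('Unable to read the DOS header')
    e_magic = struct.unpack('<H', data[:2])[0]
    if e_magic != IMAGE_DOS_SIGNATURE:
        raise ValueError('DOS header magic not found')
    # e_lfanew (offset 60) points to the NT headers.
    e_lfanew = struct.unpack('<I', data[60:64])[0]
    if e_lfanew > len(data):
        raise ValueError('Invalid e_lfanew value')
    signature = struct.unpack('<I', data[e_lfanew:e_lfanew + 4])[0]
    if signature != IMAGE_NT_SIGNATURE:
        raise ValueError('Invalid NT headers signature')
    return e_lfanew

# Minimal 68-byte buffer: 'MZ', e_lfanew=64, then 'PE\0\0' at offset 64.
blob = b'MZ' + b'\x00' * 58 + struct.pack('<I', 64) + b'PE\x00\x00'
offset = find_nt_headers(blob)
```

pefile itself raises PEFormatError rather than ValueError and also recognizes ZM/NE/LE/LX signatures; the sketch keeps only the happy path plus basic sanity checks.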
__IMAGE_RESOURCE_DIRECTORY_format__ = ('IMAGE_RESOURCE_DIRECTORY', 1551 | ('I,Characteristics', 1552 | 'I,TimeDateStamp', 'H,MajorVersion', 'H,MinorVersion', 1553 | 'H,NumberOfNamedEntries', 'H,NumberOfIdEntries')) 1554 | 1555 | __IMAGE_RESOURCE_DIRECTORY_ENTRY_format__ = ('IMAGE_RESOURCE_DIRECTORY_ENTRY', 1556 | ('I,Name', 1557 | 'I,OffsetToData')) 1558 | 1559 | __IMAGE_RESOURCE_DATA_ENTRY_format__ = ('IMAGE_RESOURCE_DATA_ENTRY', 1560 | ('I,OffsetToData', 'I,Size', 'I,CodePage', 'I,Reserved')) 1561 | 1562 | __VS_VERSIONINFO_format__ = ( 'VS_VERSIONINFO', 1563 | ('H,Length', 'H,ValueLength', 'H,Type' )) 1564 | 1565 | __VS_FIXEDFILEINFO_format__ = ( 'VS_FIXEDFILEINFO', 1566 | ('I,Signature', 'I,StrucVersion', 'I,FileVersionMS', 'I,FileVersionLS', 1567 | 'I,ProductVersionMS', 'I,ProductVersionLS', 'I,FileFlagsMask', 'I,FileFlags', 1568 | 'I,FileOS', 'I,FileType', 'I,FileSubtype', 'I,FileDateMS', 'I,FileDateLS')) 1569 | 1570 | __StringFileInfo_format__ = ( 'StringFileInfo', 1571 | ('H,Length', 'H,ValueLength', 'H,Type' )) 1572 | 1573 | __StringTable_format__ = ( 'StringTable', 1574 | ('H,Length', 'H,ValueLength', 'H,Type' )) 1575 | 1576 | __String_format__ = ( 'String', 1577 | ('H,Length', 'H,ValueLength', 'H,Type' )) 1578 | 1579 | __Var_format__ = ( 'Var', ('H,Length', 'H,ValueLength', 'H,Type' )) 1580 | 1581 | __IMAGE_THUNK_DATA_format__ = ('IMAGE_THUNK_DATA', 1582 | ('I,ForwarderString,Function,Ordinal,AddressOfData',)) 1583 | 1584 | __IMAGE_THUNK_DATA64_format__ = ('IMAGE_THUNK_DATA', 1585 | ('Q,ForwarderString,Function,Ordinal,AddressOfData',)) 1586 | 1587 | __IMAGE_DEBUG_DIRECTORY_format__ = ('IMAGE_DEBUG_DIRECTORY', 1588 | ('I,Characteristics', 'I,TimeDateStamp', 'H,MajorVersion', 1589 | 'H,MinorVersion', 'I,Type', 'I,SizeOfData', 'I,AddressOfRawData', 1590 | 'I,PointerToRawData')) 1591 | 1592 | __IMAGE_BASE_RELOCATION_format__ = ('IMAGE_BASE_RELOCATION', 1593 | ('I,VirtualAddress', 'I,SizeOfBlock') ) 1594 | 1595 | __IMAGE_BASE_RELOCATION_ENTRY_format__ = 
('IMAGE_BASE_RELOCATION_ENTRY', 1596 | ('H,Data',) ) 1597 | 1598 | __IMAGE_TLS_DIRECTORY_format__ = ('IMAGE_TLS_DIRECTORY', 1599 | ('I,StartAddressOfRawData', 'I,EndAddressOfRawData', 1600 | 'I,AddressOfIndex', 'I,AddressOfCallBacks', 1601 | 'I,SizeOfZeroFill', 'I,Characteristics' ) ) 1602 | 1603 | __IMAGE_TLS_DIRECTORY64_format__ = ('IMAGE_TLS_DIRECTORY', 1604 | ('Q,StartAddressOfRawData', 'Q,EndAddressOfRawData', 1605 | 'Q,AddressOfIndex', 'Q,AddressOfCallBacks', 1606 | 'I,SizeOfZeroFill', 'I,Characteristics' ) ) 1607 | 1608 | __IMAGE_LOAD_CONFIG_DIRECTORY_format__ = ('IMAGE_LOAD_CONFIG_DIRECTORY', 1609 | ('I,Size', 'I,TimeDateStamp', 1610 | 'H,MajorVersion', 'H,MinorVersion', 1611 | 'I,GlobalFlagsClear', 'I,GlobalFlagsSet', 1612 | 'I,CriticalSectionDefaultTimeout', 1613 | 'I,DeCommitFreeBlockThreshold', 1614 | 'I,DeCommitTotalFreeThreshold', 1615 | 'I,LockPrefixTable', 1616 | 'I,MaximumAllocationSize', 1617 | 'I,VirtualMemoryThreshold', 1618 | 'I,ProcessHeapFlags', 1619 | 'I,ProcessAffinityMask', 1620 | 'H,CSDVersion', 'H,Reserved1', 1621 | 'I,EditList', 'I,SecurityCookie', 1622 | 'I,SEHandlerTable', 'I,SEHandlerCount' ) ) 1623 | 1624 | __IMAGE_LOAD_CONFIG_DIRECTORY64_format__ = ('IMAGE_LOAD_CONFIG_DIRECTORY', 1625 | ('I,Size', 'I,TimeDateStamp', 1626 | 'H,MajorVersion', 'H,MinorVersion', 1627 | 'I,GlobalFlagsClear', 'I,GlobalFlagsSet', 1628 | 'I,CriticalSectionDefaultTimeout', 1629 | 'Q,DeCommitFreeBlockThreshold', 1630 | 'Q,DeCommitTotalFreeThreshold', 1631 | 'Q,LockPrefixTable', 1632 | 'Q,MaximumAllocationSize', 1633 | 'Q,VirtualMemoryThreshold', 1634 | 'Q,ProcessAffinityMask', 1635 | 'I,ProcessHeapFlags', 1636 | 'H,CSDVersion', 'H,Reserved1', 1637 | 'Q,EditList', 'Q,SecurityCookie', 1638 | 'Q,SEHandlerTable', 'Q,SEHandlerCount' ) ) 1639 | 1640 | __IMAGE_BOUND_IMPORT_DESCRIPTOR_format__ = ('IMAGE_BOUND_IMPORT_DESCRIPTOR', 1641 | ('I,TimeDateStamp', 'H,OffsetModuleName', 'H,NumberOfModuleForwarderRefs')) 1642 | 1643 | __IMAGE_BOUND_FORWARDER_REF_format__ = 
('IMAGE_BOUND_FORWARDER_REF', 1644 | ('I,TimeDateStamp', 'H,OffsetModuleName', 'H,Reserved') ) 1645 | 1646 | 1647 | def __init__(self, name=None, data=None, fast_load=None): 1648 | 1649 | self.sections = [] 1650 | 1651 | self.__warnings = [] 1652 | 1653 | self.PE_TYPE = None 1654 | 1655 | if not name and not data: 1656 | return 1657 | 1658 | # This list will keep track of all the structures created. 1659 | # That will allow for an easy iteration through the list 1660 | # in order to save the modifications made 1661 | self.__structures__ = [] 1662 | self.__from_file = None 1663 | 1664 | if not fast_load: 1665 | fast_load = globals()['fast_load'] 1666 | try: 1667 | self.__parse__(name, data, fast_load) 1668 | except: 1669 | self.close() 1670 | raise 1671 | 1672 | 1673 | def close(self): 1674 | if ( self.__from_file is True and hasattr(self, '__data__') and 1675 | ((isinstance(mmap.mmap, type) and isinstance(self.__data__, mmap.mmap)) or 1676 | 'mmap.mmap' in repr(type(self.__data__))) ): 1677 | self.__data__.close() 1678 | 1679 | 1680 | def __unpack_data__(self, format, data, file_offset): 1681 | """Apply structure format to raw data. 1682 | 1683 | Returns and unpacked structure object if successful, None otherwise. 1684 | """ 1685 | 1686 | structure = Structure(format, file_offset=file_offset) 1687 | 1688 | try: 1689 | structure.__unpack__(data) 1690 | except PEFormatError, err: 1691 | self.__warnings.append( 1692 | 'Corrupt header "%s" at file offset %d. Exception: %s' % ( 1693 | format[0], file_offset, str(err)) ) 1694 | return None 1695 | 1696 | self.__structures__.append(structure) 1697 | 1698 | return structure 1699 | 1700 | 1701 | def __parse__(self, fname, data, fast_load): 1702 | """Parse a Portable Executable file. 1703 | 1704 | Loads a PE file, parsing all its structures and making them available 1705 | through the instance's attributes. 
1706 | """ 1707 | 1708 | if fname: 1709 | stat = os.stat(fname) 1710 | if stat.st_size == 0: 1711 | raise PEFormatError('The file is empty') 1712 | try: 1713 | fd = file(fname, 'rb') 1714 | self.fileno = fd.fileno() 1715 | self.__data__ = mmap.mmap(self.fileno, 0, access=mmap.ACCESS_READ) 1716 | self.__from_file = True 1717 | finally: 1718 | fd.close() 1719 | elif data: 1720 | self.__data__ = data 1721 | self.__from_file = False 1722 | 1723 | dos_header_data = self.__data__[:64] 1724 | if len(dos_header_data) != 64: 1725 | raise PEFormatError('Unable to read the DOS Header, possibly a truncated file.') 1726 | 1727 | self.DOS_HEADER = self.__unpack_data__( 1728 | self.__IMAGE_DOS_HEADER_format__, 1729 | dos_header_data, file_offset=0) 1730 | 1731 | if self.DOS_HEADER.e_magic == IMAGE_DOSZM_SIGNATURE: 1732 | raise PEFormatError('Probably a ZM Executable (not a PE file).') 1733 | if not self.DOS_HEADER or self.DOS_HEADER.e_magic != IMAGE_DOS_SIGNATURE: 1734 | raise PEFormatError('DOS Header magic not found.') 1735 | 1736 | # OC Patch: 1737 | # Check for sane value in e_lfanew 1738 | # 1739 | if self.DOS_HEADER.e_lfanew > len(self.__data__): 1740 | raise PEFormatError('Invalid e_lfanew value, probably not a PE file') 1741 | 1742 | nt_headers_offset = self.DOS_HEADER.e_lfanew 1743 | 1744 | self.NT_HEADERS = self.__unpack_data__( 1745 | self.__IMAGE_NT_HEADERS_format__, 1746 | self.__data__[nt_headers_offset:nt_headers_offset+8], 1747 | file_offset = nt_headers_offset) 1748 | 1749 | # We better check the signature right here, before the file screws 1750 | # around with sections: 1751 | # OC Patch: 1752 | # Some malware will cause the Signature value to not exist at all 1753 | if not self.NT_HEADERS or not self.NT_HEADERS.Signature: 1754 | raise PEFormatError('NT Headers not found.') 1755 | 1756 | if (0xFFFF & self.NT_HEADERS.Signature) == IMAGE_NE_SIGNATURE: 1757 | raise PEFormatError('Invalid NT Headers signature. 
Probably a NE file') 1758 | if (0xFFFF & self.NT_HEADERS.Signature) == IMAGE_LE_SIGNATURE: 1759 | raise PEFormatError('Invalid NT Headers signature. Probably a LE file') 1760 | if (0xFFFF & self.NT_HEADERS.Signature) == IMAGE_LX_SIGNATURE: 1761 | raise PEFormatError('Invalid NT Headers signature. Probably a LX file') 1762 | if self.NT_HEADERS.Signature != IMAGE_NT_SIGNATURE: 1763 | raise PEFormatError('Invalid NT Headers signature.') 1764 | 1765 | self.FILE_HEADER = self.__unpack_data__( 1766 | self.__IMAGE_FILE_HEADER_format__, 1767 | self.__data__[nt_headers_offset+4:nt_headers_offset+4+32], 1768 | file_offset = nt_headers_offset+4) 1769 | image_flags = retrieve_flags(IMAGE_CHARACTERISTICS, 'IMAGE_FILE_') 1770 | 1771 | if not self.FILE_HEADER: 1772 | raise PEFormatError('File Header missing') 1773 | 1774 | # Set the image's flags according to the Characteristics member 1775 | set_flags(self.FILE_HEADER, self.FILE_HEADER.Characteristics, image_flags) 1776 | 1777 | optional_header_offset = \ 1778 | nt_headers_offset+4+self.FILE_HEADER.sizeof() 1779 | 1780 | # Note: location of sections can be controlled from PE header: 1781 | sections_offset = optional_header_offset + self.FILE_HEADER.SizeOfOptionalHeader 1782 | 1783 | self.OPTIONAL_HEADER = self.__unpack_data__( 1784 | self.__IMAGE_OPTIONAL_HEADER_format__, 1785 | self.__data__[optional_header_offset:], 1786 | file_offset = optional_header_offset) 1787 | 1788 | # According to solardesigner's findings for his 1789 | # Tiny PE project, the optional header does not 1790 | # need fields beyond "Subsystem" in order to be 1791 | # loadable by the Windows loader (given that zeroes 1792 | # are acceptable values and the header is loaded 1793 | # in a zeroed memory page) 1794 | # If trying to parse a full Optional Header fails 1795 | # we try to parse it again with some 0 padding 1796 | # 1797 | MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE = 69 1798 | 1799 | if ( self.OPTIONAL_HEADER is None and 1800 | 
len(self.__data__[optional_header_offset:optional_header_offset+0x200]) 1801 | >= MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE ): 1802 | 1803 | # Add enough zeroes to make up for the unused fields 1804 | # 1805 | padding_length = 128 1806 | 1807 | # Create padding 1808 | # 1809 | padded_data = self.__data__[optional_header_offset:optional_header_offset+0x200] + ( 1810 | '\0' * padding_length) 1811 | 1812 | self.OPTIONAL_HEADER = self.__unpack_data__( 1813 | self.__IMAGE_OPTIONAL_HEADER_format__, 1814 | padded_data, 1815 | file_offset = optional_header_offset) 1816 | 1817 | 1818 | # Check the Magic in the OPTIONAL_HEADER and set the PE file 1819 | # type accordingly 1820 | # 1821 | if self.OPTIONAL_HEADER is not None: 1822 | 1823 | if self.OPTIONAL_HEADER.Magic == OPTIONAL_HEADER_MAGIC_PE: 1824 | 1825 | self.PE_TYPE = OPTIONAL_HEADER_MAGIC_PE 1826 | 1827 | elif self.OPTIONAL_HEADER.Magic == OPTIONAL_HEADER_MAGIC_PE_PLUS: 1828 | 1829 | self.PE_TYPE = OPTIONAL_HEADER_MAGIC_PE_PLUS 1830 | 1831 | self.OPTIONAL_HEADER = self.__unpack_data__( 1832 | self.__IMAGE_OPTIONAL_HEADER64_format__, 1833 | self.__data__[optional_header_offset:optional_header_offset+0x200], 1834 | file_offset = optional_header_offset) 1835 | 1836 | # Again, as explained above, we try to parse 1837 | # a reduced form of the Optional Header which 1838 | # is still valid despite not including all 1839 | # structure members 1840 | # 1841 | MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE = 69+4 1842 | 1843 | if ( self.OPTIONAL_HEADER is None and 1844 | len(self.__data__[optional_header_offset:optional_header_offset+0x200]) 1845 | >= MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE ): 1846 | 1847 | padding_length = 128 1848 | padded_data = self.__data__[optional_header_offset:optional_header_offset+0x200] + ( 1849 | '\0' * padding_length) 1850 | self.OPTIONAL_HEADER = self.__unpack_data__( 1851 | self.__IMAGE_OPTIONAL_HEADER64_format__, 1852 | padded_data, 1853 | file_offset = optional_header_offset) 1854 | 1855 | 1856 | if not 
self.FILE_HEADER: 1857 | raise PEFormatError('File Header missing') 1858 | 1859 | 1860 | # OC Patch: 1861 | # Die gracefully if there is no OPTIONAL_HEADER field 1862 | # 975440f5ad5e2e4a92c4d9a5f22f75c1 1863 | if self.PE_TYPE is None or self.OPTIONAL_HEADER is None: 1864 | raise PEFormatError("No Optional Header found, invalid PE32 or PE32+ file") 1865 | 1866 | dll_characteristics_flags = retrieve_flags(DLL_CHARACTERISTICS, 'IMAGE_DLL_CHARACTERISTICS_') 1867 | 1868 | # Set the Dll Characteristics flags according to the DllCharacteristics member 1869 | set_flags( 1870 | self.OPTIONAL_HEADER, 1871 | self.OPTIONAL_HEADER.DllCharacteristics, 1872 | dll_characteristics_flags) 1873 | 1874 | 1875 | self.OPTIONAL_HEADER.DATA_DIRECTORY = [] 1876 | #offset = (optional_header_offset + self.FILE_HEADER.SizeOfOptionalHeader) 1877 | offset = (optional_header_offset + self.OPTIONAL_HEADER.sizeof()) 1878 | 1879 | 1880 | self.NT_HEADERS.FILE_HEADER = self.FILE_HEADER 1881 | self.NT_HEADERS.OPTIONAL_HEADER = self.OPTIONAL_HEADER 1882 | 1883 | 1884 | # The NumberOfRvaAndSizes is sanitized to stay within 1885 | # reasonable limits so it can be cast to an int 1886 | # 1887 | if self.OPTIONAL_HEADER.NumberOfRvaAndSizes > 0x10: 1888 | self.__warnings.append( 1889 | 'Suspicious NumberOfRvaAndSizes in the Optional Header. 
' + 1890 | 'Normal values are never larger than 0x10, the value is: 0x%x' % 1891 | self.OPTIONAL_HEADER.NumberOfRvaAndSizes ) 1892 | 1893 | MAX_ASSUMED_VALID_NUMBER_OF_RVA_AND_SIZES = 0x100 1894 | for i in xrange(int(0x7fffffffL & self.OPTIONAL_HEADER.NumberOfRvaAndSizes)): 1895 | 1896 | if len(self.__data__) - offset == 0: 1897 | break 1898 | 1899 | if len(self.__data__) - offset < 8: 1900 | data = self.__data__[offset:] + '\0'*8 1901 | else: 1902 | data = self.__data__[offset:offset+MAX_ASSUMED_VALID_NUMBER_OF_RVA_AND_SIZES] 1903 | 1904 | dir_entry = self.__unpack_data__( 1905 | self.__IMAGE_DATA_DIRECTORY_format__, 1906 | data, 1907 | file_offset = offset) 1908 | 1909 | if dir_entry is None: 1910 | break 1911 | 1912 | # Would fail if missing an entry 1913 | # 1d4937b2fa4d84ad1bce0309857e70ca offending sample 1914 | try: 1915 | dir_entry.name = DIRECTORY_ENTRY[i] 1916 | except (KeyError, AttributeError): 1917 | break 1918 | 1919 | offset += dir_entry.sizeof() 1920 | 1921 | self.OPTIONAL_HEADER.DATA_DIRECTORY.append(dir_entry) 1922 | 1923 | # If the offset goes outside the optional header, 1924 | # the loop is broken, regardless of how many directories 1925 | # NumberOfRvaAndSizes says there are 1926 | # 1927 | # We assume a normally sized optional header, hence that we do 1928 | # a sizeof() instead of reading SizeOfOptionalHeader. 1929 | # Then we add a default number of directories times their size, 1930 | # if we go beyond that, we assume the number of directories 1931 | # is wrong and stop processing 1932 | if offset >= (optional_header_offset + 1933 | self.OPTIONAL_HEADER.sizeof() + 8*16) : 1934 | 1935 | break 1936 | 1937 | 1938 | offset = self.parse_sections(sections_offset) 1939 | 1940 | # OC Patch: 1941 | # There could be a problem if there are no raw data sections 1942 | # greater than 0 1943 | # fc91013eb72529da005110a3403541b6 example 1944 | # Should this throw an exception in the minimum header offset 1945 | # can't be found? 
1946 | # 1947 | rawDataPointers = [ 1948 | self.adjust_FileAlignment( s.PointerToRawData, 1949 | self.OPTIONAL_HEADER.FileAlignment ) 1950 | for s in self.sections if s.PointerToRawData>0 ] 1951 | 1952 | if len(rawDataPointers) > 0: 1953 | lowest_section_offset = min(rawDataPointers) 1954 | else: 1955 | lowest_section_offset = None 1956 | 1957 | if not lowest_section_offset or lowest_section_offset < offset: 1958 | self.header = self.__data__[:offset] 1959 | else: 1960 | self.header = self.__data__[:lowest_section_offset] 1961 | 1962 | 1963 | # Check whether the entry point lies within a section 1964 | # 1965 | if self.get_section_by_rva(self.OPTIONAL_HEADER.AddressOfEntryPoint) is not None: 1966 | 1967 | # Check whether the entry point lies within the file 1968 | # 1969 | ep_offset = self.get_offset_from_rva(self.OPTIONAL_HEADER.AddressOfEntryPoint) 1970 | if ep_offset > len(self.__data__): 1971 | 1972 | self.__warnings.append( 1973 | 'Possibly corrupt file. AddressOfEntryPoint lies outside the file. ' + 1974 | 'AddressOfEntryPoint: 0x%x' % 1975 | self.OPTIONAL_HEADER.AddressOfEntryPoint ) 1976 | 1977 | else: 1978 | 1979 | self.__warnings.append( 1980 | 'AddressOfEntryPoint lies outside the sections\' boundaries. 
' + 1981 | 'AddressOfEntryPoint: 0x%x' % 1982 | self.OPTIONAL_HEADER.AddressOfEntryPoint ) 1983 | 1984 | 1985 | if not fast_load: 1986 | self.parse_data_directories() 1987 | 1988 | class RichHeader: 1989 | pass 1990 | rich_header = self.parse_rich_header() 1991 | if rich_header: 1992 | self.RICH_HEADER = RichHeader() 1993 | self.RICH_HEADER.checksum = rich_header.get('checksum', None) 1994 | self.RICH_HEADER.values = rich_header.get('values', None) 1995 | else: 1996 | self.RICH_HEADER = None 1997 | 1998 | 1999 | def parse_rich_header(self): 2000 | """Parses the rich header 2001 | see http://www.ntcore.com/files/richsign.htm for more information 2002 | 2003 | Structure: 2004 | 00 DanS ^ checksum, checksum, checksum, checksum 2005 | 10 Symbol RVA ^ checksum, Symbol size ^ checksum... 2006 | ... 2007 | XX Rich, checksum, 0, 0,... 2008 | """ 2009 | 2010 | # Rich Header constants 2011 | # 2012 | DANS = 0x536E6144 # 'DanS' as dword 2013 | RICH = 0x68636952 # 'Rich' as dword 2014 | 2015 | # Read a block of data 2016 | # 2017 | try: 2018 | data = list(struct.unpack("<32I", self.get_data(0x80, 0x80))) 2019 | except: 2020 | # In the cases where there's not enough data to contain the Rich header 2021 | # we abort its parsing 2022 | return None 2023 | 2024 | # the checksum should be present 3 times after the DanS signature 2025 | # 2026 | checksum = data[1] 2027 | if (data[0] ^ checksum != DANS 2028 | or data[2] != checksum 2029 | or data[3] != checksum): 2030 | return None 2031 | 2032 | result = {"checksum": checksum} 2033 | headervalues = [] 2034 | result ["values"] = headervalues 2035 | 2036 | data = data[4:] 2037 | for i in xrange(len(data) / 2): 2038 | 2039 | # Stop until the Rich footer signature is found 2040 | # 2041 | if data[2 * i] == RICH: 2042 | 2043 | # it should be followed by the checksum 2044 | # 2045 | if data[2 * i + 1] != checksum: 2046 | self.__warnings.append('Rich Header corrupted') 2047 | break 2048 | 2049 | # header values come by pairs 2050 | # 2051 | 
headervalues += [data[2 * i] ^ checksum, data[2 * i + 1] ^ checksum] 2052 | return result 2053 | 2054 | 2055 | def get_warnings(self): 2056 | """Return the list of warnings. 2057 | 2058 | Non-critical problems found when parsing the PE file are 2059 | appended to a list of warnings. This method returns the 2060 | full list. 2061 | """ 2062 | 2063 | return self.__warnings 2064 | 2065 | 2066 | def show_warnings(self): 2067 | """Print the list of warnings. 2068 | 2069 | Non-critical problems found when parsing the PE file are 2070 | appended to a list of warnings. This method prints the 2071 | full list to standard output. 2072 | """ 2073 | 2074 | for warning in self.__warnings: 2075 | print '>', warning 2076 | 2077 | 2078 | def full_load(self): 2079 | """Process the data directories. 2080 | 2081 | This method will load the data directories which might not have 2082 | been loaded if the "fast_load" option was used. 2083 | """ 2084 | 2085 | self.parse_data_directories() 2086 | 2087 | 2088 | def write(self, filename=None): 2089 | """Write the PE file. 2090 | 2091 | This function will process all headers and components 2092 | of the PE file and include all changes made (by just 2093 | assigning to attributes in the PE objects) and write 2094 | the changes back to a file whose name is provided as 2095 | an argument. The filename is optional, if not 2096 | provided the data will be returned as a 'str' object. 
2097 | """ 2098 | 2099 | file_data = list(self.__data__) 2100 | for structure in self.__structures__: 2101 | 2102 | struct_data = list(structure.__pack__()) 2103 | offset = structure.get_file_offset() 2104 | 2105 | file_data[offset:offset+len(struct_data)] = struct_data 2106 | 2107 | if hasattr(self, 'VS_VERSIONINFO'): 2108 | if hasattr(self, 'FileInfo'): 2109 | for entry in self.FileInfo: 2110 | if hasattr(entry, 'StringTable'): 2111 | for st_entry in entry.StringTable: 2112 | for key, entry in st_entry.entries.items(): 2113 | 2114 | offsets = st_entry.entries_offsets[key] 2115 | lengths = st_entry.entries_lengths[key] 2116 | 2117 | if len( entry ) > lengths[1]: 2118 | 2119 | l = list() 2120 | for idx, c in enumerate(entry): 2121 | if ord(c) > 256: 2122 | l.extend( [ chr(ord(c) & 0xff), chr( (ord(c) & 0xff00) >>8) ] ) 2123 | else: 2124 | l.extend( [chr( ord(c) ), '\0'] ) 2125 | 2126 | file_data[ 2127 | offsets[1] : offsets[1] + lengths[1]*2 ] = l 2128 | 2129 | else: 2130 | 2131 | l = list() 2132 | for idx, c in enumerate(entry): 2133 | if ord(c) > 256: 2134 | l.extend( [ chr(ord(c) & 0xff), chr( (ord(c) & 0xff00) >>8) ] ) 2135 | else: 2136 | l.extend( [chr( ord(c) ), '\0'] ) 2137 | 2138 | file_data[ 2139 | offsets[1] : offsets[1] + len(entry)*2 ] = l 2140 | 2141 | remainder = lengths[1] - len(entry) 2142 | file_data[ 2143 | offsets[1] + len(entry)*2 : 2144 | offsets[1] + lengths[1]*2 ] = [ 2145 | u'\0' ] * remainder*2 2146 | 2147 | new_file_data = ''.join( [ chr(ord(c)) for c in file_data] ) 2148 | 2149 | if filename: 2150 | f = file(filename, 'wb+') 2151 | f.write(new_file_data) 2152 | f.close() 2153 | else: 2154 | return new_file_data 2155 | 2156 | 2157 | def parse_sections(self, offset): 2158 | """Fetch the PE file sections. 2159 | 2160 | The sections will be readily available in the "sections" attribute. 2161 | Its attributes will contain all the section information plus "data" 2162 | a buffer containing the section's data. 
2163 | 2164 | The "Characteristics" member will be processed and attributes 2165 | representing the section characteristics (with the 'IMAGE_SCN_' 2166 | string trimmed from the constant's names) will be added to the 2167 | section instance. 2168 | 2169 | Refer to the SectionStructure class for additional info. 2170 | """ 2171 | 2172 | self.sections = [] 2173 | 2174 | for i in xrange(self.FILE_HEADER.NumberOfSections): 2175 | section = SectionStructure( self.__IMAGE_SECTION_HEADER_format__, pe=self ) 2176 | if not section: 2177 | break 2178 | section_offset = offset + section.sizeof() * i 2179 | section.set_file_offset(section_offset) 2180 | section.__unpack__(self.__data__[section_offset : section_offset + section.sizeof()]) 2181 | self.__structures__.append(section) 2182 | 2183 | if section.SizeOfRawData > len(self.__data__): 2184 | self.__warnings.append( 2185 | ('Error parsing section %d. ' % i) + 2186 | 'SizeOfRawData is larger than file.') 2187 | 2188 | if self.adjust_FileAlignment( section.PointerToRawData, 2189 | self.OPTIONAL_HEADER.FileAlignment ) > len(self.__data__): 2190 | 2191 | self.__warnings.append( 2192 | ('Error parsing section %d. ' % i) + 2193 | 'PointerToRawData points beyond the end of the file.') 2194 | 2195 | if section.Misc_VirtualSize > 0x10000000: 2196 | self.__warnings.append( 2197 | ('Suspicious value found parsing section %d. ' % i) + 2198 | 'VirtualSize is extremely large > 256MiB.') 2199 | 2200 | if self.adjust_SectionAlignment( section.VirtualAddress, 2201 | self.OPTIONAL_HEADER.SectionAlignment, self.OPTIONAL_HEADER.FileAlignment ) > 0x10000000: 2202 | self.__warnings.append( 2203 | ('Suspicious value found parsing section %d. ' % i) + 2204 | 'VirtualAddress is beyond 0x10000000.') 2205 | 2206 | # 2207 | # Some packers use a non-aligned PointerToRawData in the sections, 2208 | # which causes several common tools not to load the section data 2209 | # properly as they blindly read from the indicated offset. 
# It seems that Windows will round the offset down to the largest 2211 | # offset multiple of FileAlignment which is smaller than 2212 | # PointerToRawData. The following code will do the same. 2213 | # 2214 | 2215 | #alignment = self.OPTIONAL_HEADER.FileAlignment 2216 | #self.update_section_data(section) 2217 | 2218 | if ( self.OPTIONAL_HEADER.FileAlignment != 0 and 2219 | ( section.PointerToRawData % self.OPTIONAL_HEADER.FileAlignment) != 0): 2220 | self.__warnings.append( 2221 | ('Error parsing section %d. ' % i) + 2222 | 'PointerToRawData should normally be ' + 2223 | 'a multiple of FileAlignment, this might imply the file ' + 2224 | 'is trying to confuse tools which parse this incorrectly') 2225 | 2226 | 2227 | section_flags = retrieve_flags(SECTION_CHARACTERISTICS, 'IMAGE_SCN_') 2228 | 2229 | # Set the section's flags according to the Characteristics member 2230 | set_flags(section, section.Characteristics, section_flags) 2231 | 2232 | if ( section.__dict__.get('IMAGE_SCN_MEM_WRITE', False) and 2233 | section.__dict__.get('IMAGE_SCN_MEM_EXECUTE', False) ): 2234 | 2235 | if section.Name == 'PAGE' and self.is_driver(): 2236 | # Drivers can have a PAGE section with those flags set without 2237 | # implying that it is malicious 2238 | pass 2239 | else: 2240 | self.__warnings.append( 2241 | ('Suspicious flags set for section %d. ' % i) + 2242 | 'Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. ' + 2243 | 'This might indicate a packed executable.') 2244 | 2245 | self.sections.append(section) 2246 | 2247 | if self.FILE_HEADER.NumberOfSections > 0 and self.sections: 2248 | return offset + self.sections[0].sizeof()*self.FILE_HEADER.NumberOfSections 2249 | else: 2250 | return offset 2251 | 2252 | 2253 | 2254 | def parse_data_directories(self, directories=None): 2255 | """Parse and process the PE file's data directories. 2256 | 2257 | If the optional argument 'directories' is given, only 2258 | the directories at the specified indices will be parsed. 
2259 | Such functionality allows parsing of areas of interest 2260 | without the burden of having to parse all others. 2261 | The directories can then be specified as: 2262 | 2263 | For export/import only: 2264 | 2265 | directories = [ 0, 1 ] 2266 | 2267 | or (more verbosely): 2268 | 2269 | directories = [ DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_IMPORT'], 2270 | DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_EXPORT'] ] 2271 | 2272 | If 'directories' is a list, the ones that are processed will be removed, 2273 | leaving only the ones that are not present in the image. 2274 | """ 2275 | 2276 | directory_parsing = ( 2277 | ('IMAGE_DIRECTORY_ENTRY_IMPORT', self.parse_import_directory), 2278 | ('IMAGE_DIRECTORY_ENTRY_EXPORT', self.parse_export_directory), 2279 | ('IMAGE_DIRECTORY_ENTRY_RESOURCE', self.parse_resources_directory), 2280 | ('IMAGE_DIRECTORY_ENTRY_DEBUG', self.parse_debug_directory), 2281 | ('IMAGE_DIRECTORY_ENTRY_BASERELOC', self.parse_relocations_directory), 2282 | ('IMAGE_DIRECTORY_ENTRY_TLS', self.parse_directory_tls), 2283 | ('IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG', self.parse_directory_load_config), 2284 | ('IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT', self.parse_delay_import_directory), 2285 | ('IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT', self.parse_directory_bound_imports) ) 2286 | 2287 | if directories is not None: 2288 | if not isinstance(directories, (tuple, list)): 2289 | directories = [directories] 2290 | 2291 | for entry in directory_parsing: 2292 | # OC Patch: 2293 | # 2294 | try: 2295 | directory_index = DIRECTORY_ENTRY[entry[0]] 2296 | dir_entry = self.OPTIONAL_HEADER.DATA_DIRECTORY[directory_index] 2297 | except IndexError: 2298 | break 2299 | 2300 | # Only process all the directories if no individual ones have 2301 | # been chosen 2302 | # 2303 | if directories is None or directory_index in directories: 2304 | 2305 | if dir_entry.VirtualAddress: 2306 | value = entry[1](dir_entry.VirtualAddress, dir_entry.Size) 2307 | if value: 2308 | setattr(self, entry[0][6:], 
value) 2309 | 2310 | if (directories is not None) and isinstance(directories, list) and (entry[0] in directories): 2311 | directories.remove(directory_index) 2312 | 2313 | 2314 | 2315 | def parse_directory_bound_imports(self, rva, size): 2316 | """""" 2317 | 2318 | bnd_descr = Structure(self.__IMAGE_BOUND_IMPORT_DESCRIPTOR_format__) 2319 | bnd_descr_size = bnd_descr.sizeof() 2320 | start = rva 2321 | 2322 | bound_imports = [] 2323 | while True: 2324 | 2325 | bnd_descr = self.__unpack_data__( 2326 | self.__IMAGE_BOUND_IMPORT_DESCRIPTOR_format__, 2327 | self.__data__[rva:rva+bnd_descr_size], 2328 | file_offset = rva) 2329 | if bnd_descr is None: 2330 | # If can't parse directory then silently return. 2331 | # This directory does not necessarily have to be valid to 2332 | # still have a valid PE file 2333 | 2334 | self.__warnings.append( 2335 | 'The Bound Imports directory exists but can\'t be parsed.') 2336 | 2337 | return 2338 | 2339 | if bnd_descr.all_zeroes(): 2340 | break 2341 | 2342 | rva += bnd_descr.sizeof() 2343 | 2344 | forwarder_refs = [] 2345 | for idx in xrange(bnd_descr.NumberOfModuleForwarderRefs): 2346 | # Both structures IMAGE_BOUND_IMPORT_DESCRIPTOR and 2347 | # IMAGE_BOUND_FORWARDER_REF have the same size. 
2348 | bnd_frwd_ref = self.__unpack_data__( 2349 | self.__IMAGE_BOUND_FORWARDER_REF_format__, 2350 | self.__data__[rva:rva+bnd_descr_size], 2351 | file_offset = rva) 2352 | # OC Patch: 2353 | if not bnd_frwd_ref: 2354 | raise PEFormatError( 2355 | "IMAGE_BOUND_FORWARDER_REF cannot be read") 2356 | rva += bnd_frwd_ref.sizeof() 2357 | 2358 | offset = start+bnd_frwd_ref.OffsetModuleName 2359 | name_str = self.get_string_from_data( 2360 | 0, self.__data__[offset : offset + MAX_STRING_LENGTH]) 2361 | 2362 | if not name_str: 2363 | break 2364 | forwarder_refs.append(BoundImportRefData( 2365 | struct = bnd_frwd_ref, 2366 | name = name_str)) 2367 | 2368 | offset = start+bnd_descr.OffsetModuleName 2369 | name_str = self.get_string_from_data( 2370 | 0, self.__data__[offset : offset + MAX_STRING_LENGTH]) 2371 | 2372 | if not name_str: 2373 | break 2374 | bound_imports.append( 2375 | BoundImportDescData( 2376 | struct = bnd_descr, 2377 | name = name_str, 2378 | entries = forwarder_refs)) 2379 | 2380 | return bound_imports 2381 | 2382 | 2383 | def parse_directory_tls(self, rva, size): 2384 | """""" 2385 | 2386 | if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE: 2387 | format = self.__IMAGE_TLS_DIRECTORY_format__ 2388 | 2389 | elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS: 2390 | format = self.__IMAGE_TLS_DIRECTORY64_format__ 2391 | 2392 | try: 2393 | tls_struct = self.__unpack_data__( 2394 | format, 2395 | self.get_data( rva, Structure(format).sizeof() ), 2396 | file_offset = self.get_offset_from_rva(rva)) 2397 | except PEFormatError: 2398 | self.__warnings.append( 2399 | 'Invalid TLS information. 
Can\'t read ' + 2400 | 'data at RVA: 0x%x' % rva) 2401 | tls_struct = None 2402 | 2403 | if not tls_struct: 2404 | return None 2405 | 2406 | return TlsData( struct = tls_struct ) 2407 | 2408 | 2409 | def parse_directory_load_config(self, rva, size): 2410 | """""" 2411 | 2412 | if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE: 2413 | format = self.__IMAGE_LOAD_CONFIG_DIRECTORY_format__ 2414 | 2415 | elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS: 2416 | format = self.__IMAGE_LOAD_CONFIG_DIRECTORY64_format__ 2417 | 2418 | try: 2419 | load_config_struct = self.__unpack_data__( 2420 | format, 2421 | self.get_data( rva, Structure(format).sizeof() ), 2422 | file_offset = self.get_offset_from_rva(rva)) 2423 | except PEFormatError: 2424 | self.__warnings.append( 2425 | 'Invalid LOAD_CONFIG information. Can\'t read ' + 2426 | 'data at RVA: 0x%x' % rva) 2427 | load_config_struct = None 2428 | 2429 | if not load_config_struct: 2430 | return None 2431 | 2432 | return LoadConfigData( struct = load_config_struct ) 2433 | 2434 | 2435 | def parse_relocations_directory(self, rva, size): 2436 | """""" 2437 | 2438 | rlc_size = Structure(self.__IMAGE_BASE_RELOCATION_format__).sizeof() 2439 | end = rva+size 2440 | 2441 | relocations = [] 2442 | while rva < end: 2443 | 2444 | # OC Patch: 2445 | # Malware that has bad RVA entries will cause an error. 2446 | # Just continue on after an exception 2447 | # 2448 | try: 2449 | rlc = self.__unpack_data__( 2450 | self.__IMAGE_BASE_RELOCATION_format__, 2451 | self.get_data(rva, rlc_size), 2452 | file_offset = self.get_offset_from_rva(rva) ) 2453 | except PEFormatError: 2454 | self.__warnings.append( 2455 | 'Invalid relocation information. Can\'t read ' + 2456 | 'data at RVA: 0x%x' % rva) 2457 | rlc = None 2458 | 2459 | if not rlc: 2460 | break 2461 | 2462 | # rlc.VirtualAddress must lie within the Image 2463 | if rlc.VirtualAddress > self.OPTIONAL_HEADER.SizeOfImage: 2464 | self.__warnings.append( 2465 | 'Invalid relocation information. 
VirtualAddress outside' + 2466 | ' of Image: 0x%x' % rlc.VirtualAddress) 2467 | break 2468 | 2469 | # rlc.SizeOfBlock must be less or equal than the size of the image 2470 | # (It's a rather loose sanity test) 2471 | if rlc.SizeOfBlock > self.OPTIONAL_HEADER.SizeOfImage: 2472 | self.__warnings.append( 2473 | 'Invalid relocation information. SizeOfBlock too large' + 2474 | ': %d' % rlc.SizeOfBlock) 2475 | break 2476 | 2477 | reloc_entries = self.parse_relocations( 2478 | rva+rlc_size, rlc.VirtualAddress, rlc.SizeOfBlock-rlc_size ) 2479 | 2480 | relocations.append( 2481 | BaseRelocationData( 2482 | struct = rlc, 2483 | entries = reloc_entries)) 2484 | 2485 | if not rlc.SizeOfBlock: 2486 | break 2487 | rva += rlc.SizeOfBlock 2488 | 2489 | return relocations 2490 | 2491 | 2492 | def parse_relocations(self, data_rva, rva, size): 2493 | """""" 2494 | 2495 | data = self.get_data(data_rva, size) 2496 | file_offset = self.get_offset_from_rva(data_rva) 2497 | 2498 | entries = [] 2499 | for idx in xrange( len(data) / 2 ): 2500 | 2501 | entry = self.__unpack_data__( 2502 | self.__IMAGE_BASE_RELOCATION_ENTRY_format__, 2503 | data[idx*2:(idx+1)*2], 2504 | file_offset = file_offset ) 2505 | 2506 | if not entry: 2507 | break 2508 | word = entry.Data 2509 | 2510 | reloc_type = (word>>12) 2511 | reloc_offset = (word & 0x0fff) 2512 | entries.append( 2513 | RelocationData( 2514 | struct = entry, 2515 | type = reloc_type, 2516 | base_rva = rva, 2517 | rva = reloc_offset+rva)) 2518 | file_offset += entry.sizeof() 2519 | 2520 | return entries 2521 | 2522 | 2523 | def parse_debug_directory(self, rva, size): 2524 | """""" 2525 | 2526 | dbg_size = Structure(self.__IMAGE_DEBUG_DIRECTORY_format__).sizeof() 2527 | 2528 | debug = [] 2529 | for idx in xrange(size/dbg_size): 2530 | try: 2531 | data = self.get_data(rva+dbg_size*idx, dbg_size) 2532 | except PEFormatError, e: 2533 | self.__warnings.append( 2534 | 'Invalid debug information. 
Can\'t read ' + 2535 | 'data at RVA: 0x%x' % rva) 2536 | return None 2537 | 2538 | dbg = self.__unpack_data__( 2539 | self.__IMAGE_DEBUG_DIRECTORY_format__, 2540 | data, file_offset = self.get_offset_from_rva(rva+dbg_size*idx)) 2541 | 2542 | if not dbg: 2543 | return None 2544 | 2545 | debug.append( 2546 | DebugData( 2547 | struct = dbg)) 2548 | 2549 | return debug 2550 | 2551 | 2552 | def parse_resources_directory(self, rva, size=0, base_rva = None, level = 0, dirs=None): 2553 | """Parse the resources directory. 2554 | 2555 | Given the RVA of the resources directory, it will process all 2556 | its entries. 2557 | 2558 | The root will have the corresponding member of its structure, 2559 | IMAGE_RESOURCE_DIRECTORY plus 'entries', a list of all the 2560 | entries in the directory. 2561 | 2562 | Those entries will have, correspondingly, all the structure's 2563 | members (IMAGE_RESOURCE_DIRECTORY_ENTRY) and an additional one, 2564 | "directory", pointing to the IMAGE_RESOURCE_DIRECTORY structure 2565 | representing upper layers of the tree. This one will also have 2566 | an 'entries' attribute, pointing to the 3rd, and last, level. 2567 | Another directory with more entries. Those last entries will 2568 | have a new attribute (both 'leaf' or 'data_entry' can be used to 2569 | access it). This structure finally points to the resource data. 2570 | All the members of this structure, IMAGE_RESOURCE_DATA_ENTRY, 2571 | are available as its attributes. 2572 | """ 2573 | 2574 | # OC Patch: 2575 | if dirs is None: 2576 | dirs = [rva] 2577 | 2578 | if base_rva is None: 2579 | base_rva = rva 2580 | 2581 | resources_section = self.get_section_by_rva(rva) 2582 | 2583 | try: 2584 | # If the RVA is invalid all would blow up. Some EXEs seem to be 2585 | # specially nasty and have an invalid RVA. 
data = self.get_data(rva, Structure(self.__IMAGE_RESOURCE_DIRECTORY_format__).sizeof() ) 2587 | except PEFormatError, e: 2588 | self.__warnings.append( 2589 | 'Invalid resources directory. Can\'t read ' + 2590 | 'directory data at RVA: 0x%x' % rva) 2591 | return None 2592 | 2593 | # Get the resource directory structure, that is, the header 2594 | # of the table preceding the actual entries 2595 | # 2596 | resource_dir = self.__unpack_data__( 2597 | self.__IMAGE_RESOURCE_DIRECTORY_format__, data, 2598 | file_offset = self.get_offset_from_rva(rva) ) 2599 | if resource_dir is None: 2600 | # If the resources directory can't be parsed, silently return. 2601 | # This directory does not necessarily have to be valid to 2602 | # still have a valid PE file 2603 | self.__warnings.append( 2604 | 'Invalid resources directory. Can\'t parse ' + 2605 | 'directory data at RVA: 0x%x' % rva) 2606 | return None 2607 | 2608 | dir_entries = [] 2609 | 2610 | # Advance the RVA to the position immediately following the directory 2611 | # table header and pointing to the first entry in the table 2612 | # 2613 | rva += resource_dir.sizeof() 2614 | 2615 | number_of_entries = ( 2616 | resource_dir.NumberOfNamedEntries + 2617 | resource_dir.NumberOfIdEntries ) 2618 | 2619 | # Set a hard limit on the maximum reasonable number of entries 2620 | MAX_ALLOWED_ENTRIES = 4096 2621 | if number_of_entries > MAX_ALLOWED_ENTRIES: 2622 | self.__warnings.append( 2623 | 'Error parsing the resources directory, ' 2624 | 'The directory contains %d entries (>%s)' % 2625 | (number_of_entries, MAX_ALLOWED_ENTRIES) ) 2626 | return None 2627 | 2628 | strings_to_postprocess = list() 2629 | 2630 | for idx in xrange(number_of_entries): 2631 | 2632 | 2633 | res = self.parse_resource_entry(rva) 2634 | if res is None: 2635 | self.__warnings.append( 2636 | 'Error parsing the resources directory, ' 2637 | 'Entry %d is invalid, RVA = 0x%x. 
' % 2638 | (idx, rva) ) 2639 | break 2640 | 2641 | entry_name = None 2642 | entry_id = None 2643 | 2644 | # If all named entries have been processed, only Id ones 2645 | # remain 2646 | 2647 | if idx >= resource_dir.NumberOfNamedEntries: 2648 | entry_id = res.Name 2649 | else: 2650 | ustr_offset = base_rva+res.NameOffset 2651 | try: 2652 | #entry_name = self.get_string_u_at_rva(ustr_offset, max_length=16) 2653 | entry_name = UnicodeStringWrapperPostProcessor(self, ustr_offset) 2654 | strings_to_postprocess.append(entry_name) 2655 | 2656 | except PEFormatError, excp: 2657 | self.__warnings.append( 2658 | 'Error parsing the resources directory, ' 2659 | 'attempting to read entry name. ' 2660 | 'Can\'t read unicode string at offset 0x%x' % 2661 | (ustr_offset) ) 2662 | 2663 | 2664 | if res.DataIsDirectory: 2665 | # OC Patch: 2666 | # 2667 | # One trick malware can do is to recursively reference 2668 | # the next directory. This causes hilarity to ensue when 2669 | # trying to parse everything correctly. 2670 | # If the original RVA given to this function is equal to 2671 | # the next one to parse, we assume that it's a trick. 2672 | # Instead of raising a PEFormatError this would skip some 2673 | # reasonable data so we just break. 
2674 | # 2675 | # 9ee4d0a0caf095314fd7041a3e4404dc is the offending sample 2676 | if (base_rva + res.OffsetToDirectory) in dirs: 2677 | 2678 | break 2679 | 2680 | else: 2681 | entry_directory = self.parse_resources_directory( 2682 | base_rva+res.OffsetToDirectory, 2683 | size-(rva-base_rva), # size 2684 | base_rva=base_rva, level = level+1, 2685 | dirs=dirs + [base_rva + res.OffsetToDirectory]) 2686 | 2687 | if not entry_directory: 2688 | break 2689 | 2690 | # Ange Albertini's code to process resources' strings 2691 | # 2692 | strings = None 2693 | if entry_id == RESOURCE_TYPE['RT_STRING']: 2694 | strings = dict() 2695 | for resource_id in entry_directory.entries: 2696 | if hasattr(resource_id, 'directory'): 2697 | 2698 | resource_strings = dict() 2699 | 2700 | for resource_lang in resource_id.directory.entries: 2701 | 2702 | 2703 | if (resource_lang is None or not hasattr(resource_lang, 'data') or 2704 | resource_lang.data.struct.Size is None or resource_id.id is None): 2705 | continue 2706 | 2707 | string_entry_rva = resource_lang.data.struct.OffsetToData 2708 | string_entry_size = resource_lang.data.struct.Size 2709 | string_entry_id = resource_id.id 2710 | 2711 | string_entry_data = self.get_data(string_entry_rva, string_entry_size) 2712 | parse_strings( string_entry_data, (int(string_entry_id) - 1) * 16, resource_strings ) 2713 | strings.update(resource_strings) 2714 | 2715 | resource_id.directory.strings = resource_strings 2716 | 2717 | dir_entries.append( 2718 | ResourceDirEntryData( 2719 | struct = res, 2720 | name = entry_name, 2721 | id = entry_id, 2722 | directory = entry_directory)) 2723 | 2724 | else: 2725 | struct = self.parse_resource_data_entry( 2726 | base_rva + res.OffsetToDirectory) 2727 | 2728 | if struct: 2729 | entry_data = ResourceDataEntryData( 2730 | struct = struct, 2731 | lang = res.Name & 0x3ff, 2732 | sublang = res.Name >> 10 ) 2733 | 2734 | dir_entries.append( 2735 | ResourceDirEntryData( 2736 | struct = res, 2737 | name = entry_name, 
2738 | id = entry_id, 2739 | data = entry_data)) 2740 | 2741 | else: 2742 | break 2743 | 2744 | 2745 | 2746 | # Check if this entry contains version information 2747 | # 2748 | if level == 0 and res.Id == RESOURCE_TYPE['RT_VERSION']: 2749 | if len(dir_entries)>0: 2750 | last_entry = dir_entries[-1] 2751 | 2752 | rt_version_struct = None 2753 | try: 2754 | rt_version_struct = last_entry.directory.entries[0].directory.entries[0].data.struct 2755 | except: 2756 | # Maybe a malformed directory structure...? 2757 | # Lets ignore it 2758 | pass 2759 | 2760 | if rt_version_struct is not None: 2761 | self.parse_version_information(rt_version_struct) 2762 | 2763 | rva += res.sizeof() 2764 | 2765 | 2766 | string_rvas = [s.get_rva() for s in strings_to_postprocess] 2767 | string_rvas.sort() 2768 | 2769 | for idx, s in enumerate(strings_to_postprocess): 2770 | s.render_pascal_16() 2771 | 2772 | 2773 | resource_directory_data = ResourceDirData( 2774 | struct = resource_dir, 2775 | entries = dir_entries) 2776 | 2777 | return resource_directory_data 2778 | 2779 | 2780 | def parse_resource_data_entry(self, rva): 2781 | """Parse a data entry from the resources directory.""" 2782 | 2783 | try: 2784 | # If the RVA is invalid all would blow up. Some EXEs seem to be 2785 | # specially nasty and have an invalid RVA. 
2786 | data = self.get_data(rva, Structure(self.__IMAGE_RESOURCE_DATA_ENTRY_format__).sizeof() ) 2787 | except PEFormatError, excp: 2788 | self.__warnings.append( 2789 | 'Error parsing a resource directory data entry, ' + 2790 | 'the RVA is invalid: 0x%x' % ( rva ) ) 2791 | return None 2792 | 2793 | data_entry = self.__unpack_data__( 2794 | self.__IMAGE_RESOURCE_DATA_ENTRY_format__, data, 2795 | file_offset = self.get_offset_from_rva(rva) ) 2796 | 2797 | return data_entry 2798 | 2799 | 2800 | def parse_resource_entry(self, rva): 2801 | """Parse a directory entry from the resources directory.""" 2802 | 2803 | try: 2804 | data = self.get_data( rva, Structure(self.__IMAGE_RESOURCE_DIRECTORY_ENTRY_format__).sizeof() ) 2805 | except PEFormatError, excp: 2806 | # A warning will be added by the caller if this method returns None 2807 | return None 2808 | 2809 | resource = self.__unpack_data__( 2810 | self.__IMAGE_RESOURCE_DIRECTORY_ENTRY_format__, data, 2811 | file_offset = self.get_offset_from_rva(rva) ) 2812 | 2813 | if resource is None: 2814 | return None 2815 | 2816 | #resource.NameIsString = (resource.Name & 0x80000000L) >> 31 2817 | resource.NameOffset = resource.Name & 0x7FFFFFFFL 2818 | 2819 | resource.__pad = resource.Name & 0xFFFF0000L 2820 | resource.Id = resource.Name & 0x0000FFFFL 2821 | 2822 | resource.DataIsDirectory = (resource.OffsetToData & 0x80000000L) >> 31 2823 | resource.OffsetToDirectory = resource.OffsetToData & 0x7FFFFFFFL 2824 | 2825 | return resource 2826 | 2827 | 2828 | def parse_version_information(self, version_struct): 2829 | """Parse version information structure. 2830 | 2831 | The data will be made available in three attributes of the PE object. 
2832 | 2833 | VS_VERSIONINFO will contain the first three fields of the main structure: 2834 | 'Length', 'ValueLength', and 'Type' 2835 | 2836 | VS_FIXEDFILEINFO will hold the rest of the fields, accessible as sub-attributes: 2837 | 'Signature', 'StrucVersion', 'FileVersionMS', 'FileVersionLS', 2838 | 'ProductVersionMS', 'ProductVersionLS', 'FileFlagsMask', 'FileFlags', 2839 | 'FileOS', 'FileType', 'FileSubtype', 'FileDateMS', 'FileDateLS' 2840 | 2841 | FileInfo is a list of all StringFileInfo and VarFileInfo structures. 2842 | 2843 | StringFileInfo structures will have a list as an attribute named 'StringTable' 2844 | containing all the StringTable structures. Each of those structures contains a 2845 | dictionary 'entries' with all the key/value version information string pairs. 2846 | 2847 | VarFileInfo structures will have a list as an attribute named 'Var' containing 2848 | all Var structures. Each Var structure will have a dictionary as an attribute 2849 | named 'entry' which will contain the name and value of the Var. 2850 | """ 2851 | 2852 | 2853 | # Retrieve the data for the version info resource 2854 | # 2855 | start_offset = self.get_offset_from_rva( version_struct.OffsetToData ) 2856 | raw_data = self.__data__[ start_offset : start_offset+version_struct.Size ] 2857 | 2858 | 2859 | # Map the main structure and the subsequent string 2860 | # 2861 | versioninfo_struct = self.__unpack_data__( 2862 | self.__VS_VERSIONINFO_format__, raw_data, 2863 | file_offset = start_offset ) 2864 | 2865 | if versioninfo_struct is None: 2866 | return 2867 | 2868 | ustr_offset = version_struct.OffsetToData + versioninfo_struct.sizeof() 2869 | try: 2870 | versioninfo_string = self.get_string_u_at_rva( ustr_offset ) 2871 | except PEFormatError, excp: 2872 | self.__warnings.append( 2873 | 'Error parsing the version information, ' + 2874 | 'attempting to read VS_VERSION_INFO string. 
Can\'t ' + 2875 | 'read unicode string at offset 0x%x' % ( 2876 | ustr_offset ) ) 2877 | 2878 | versioninfo_string = None 2879 | 2880 | # If the structure does not contain the expected name, it's assumed to be invalid 2881 | # 2882 | if versioninfo_string != u'VS_VERSION_INFO': 2883 | 2884 | self.__warnings.append('Invalid VS_VERSION_INFO block') 2885 | return 2886 | 2887 | 2888 | # Set the PE object's VS_VERSIONINFO to this one 2889 | # 2890 | self.VS_VERSIONINFO = versioninfo_struct 2891 | 2892 | # The the Key attribute to point to the unicode string identifying the structure 2893 | # 2894 | self.VS_VERSIONINFO.Key = versioninfo_string 2895 | 2896 | 2897 | # Process the fixed version information, get the offset and structure 2898 | # 2899 | fixedfileinfo_offset = self.dword_align( 2900 | versioninfo_struct.sizeof() + 2 * (len(versioninfo_string) + 1), 2901 | version_struct.OffsetToData) 2902 | fixedfileinfo_struct = self.__unpack_data__( 2903 | self.__VS_FIXEDFILEINFO_format__, 2904 | raw_data[fixedfileinfo_offset:], 2905 | file_offset = start_offset+fixedfileinfo_offset ) 2906 | 2907 | if not fixedfileinfo_struct: 2908 | return 2909 | 2910 | # Set the PE object's VS_FIXEDFILEINFO to this one 2911 | # 2912 | self.VS_FIXEDFILEINFO = fixedfileinfo_struct 2913 | 2914 | 2915 | # Start parsing all the StringFileInfo and VarFileInfo structures 2916 | # 2917 | 2918 | # Get the first one 2919 | # 2920 | stringfileinfo_offset = self.dword_align( 2921 | fixedfileinfo_offset + fixedfileinfo_struct.sizeof(), 2922 | version_struct.OffsetToData) 2923 | original_stringfileinfo_offset = stringfileinfo_offset 2924 | 2925 | 2926 | # Set the PE object's attribute that will contain them all. 
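The repeated dword_align calls in this walker align offsets to 4 bytes relative to the version resource's OffsetToData, not to the start of the file; the helper itself is defined further down in this file. A standalone Python 3 sketch of that logic:

```python
# Sketch of pefile's dword_align: round 'offset' up to the next DWORD
# boundary, measured relative to 'base' rather than to offset zero.

def dword_align(offset, base):
    return ((offset + base + 3) & 0xFFFFFFFC) - (base & 0xFFFFFFFC)

# Aligned offsets pass through unchanged when base is aligned...
assert dword_align(8, 0) == 8
# ...and unaligned ones are rounded up to the next DWORD boundary.
assert dword_align(9, 0) == 12
# Only base modulo 4 matters.
assert dword_align(9, 2) == dword_align(9, 6) == 12
```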
2927 | # 2928 | self.FileInfo = list() 2929 | 2930 | 2931 | while True: 2932 | 2933 | # Process the StringFileInfo/VarFileInfo struct 2934 | # 2935 | stringfileinfo_struct = self.__unpack_data__( 2936 | self.__StringFileInfo_format__, 2937 | raw_data[stringfileinfo_offset:], 2938 | file_offset = start_offset+stringfileinfo_offset ) 2939 | 2940 | if stringfileinfo_struct is None: 2941 | self.__warnings.append( 2942 | 'Error parsing StringFileInfo/VarFileInfo struct' ) 2943 | return None 2944 | 2945 | # Get the subsequent string defining the structure. 2946 | # 2947 | ustr_offset = ( version_struct.OffsetToData + 2948 | stringfileinfo_offset + versioninfo_struct.sizeof() ) 2949 | try: 2950 | stringfileinfo_string = self.get_string_u_at_rva( ustr_offset ) 2951 | except PEFormatError, excp: 2952 | self.__warnings.append( 2953 | 'Error parsing the version information, ' + 2954 | 'attempting to read StringFileInfo string. Can\'t ' + 2955 | 'read unicode string at offset 0x%x' % ( ustr_offset ) ) 2956 | break 2957 | 2958 | # Set such string as the Key attribute 2959 | # 2960 | stringfileinfo_struct.Key = stringfileinfo_string 2961 | 2962 | 2963 | # Append the structure to the PE object's list 2964 | # 2965 | self.FileInfo.append(stringfileinfo_struct) 2966 | 2967 | 2968 | # Parse a StringFileInfo entry 2969 | # 2970 | if stringfileinfo_string and stringfileinfo_string.startswith(u'StringFileInfo'): 2971 | 2972 | if stringfileinfo_struct.Type in (0,1) and stringfileinfo_struct.ValueLength == 0: 2973 | 2974 | stringtable_offset = self.dword_align( 2975 | stringfileinfo_offset + stringfileinfo_struct.sizeof() + 2976 | 2*(len(stringfileinfo_string)+1), 2977 | version_struct.OffsetToData) 2978 | 2979 | stringfileinfo_struct.StringTable = list() 2980 | 2981 | # Process the String Table entries 2982 | # 2983 | while True: 2984 | 2985 | stringtable_struct = self.__unpack_data__( 2986 | self.__StringTable_format__, 2987 | raw_data[stringtable_offset:], 2988 | file_offset = 
start_offset+stringtable_offset ) 2989 | 2990 | if not stringtable_struct: 2991 | break 2992 | 2993 | ustr_offset = ( version_struct.OffsetToData + stringtable_offset + 2994 | stringtable_struct.sizeof() ) 2995 | try: 2996 | stringtable_string = self.get_string_u_at_rva( ustr_offset ) 2997 | except PEFormatError, excp: 2998 | self.__warnings.append( 2999 | 'Error parsing the version information, ' + 3000 | 'attempting to read StringTable string. Can\'t ' + 3001 | 'read unicode string at offset 0x%x' % ( ustr_offset ) ) 3002 | break 3003 | 3004 | stringtable_struct.LangID = stringtable_string 3005 | stringtable_struct.entries = dict() 3006 | stringtable_struct.entries_offsets = dict() 3007 | stringtable_struct.entries_lengths = dict() 3008 | stringfileinfo_struct.StringTable.append(stringtable_struct) 3009 | 3010 | entry_offset = self.dword_align( 3011 | stringtable_offset + stringtable_struct.sizeof() + 3012 | 2*(len(stringtable_string)+1), 3013 | version_struct.OffsetToData) 3014 | 3015 | # Process all entries in the string table 3016 | # 3017 | 3018 | while entry_offset < stringtable_offset + stringtable_struct.Length: 3019 | 3020 | string_struct = self.__unpack_data__( 3021 | self.__String_format__, raw_data[entry_offset:], 3022 | file_offset = start_offset+entry_offset ) 3023 | 3024 | if not string_struct: 3025 | break 3026 | 3027 | ustr_offset = ( version_struct.OffsetToData + entry_offset + 3028 | string_struct.sizeof() ) 3029 | try: 3030 | key = self.get_string_u_at_rva( ustr_offset ) 3031 | key_offset = self.get_offset_from_rva( ustr_offset ) 3032 | except PEFormatError, excp: 3033 | self.__warnings.append( 3034 | 'Error parsing the version information, ' + 3035 | 'attempting to read StringTable Key string. 
Can\'t ' + 3036 | 'read unicode string at offset 0x%x' % ( ustr_offset ) ) 3037 | break 3038 | 3039 | value_offset = self.dword_align( 3040 | 2*(len(key)+1) + entry_offset + string_struct.sizeof(), 3041 | version_struct.OffsetToData) 3042 | 3043 | ustr_offset = version_struct.OffsetToData + value_offset 3044 | try: 3045 | value = self.get_string_u_at_rva( ustr_offset, 3046 | max_length = string_struct.ValueLength ) 3047 | value_offset = self.get_offset_from_rva( ustr_offset ) 3048 | except PEFormatError, excp: 3049 | self.__warnings.append( 3050 | 'Error parsing the version information, ' + 3051 | 'attempting to read StringTable Value string. ' + 3052 | 'Can\'t read unicode string at offset 0x%x' % ( 3053 | ustr_offset ) ) 3054 | break 3055 | 3056 | if string_struct.Length == 0: 3057 | entry_offset = stringtable_offset + stringtable_struct.Length 3058 | else: 3059 | entry_offset = self.dword_align( 3060 | string_struct.Length+entry_offset, version_struct.OffsetToData) 3061 | 3062 | key_as_char = [] 3063 | for c in key: 3064 | if ord(c) >= 0x80: 3065 | key_as_char.append('\\x%02x' % ord(c)) 3066 | else: 3067 | key_as_char.append(c) 3068 | 3069 | key_as_char = ''.join(key_as_char) 3070 | 3071 | setattr(stringtable_struct, key_as_char, value) 3072 | stringtable_struct.entries[key] = value 3073 | stringtable_struct.entries_offsets[key] = (key_offset, value_offset) 3074 | stringtable_struct.entries_lengths[key] = (len(key), len(value)) 3075 | 3076 | 3077 | new_stringtable_offset = self.dword_align( 3078 | stringtable_struct.Length + stringtable_offset, 3079 | version_struct.OffsetToData) 3080 | 3081 | # check if the entry is crafted in a way that would lead to an infinite 3082 | # loop and break if so 3083 | # 3084 | if new_stringtable_offset == stringtable_offset: 3085 | break 3086 | stringtable_offset = new_stringtable_offset 3087 | 3088 | if stringtable_offset >= stringfileinfo_struct.Length: 3089 | break 3090 | 3091 | # Parse a VarFileInfo entry 3092 | # 3093 | elif 
stringfileinfo_string and stringfileinfo_string.startswith( u'VarFileInfo' ): 3094 | 3095 | varfileinfo_struct = stringfileinfo_struct 3096 | varfileinfo_struct.name = 'VarFileInfo' 3097 | 3098 | if varfileinfo_struct.Type in (0, 1) and varfileinfo_struct.ValueLength == 0: 3099 | 3100 | var_offset = self.dword_align( 3101 | stringfileinfo_offset + varfileinfo_struct.sizeof() + 3102 | 2*(len(stringfileinfo_string)+1), 3103 | version_struct.OffsetToData) 3104 | 3105 | varfileinfo_struct.Var = list() 3106 | 3107 | # Process all entries 3108 | # 3109 | 3110 | while True: 3111 | var_struct = self.__unpack_data__( 3112 | self.__Var_format__, 3113 | raw_data[var_offset:], 3114 | file_offset = start_offset+var_offset ) 3115 | 3116 | if not var_struct: 3117 | break 3118 | 3119 | ustr_offset = ( version_struct.OffsetToData + var_offset + 3120 | var_struct.sizeof() ) 3121 | try: 3122 | var_string = self.get_string_u_at_rva( ustr_offset ) 3123 | except PEFormatError, excp: 3124 | self.__warnings.append( 3125 | 'Error parsing the version information, ' + 3126 | 'attempting to read VarFileInfo Var string. 
' + 3127 | 'Can\'t read unicode string at offset 0x%x' % (ustr_offset)) 3128 | break 3129 | 3130 | 3131 | varfileinfo_struct.Var.append(var_struct) 3132 | 3133 | varword_offset = self.dword_align( 3134 | 2*(len(var_string)+1) + var_offset + var_struct.sizeof(), 3135 | version_struct.OffsetToData) 3136 | orig_varword_offset = varword_offset 3137 | 3138 | while varword_offset < orig_varword_offset + var_struct.ValueLength: 3139 | word1 = self.get_word_from_data( 3140 | raw_data[varword_offset:varword_offset+2], 0) 3141 | word2 = self.get_word_from_data( 3142 | raw_data[varword_offset+2:varword_offset+4], 0) 3143 | varword_offset += 4 3144 | 3145 | if isinstance(word1, (int, long)) and isinstance(word2, (int, long)): 3146 | var_struct.entry = {var_string: '0x%04x 0x%04x' % (word1, word2)} 3147 | 3148 | var_offset = self.dword_align( 3149 | var_offset+var_struct.Length, version_struct.OffsetToData) 3150 | 3151 | if var_offset <= var_offset+var_struct.Length: 3152 | break 3153 | 3154 | 3155 | # Increment and align the offset 3156 | # 3157 | stringfileinfo_offset = self.dword_align( 3158 | stringfileinfo_struct.Length+stringfileinfo_offset, 3159 | version_struct.OffsetToData) 3160 | 3161 | # Check if all the StringFileInfo and VarFileInfo items have been processed 3162 | # 3163 | if stringfileinfo_struct.Length == 0 or stringfileinfo_offset >= versioninfo_struct.Length: 3164 | break 3165 | 3166 | 3167 | 3168 | def parse_export_directory(self, rva, size): 3169 | """Parse the export directory. 3170 | 3171 | Given the RVA of the export directory, it will process all 3172 | its entries. 
3173 | 3174 | The exports will be made available through a list "exports" 3175 | containing a tuple with the following elements: 3176 | 3177 | (ordinal, symbol_address, symbol_name) 3178 | 3179 | And also through a dictionary "exports_by_ordinal" whose keys 3180 | will be the ordinals and the values tuples of the form: 3181 | 3182 | (symbol_address, symbol_name) 3183 | 3184 | The symbol addresses are relative, not absolute. 3185 | """ 3186 | 3187 | try: 3188 | export_dir = self.__unpack_data__( 3189 | self.__IMAGE_EXPORT_DIRECTORY_format__, 3190 | self.get_data( rva, Structure(self.__IMAGE_EXPORT_DIRECTORY_format__).sizeof() ), 3191 | file_offset = self.get_offset_from_rva(rva) ) 3192 | except PEFormatError: 3193 | self.__warnings.append( 3194 | 'Error parsing export directory at RVA: 0x%x' % ( rva ) ) 3195 | return 3196 | 3197 | if not export_dir: 3198 | return 3199 | 3200 | # We keep track of the bytes left in the file and use it to set an upper 3201 | # bound on the number of items that can be read from the different 3202 | # arrays 3203 | # 3204 | def length_until_eof(rva): 3205 | return len(self.__data__) - self.get_offset_from_rva(rva) 3206 | 3207 | try: 3208 | address_of_names = self.get_data( 3209 | export_dir.AddressOfNames, min( length_until_eof(export_dir.AddressOfNames), export_dir.NumberOfNames*4)) 3210 | address_of_name_ordinals = self.get_data( 3211 | export_dir.AddressOfNameOrdinals, min( length_until_eof(export_dir.AddressOfNameOrdinals), export_dir.NumberOfNames*4) ) 3212 | address_of_functions = self.get_data( 3213 | export_dir.AddressOfFunctions, min( length_until_eof(export_dir.AddressOfFunctions), export_dir.NumberOfFunctions*4) ) 3214 | except PEFormatError: 3215 | self.__warnings.append( 3216 | 'Error parsing export directory at RVA: 0x%x' % ( rva ) ) 3217 | return 3218 | 3219 | exports = [] 3220 | 3221 | max_failed_entries_before_giving_up = 10 3222 | 3223 | for i in xrange( min( export_dir.NumberOfNames, 
length_until_eof(export_dir.AddressOfNames)/4) ): 3224 | 3225 | symbol_name_address = self.get_dword_from_data(address_of_names, i) 3226 | 3227 | if symbol_name_address is None: 3228 | max_failed_entries_before_giving_up -= 1 3229 | if max_failed_entries_before_giving_up <= 0: 3230 | break 3231 | 3232 | symbol_name = self.get_string_at_rva( symbol_name_address ) 3233 | try: 3234 | symbol_name_offset = self.get_offset_from_rva( symbol_name_address ) 3235 | except PEFormatError: 3236 | max_failed_entries_before_giving_up -= 1 3237 | if max_failed_entries_before_giving_up <= 0: 3238 | break 3239 | continue 3240 | 3241 | symbol_ordinal = self.get_word_from_data( 3242 | address_of_name_ordinals, i) 3243 | 3244 | 3245 | if symbol_ordinal is not None and symbol_ordinal*4 < len(address_of_functions): 3246 | symbol_address = self.get_dword_from_data( 3247 | address_of_functions, symbol_ordinal) 3248 | else: 3249 | # Corrupt? a bad pointer... we assume it's all 3250 | # useless, no exports 3251 | return None 3252 | 3253 | if symbol_address is None or symbol_address == 0: 3254 | continue 3255 | 3256 | # If the function's RVA points within the export directory 3257 | # it will point to a string with the forwarded symbol's string 3258 | # instead of pointing to the function start address. 
3259 | 3260 | if symbol_address >= rva and symbol_address < rva+size: 3261 | forwarder_str = self.get_string_at_rva(symbol_address) 3262 | try: 3263 | forwarder_offset = self.get_offset_from_rva( symbol_address ) 3264 | except PEFormatError: 3265 | continue 3266 | else: 3267 | forwarder_str = None 3268 | forwarder_offset = None 3269 | 3270 | exports.append( 3271 | ExportData( 3272 | pe = self, 3273 | ordinal = export_dir.Base+symbol_ordinal, 3274 | ordinal_offset = self.get_offset_from_rva( export_dir.AddressOfNameOrdinals + 2*i ), 3275 | address = symbol_address, 3276 | address_offset = self.get_offset_from_rva( export_dir.AddressOfFunctions + 4*symbol_ordinal ), 3277 | name = symbol_name, 3278 | name_offset = symbol_name_offset, 3279 | forwarder = forwarder_str, 3280 | forwarder_offset = forwarder_offset )) 3281 | 3282 | ordinals = [exp.ordinal for exp in exports] 3283 | 3284 | max_failed_entries_before_giving_up = 10 3285 | 3286 | for idx in xrange( min(export_dir.NumberOfFunctions, length_until_eof(export_dir.AddressOfFunctions)/4) ): 3287 | 3288 | if not idx+export_dir.Base in ordinals: 3289 | try: 3290 | symbol_address = self.get_dword_from_data( 3291 | address_of_functions, idx) 3292 | except PEFormatError: 3293 | symbol_address = None 3294 | 3295 | if symbol_address is None: 3296 | max_failed_entries_before_giving_up -= 1 3297 | if max_failed_entries_before_giving_up <= 0: 3298 | break 3299 | 3300 | if symbol_address == 0: 3301 | continue 3302 | # 3303 | # Checking for forwarder again. 
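The forwarder test used in both export-parsing loops reduces to a range check: a symbol whose RVA lands inside the export directory itself stores a forwarder string (e.g. "NTDLL.RtlAllocateHeap") rather than code. A standalone Python 3 sketch with hypothetical RVAs:

```python
# Sketch of the forwarded-export test: an exported symbol's address that
# falls within [export_dir_rva, export_dir_rva + export_dir_size) is an
# RVA to a forwarder string, not to a function entry point.

def is_forwarded_export(symbol_rva, export_dir_rva, export_dir_size):
    return export_dir_rva <= symbol_rva < export_dir_rva + export_dir_size

# Hypothetical export directory location and size.
EXPORT_DIR_RVA, EXPORT_DIR_SIZE = 0x5000, 0x800

assert is_forwarded_export(0x5400, EXPORT_DIR_RVA, EXPORT_DIR_SIZE)
assert not is_forwarded_export(0x1000, EXPORT_DIR_RVA, EXPORT_DIR_SIZE)
assert not is_forwarded_export(0x5800, EXPORT_DIR_RVA, EXPORT_DIR_SIZE)
```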
3304 | # 3305 | if symbol_address >= rva and symbol_address < rva+size: 3306 | forwarder_str = self.get_string_at_rva(symbol_address) 3307 | else: 3308 | forwarder_str = None 3309 | 3310 | exports.append( 3311 | ExportData( 3312 | ordinal = export_dir.Base+idx, 3313 | address = symbol_address, 3314 | name = None, 3315 | forwarder = forwarder_str)) 3316 | 3317 | return ExportDirData( 3318 | struct = export_dir, 3319 | symbols = exports) 3320 | 3321 | 3322 | def dword_align(self, offset, base): 3323 | return ((offset+base+3) & 0xfffffffcL) - (base & 0xfffffffcL) 3324 | 3325 | 3326 | def parse_delay_import_directory(self, rva, size): 3327 | """Walk and parse the delay import directory.""" 3328 | 3329 | import_descs = [] 3330 | while True: 3331 | try: 3332 | # If the RVA is invalid all would blow up. Some PEs seem to be 3333 | # specially nasty and have an invalid RVA. 3334 | data = self.get_data( rva, Structure(self.__IMAGE_DELAY_IMPORT_DESCRIPTOR_format__).sizeof() ) 3335 | except PEFormatError, e: 3336 | self.__warnings.append( 3337 | 'Error parsing the Delay import directory at RVA: 0x%x' % ( rva ) ) 3338 | break 3339 | 3340 | import_desc = self.__unpack_data__( 3341 | self.__IMAGE_DELAY_IMPORT_DESCRIPTOR_format__, 3342 | data, file_offset = self.get_offset_from_rva(rva) ) 3343 | 3344 | 3345 | # If the structure is all zeroes, we reached the end of the list 3346 | if not import_desc or import_desc.all_zeroes(): 3347 | break 3348 | 3349 | 3350 | rva += import_desc.sizeof() 3351 | 3352 | try: 3353 | import_data = self.parse_imports( 3354 | import_desc.pINT, 3355 | import_desc.pIAT, 3356 | None) 3357 | except PEFormatError, e: 3358 | self.__warnings.append( 3359 | 'Error parsing the Delay import directory. 
' + 3360 | 'Invalid import data at RVA: 0x%x' % ( rva ) ) 3361 | break 3362 | 3363 | if not import_data: 3364 | continue 3365 | 3366 | 3367 | dll = self.get_string_at_rva(import_desc.szName) 3368 | if not is_valid_dos_filename(dll): 3369 | dll = '*invalid*' 3370 | 3371 | if dll: 3372 | import_descs.append( 3373 | ImportDescData( 3374 | struct = import_desc, 3375 | imports = import_data, 3376 | dll = dll)) 3377 | 3378 | return import_descs 3379 | 3380 | 3381 | 3382 | def parse_import_directory(self, rva, size): 3383 | """Walk and parse the import directory.""" 3384 | 3385 | import_descs = [] 3386 | while True: 3387 | try: 3388 | # If the RVA is invalid all would blow up. Some EXEs seem to be 3389 | # specially nasty and have an invalid RVA. 3390 | data = self.get_data(rva, Structure(self.__IMAGE_IMPORT_DESCRIPTOR_format__).sizeof() ) 3391 | except PEFormatError, e: 3392 | self.__warnings.append( 3393 | 'Error parsing the import directory at RVA: 0x%x' % ( rva ) ) 3394 | break 3395 | 3396 | import_desc = self.__unpack_data__( 3397 | self.__IMAGE_IMPORT_DESCRIPTOR_format__, 3398 | data, file_offset = self.get_offset_from_rva(rva) ) 3399 | 3400 | # If the structure is all zeroes, we reached the end of the list 3401 | if not import_desc or import_desc.all_zeroes(): 3402 | break 3403 | 3404 | rva += import_desc.sizeof() 3405 | 3406 | try: 3407 | import_data = self.parse_imports( 3408 | import_desc.OriginalFirstThunk, 3409 | import_desc.FirstThunk, 3410 | import_desc.ForwarderChain) 3411 | except PEFormatError, excp: 3412 | self.__warnings.append( 3413 | 'Error parsing the import directory. 
' + 3414 | 'Invalid Import data at RVA: 0x%x (%s)' % ( rva, str(excp) ) ) 3415 | break 3416 | #raise excp 3417 | 3418 | if not import_data: 3419 | continue 3420 | 3421 | dll = self.get_string_at_rva(import_desc.Name) 3422 | if not is_valid_dos_filename(dll): 3423 | dll = '*invalid*' 3424 | 3425 | if dll: 3426 | import_descs.append( 3427 | ImportDescData( 3428 | struct = import_desc, 3429 | imports = import_data, 3430 | dll = dll)) 3431 | 3432 | suspicious_imports = set([ 'LoadLibrary', 'GetProcAddress' ]) 3433 | suspicious_imports_count = 0 3434 | total_symbols = 0 3435 | for imp_dll in import_descs: 3436 | for symbol in imp_dll.imports: 3437 | for suspicious_symbol in suspicious_imports: 3438 | if symbol and symbol.name and symbol.name.startswith( suspicious_symbol ): 3439 | suspicious_imports_count += 1 3440 | break 3441 | total_symbols += 1 3442 | if suspicious_imports_count == len(suspicious_imports) and total_symbols < 20: 3443 | self.__warnings.append( 3444 | 'Imported symbols contain entries typical of packed executables.' ) 3445 | 3446 | 3447 | 3448 | return import_descs 3449 | 3450 | 3451 | 3452 | def parse_imports(self, original_first_thunk, first_thunk, forwarder_chain): 3453 | """Parse the imported symbols. 3454 | 3455 | It will fill a list, which will be available as the dictionary 3456 | attribute "imports". Its keys will be the DLL names and the values 3457 | all the symbols imported from that object. 3458 | """ 3459 | 3460 | imported_symbols = [] 3461 | 3462 | # The following has been commented as a PE does not 3463 | # need to have the import data necessarily witin 3464 | # a section, it can keep it in gaps between sections 3465 | # or overlapping other data. 3466 | # 3467 | #imports_section = self.get_section_by_rva(first_thunk) 3468 | #if not imports_section: 3469 | # raise PEFormatError, 'Invalid/corrupt imports.' 3470 | 3471 | # Import Lookup Table. Contains ordinals or pointers to strings. 
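How the import walker interprets each thunk's AddressOfData (a few lines below) can be sketched standalone: the top bit selects import-by-ordinal, otherwise the value is an RVA to a hint/name entry. IMAGE_ORDINAL_FLAG and IMAGE_ORDINAL_FLAG64 are the standard PE constants; the example values are made up (Python 3, illustrative only):

```python
# Sketch of thunk decoding for PE32 vs PE32+: the ordinal flag occupies
# the most significant bit of the thunk value in either format.

IMAGE_ORDINAL_FLAG = 0x80000000
IMAGE_ORDINAL_FLAG64 = 0x8000000000000000

def decode_thunk(address_of_data, pe_plus=False):
    ordinal_flag = IMAGE_ORDINAL_FLAG64 if pe_plus else IMAGE_ORDINAL_FLAG
    address_mask = 0x7FFFFFFFFFFFFFFF if pe_plus else 0x7FFFFFFF
    if address_of_data & ordinal_flag:
        # Imported by ordinal: only the low 16 bits are meaningful.
        return ('ordinal', address_of_data & 0xFFFF)
    # Imported by name: RVA of a hint WORD followed by the ASCII name.
    return ('hint_name_rva', address_of_data & address_mask)

assert decode_thunk(0x80000064) == ('ordinal', 0x64)
assert decode_thunk(0x00003412) == ('hint_name_rva', 0x3412)
assert decode_thunk(0x8000000000000010, pe_plus=True) == ('ordinal', 0x10)
```

This also motivates the sanity check below that discards tables whose "ordinal" values exceed 2^16.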
3472 | ilt = self.get_import_table(original_first_thunk) 3473 | # Import Address Table. May have identical content to ILT if 3474 | # PE file is not bounded, Will contain the address of the 3475 | # imported symbols once the binary is loaded or if it is already 3476 | # bound. 3477 | iat = self.get_import_table(first_thunk) 3478 | 3479 | # OC Patch: 3480 | # Would crash if IAT or ILT had None type 3481 | if (not iat or len(iat)==0) and (not ilt or len(ilt)==0): 3482 | raise PEFormatError( 3483 | 'Invalid Import Table information. ' + 3484 | 'Both ILT and IAT appear to be broken.') 3485 | 3486 | table = None 3487 | if ilt: 3488 | table = ilt 3489 | elif iat: 3490 | table = iat 3491 | else: 3492 | return None 3493 | 3494 | imp_offset = 4 3495 | address_mask = 0x7fffffff 3496 | if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE: 3497 | ordinal_flag = IMAGE_ORDINAL_FLAG 3498 | elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS: 3499 | ordinal_flag = IMAGE_ORDINAL_FLAG64 3500 | imp_offset = 8 3501 | address_mask = 0x7fffffffffffffffL 3502 | 3503 | for idx in xrange(len(table)): 3504 | 3505 | imp_ord = None 3506 | imp_hint = None 3507 | imp_name = None 3508 | name_offset = None 3509 | hint_name_table_rva = None 3510 | 3511 | if table[idx].AddressOfData: 3512 | 3513 | # If imported by ordinal, we will append the ordinal number 3514 | # 3515 | if table[idx].AddressOfData & ordinal_flag: 3516 | import_by_ordinal = True 3517 | imp_ord = table[idx].AddressOfData & 0xffff 3518 | imp_name = None 3519 | name_offset = None 3520 | else: 3521 | import_by_ordinal = False 3522 | try: 3523 | hint_name_table_rva = table[idx].AddressOfData & address_mask 3524 | data = self.get_data(hint_name_table_rva, 2) 3525 | # Get the Hint 3526 | imp_hint = self.get_word_from_data(data, 0) 3527 | imp_name = self.get_string_at_rva(table[idx].AddressOfData+2) 3528 | if not is_valid_function_name(imp_name): 3529 | imp_name = '*invalid*' 3530 | 3531 | name_offset = 
self.get_offset_from_rva(table[idx].AddressOfData+2) 3532 | except PEFormatError, e: 3533 | pass 3534 | 3535 | # by nriva: we want the ThunkRVA and ThunkOffset 3536 | thunk_offset = table[idx].get_file_offset() 3537 | thunk_rva = self.get_rva_from_offset(thunk_offset) 3538 | 3539 | imp_address = first_thunk + self.OPTIONAL_HEADER.ImageBase + idx * imp_offset 3540 | 3541 | struct_iat = None 3542 | try: 3543 | 3544 | if iat and ilt and ilt[idx].AddressOfData != iat[idx].AddressOfData: 3545 | imp_bound = iat[idx].AddressOfData 3546 | struct_iat = iat[idx] 3547 | else: 3548 | imp_bound = None 3549 | except IndexError: 3550 | imp_bound = None 3551 | 3552 | # The file with hashes: 3553 | # 3554 | # MD5: bfe97192e8107d52dd7b4010d12b2924 3555 | # SHA256: 3d22f8b001423cb460811ab4f4789f277b35838d45c62ec0454c877e7c82c7f5 3556 | # 3557 | # has an invalid table built in a way that it's parseable but contains invalid 3558 | # entries that lead pefile to take extremely long amounts of time to 3559 | # parse. It also leads to extreme memory consumption. 3560 | # To prevent similar cases, if invalid entries are found in the middle of a 3561 | # table the parsing will be aborted 3562 | # 3563 | if imp_ord == None and imp_name == None: 3564 | raise PEFormatError( 'Invalid entries in the Import Table. Aborting parsing.' 
) 3565 | 3566 | if imp_name != '' and (imp_ord or imp_name): 3567 | imported_symbols.append( 3568 | ImportData( 3569 | pe = self, 3570 | struct_table = table[idx], 3571 | struct_iat = struct_iat, # for bound imports if any 3572 | import_by_ordinal = import_by_ordinal, 3573 | ordinal = imp_ord, 3574 | ordinal_offset = table[idx].get_file_offset(), 3575 | hint = imp_hint, 3576 | name = imp_name, 3577 | name_offset = name_offset, 3578 | bound = imp_bound, 3579 | address = imp_address, 3580 | hint_name_table_rva = hint_name_table_rva, 3581 | thunk_offset = thunk_offset, 3582 | thunk_rva = thunk_rva )) 3583 | 3584 | return imported_symbols 3585 | 3586 | 3587 | 3588 | def get_import_table(self, rva): 3589 | 3590 | table = [] 3591 | 3592 | # We need the ordinal flag for a simple heuristic 3593 | # we're implementing within the loop 3594 | # 3595 | if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE: 3596 | ordinal_flag = IMAGE_ORDINAL_FLAG 3597 | format = self.__IMAGE_THUNK_DATA_format__ 3598 | elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS: 3599 | ordinal_flag = IMAGE_ORDINAL_FLAG64 3600 | format = self.__IMAGE_THUNK_DATA64_format__ 3601 | 3602 | MAX_ADDRESS_SPREAD = 128*2**20 # 128 MB 3603 | MAX_REPEATED_ADDRESSES = 15 3604 | repeated_address = 0 3605 | addresses_of_data_set_64 = set() 3606 | addresses_of_data_set_32 = set() 3607 | while True and rva: 3608 | 3609 | # if we see the same entry too many times we assume it could be 3610 | # a table containing bogus data (with malicious intent or otherwise) 3611 | if repeated_address >= MAX_REPEATED_ADDRESSES: 3612 | return [] 3613 | 3614 | # if the addresses point somewhere but the difference between the highest 3615 | # and lowest address is larger than MAX_ADDRESS_SPREAD we assume a bogus 3616 | # table as the addresses should be contained within a module 3617 | if (addresses_of_data_set_32 and 3618 | max(addresses_of_data_set_32) - min(addresses_of_data_set_32) > MAX_ADDRESS_SPREAD ): 3619 | return [] 3620 | if 
(addresses_of_data_set_64 and 3621 | max(addresses_of_data_set_64) - min(addresses_of_data_set_64) > MAX_ADDRESS_SPREAD ): 3622 | return [] 3623 | 3624 | try: 3625 | data = self.get_data( rva, Structure(format).sizeof() ) 3626 | except PEFormatError, e: 3627 | self.__warnings.append( 3628 | 'Error parsing the import table. ' + 3629 | 'Invalid data at RVA: 0x%x' % ( rva ) ) 3630 | return None 3631 | 3632 | thunk_data = self.__unpack_data__( 3633 | format, data, file_offset=self.get_offset_from_rva(rva) ) 3634 | 3635 | if thunk_data and thunk_data.AddressOfData: 3636 | # If the entry looks like could be an ordinal... 3637 | if thunk_data.AddressOfData & ordinal_flag: 3638 | # but its value is beyond 2^16, we will assume it's a 3639 | # corrupted and ignore it altogether 3640 | if thunk_data.AddressOfData & 0x7fffffff > 0xffff: 3641 | return [] 3642 | # and if it looks like it should be an RVA 3643 | else: 3644 | # keep track of the RVAs seen and store them to study their 3645 | # properties. When certain non-standard features are detected 3646 | # the parsing will be aborted 3647 | if (thunk_data.AddressOfData in addresses_of_data_set_32 or 3648 | thunk_data.AddressOfData in addresses_of_data_set_64): 3649 | repeated_address += 1 3650 | if thunk_data.AddressOfData >= 2**32: 3651 | addresses_of_data_set_64.add(thunk_data.AddressOfData) 3652 | else: 3653 | addresses_of_data_set_32.add(thunk_data.AddressOfData) 3654 | 3655 | if not thunk_data or thunk_data.all_zeroes(): 3656 | break 3657 | 3658 | rva += thunk_data.sizeof() 3659 | 3660 | table.append(thunk_data) 3661 | 3662 | return table 3663 | 3664 | 3665 | def get_memory_mapped_image(self, max_virtual_address=0x10000000, ImageBase=None): 3666 | """Returns the data corresponding to the memory layout of the PE file. 3667 | 3668 | The data includes the PE header and the sections loaded at offsets 3669 | corresponding to their relative virtual addresses. (the VirtualAddress 3670 | section header member). 
3671 | Any offset in this data corresponds to the absolute memory address 3672 | ImageBase+offset. 3673 | 3674 | The optional argument 'max_virtual_address' provides a means of limiting 3675 | which sections are processed. 3676 | Any section with a VirtualAddress beyond this value will be skipped. 3677 | Normally, sections with values beyond this range are just there to confuse 3678 | tools. It's a common trick to see in packed executables. 3679 | 3680 | If the 'ImageBase' optional argument is supplied, the file's relocations 3681 | will be applied to the image by calling the 'relocate_image()' method. Beware 3682 | that the relocation information is applied permanently. 3683 | """ 3684 | 3685 | # Rebase if requested 3686 | # 3687 | if ImageBase is not None: 3688 | 3689 | # Keep a copy of the image's data before modifying it by rebasing it 3690 | # 3691 | original_data = self.__data__ 3692 | 3693 | self.relocate_image(ImageBase) 3694 | 3695 | # Collect all sections in one code block 3696 | #mapped_data = self.header 3697 | mapped_data = ''+ self.__data__[:] 3698 | for section in self.sections: 3699 | 3700 | # Miscellaneous integrity tests. 3701 | # Some packers will set these to bogus values to 3702 | # make tools go nuts.
3703 | # 3704 | if section.Misc_VirtualSize == 0 or section.SizeOfRawData == 0: 3705 | continue 3706 | 3707 | if section.SizeOfRawData > len(self.__data__): 3708 | continue 3709 | 3710 | if self.adjust_FileAlignment( section.PointerToRawData, 3711 | self.OPTIONAL_HEADER.FileAlignment ) > len(self.__data__): 3712 | 3713 | continue 3714 | 3715 | VirtualAddress_adj = self.adjust_SectionAlignment( section.VirtualAddress, 3716 | self.OPTIONAL_HEADER.SectionAlignment, self.OPTIONAL_HEADER.FileAlignment ) 3717 | 3718 | if VirtualAddress_adj >= max_virtual_address: 3719 | continue 3720 | 3721 | padding_length = VirtualAddress_adj - len(mapped_data) 3722 | 3723 | if padding_length>0: 3724 | mapped_data += '\0'*padding_length 3725 | elif padding_length<0: 3726 | mapped_data = mapped_data[:padding_length] 3727 | 3728 | mapped_data += section.get_data() 3729 | 3730 | # If the image was rebased, restore it to its original form 3731 | # 3732 | if ImageBase is not None: 3733 | self.__data__ = original_data 3734 | 3735 | return mapped_data 3736 | 3737 | 3738 | def get_resources_strings(self): 3739 | """Returns a list of all the strings found within the resources (if any). 3740 | 3741 | This method will scan all entries in the resources directory of the PE, if 3742 | there is one, and will return a list() with the strings. 3743 | 3744 | An empty list will be returned otherwise.
3745 | """ 3746 | 3747 | resources_strings = list() 3748 | 3749 | if hasattr(self, 'DIRECTORY_ENTRY_RESOURCE'): 3750 | 3751 | for resource_type in self.DIRECTORY_ENTRY_RESOURCE.entries: 3752 | if hasattr(resource_type, 'directory'): 3753 | for resource_id in resource_type.directory.entries: 3754 | if hasattr(resource_id, 'directory'): 3755 | if hasattr(resource_id.directory, 'strings') and resource_id.directory.strings: 3756 | for res_string in resource_id.directory.strings.values(): 3757 | resources_strings.append( res_string ) 3758 | 3759 | return resources_strings 3760 | 3761 | 3762 | def get_data(self, rva=0, length=None): 3763 | """Get data regardless of the section where it lies on. 3764 | 3765 | Given a RVA and the size of the chunk to retrieve, this method 3766 | will find the section where the data lies and return the data. 3767 | """ 3768 | 3769 | s = self.get_section_by_rva(rva) 3770 | 3771 | if length: 3772 | end = rva + length 3773 | else: 3774 | end = None 3775 | 3776 | if not s: 3777 | if rva < len(self.header): 3778 | return self.header[rva:end] 3779 | 3780 | # Before we give up we check whether the file might 3781 | # contain the data anyway. There are cases of PE files 3782 | # without sections that rely on windows loading the first 3783 | # 8291 bytes into memory and assume the data will be 3784 | # there 3785 | # A functional file with these characteristics is: 3786 | # MD5: 0008892cdfbc3bda5ce047c565e52295 3787 | # SHA-1: c7116b9ff950f86af256defb95b5d4859d4752a9 3788 | # 3789 | if rva < len(self.__data__): 3790 | return self.__data__[rva:end] 3791 | 3792 | raise PEFormatError, 'data at RVA can\'t be fetched. Corrupt header?' 3793 | 3794 | return s.get_data(rva, length) 3795 | 3796 | 3797 | def get_rva_from_offset(self, offset): 3798 | """Get the RVA corresponding to this file offset. 
""" 3799 | 3800 | s = self.get_section_by_offset(offset) 3801 | if not s: 3802 | if self.sections: 3803 | lowest_rva = min( [ self.adjust_SectionAlignment( s.VirtualAddress, 3804 | self.OPTIONAL_HEADER.SectionAlignment, self.OPTIONAL_HEADER.FileAlignment ) for s in self.sections] ) 3805 | if offset < lowest_rva: 3806 | # We will assume that the offset lies within the headers, or 3807 | # at least points before where the earliest section starts 3808 | # and we will simply return the offset as the RVA 3809 | # 3810 | # The case illustrating this behavior can be found at: 3811 | # http://corkami.blogspot.com/2010/01/hey-hey-hey-whats-in-your-head.html 3812 | # where the import table is not contained by any section 3813 | # hence the RVA needs to be resolved to a raw offset 3814 | return offset 3815 | else: 3816 | return offset 3817 | #raise PEFormatError("specified offset (0x%x) doesn't belong to any section." % offset) 3818 | return s.get_rva_from_offset(offset) 3819 | 3820 | def get_offset_from_rva(self, rva): 3821 | """Get the file offset corresponding to this RVA. 3822 | 3823 | Given a RVA , this method will find the section where the 3824 | data lies and return the offset within the file. 3825 | """ 3826 | 3827 | s = self.get_section_by_rva(rva) 3828 | if not s: 3829 | 3830 | # If not found within a section assume it might 3831 | # point to overlay data or otherwise data present 3832 | # but not contained in any section. 
In those 3833 | # cases the RVA should equal the offset 3834 | if rva len(data): 4265 | return None 4266 | 4267 | return struct.unpack(' len(self.__data__): 4287 | return None 4288 | 4289 | return self.get_dword_from_data(self.__data__[offset:offset+4], 0) 4290 | 4291 | 4292 | def set_dword_at_rva(self, rva, dword): 4293 | """Set the double word value at the file offset corresponding to the given RVA.""" 4294 | return self.set_bytes_at_rva(rva, self.get_data_from_dword(dword)) 4295 | 4296 | 4297 | def set_dword_at_offset(self, offset, dword): 4298 | """Set the double word value at the given file offset.""" 4299 | return self.set_bytes_at_offset(offset, self.get_data_from_dword(dword)) 4300 | 4301 | 4302 | 4303 | ## 4304 | # Word get/set 4305 | ## 4306 | 4307 | def get_data_from_word(self, word): 4308 | """Return a two byte string representing the word value. (little endian).""" 4309 | return struct.pack(' len(data): 4322 | return None 4323 | 4324 | return struct.unpack(' len(self.__data__): 4344 | return None 4345 | 4346 | return self.get_word_from_data(self.__data__[offset:offset+2], 0) 4347 | 4348 | 4349 | def set_word_at_rva(self, rva, word): 4350 | """Set the word value at the file offset corresponding to the given RVA.""" 4351 | return self.set_bytes_at_rva(rva, self.get_data_from_word(word)) 4352 | 4353 | 4354 | def set_word_at_offset(self, offset, word): 4355 | """Set the word value at the given file offset.""" 4356 | return self.set_bytes_at_offset(offset, self.get_data_from_word(word)) 4357 | 4358 | 4359 | ## 4360 | # Quad-Word get/set 4361 | ## 4362 | 4363 | def get_data_from_qword(self, word): 4364 | """Return a eight byte string representing the quad-word value. 
(little endian).""" 4365 | return struct.pack(' len(data): 4378 | return None 4379 | 4380 | return struct.unpack(' len(self.__data__): 4400 | return None 4401 | 4402 | return self.get_qword_from_data(self.__data__[offset:offset+8], 0) 4403 | 4404 | 4405 | def set_qword_at_rva(self, rva, qword): 4406 | """Set the quad-word value at the file offset corresponding to the given RVA.""" 4407 | return self.set_bytes_at_rva(rva, self.get_data_from_qword(qword)) 4408 | 4409 | 4410 | def set_qword_at_offset(self, offset, qword): 4411 | """Set the quad-word value at the given file offset.""" 4412 | return self.set_bytes_at_offset(offset, self.get_data_from_qword(qword)) 4413 | 4414 | 4415 | 4416 | ## 4417 | # Set bytes 4418 | ## 4419 | 4420 | 4421 | def set_bytes_at_rva(self, rva, data): 4422 | """Overwrite, with the given string, the bytes at the file offset corresponding to the given RVA. 4423 | 4424 | Return True if successful, False otherwise. It can fail if the 4425 | offset is outside the file's boundaries. 4426 | """ 4427 | 4428 | if not isinstance(data, str): 4429 | raise TypeError('data should be of type: str') 4430 | 4431 | offset = self.get_physical_by_rva(rva) 4432 | if not offset: 4433 | return False 4434 | 4435 | return self.set_bytes_at_offset(offset, data) 4436 | 4437 | 4438 | def set_bytes_at_offset(self, offset, data): 4439 | """Overwrite the bytes at the given file offset with the given string. 4440 | 4441 | Return True if successful, False otherwise. It can fail if the 4442 | offset is outside the file's boundaries. 
4443 | """ 4444 | 4445 | if not isinstance(data, str): 4446 | raise TypeError('data should be of type: str') 4447 | 4448 | if offset >= 0 and offset < len(self.__data__): 4449 | self.__data__ = ( self.__data__[:offset] + data + self.__data__[offset+len(data):] ) 4450 | else: 4451 | return False 4452 | 4453 | return True 4454 | 4455 | 4456 | def merge_modified_section_data(self): 4457 | """Update the PE image content with any individual section data that has been modified.""" 4458 | 4459 | for section in self.sections: 4460 | section_data_start = self.adjust_FileAlignment( section.PointerToRawData, 4461 | self.OPTIONAL_HEADER.FileAlignment ) 4462 | section_data_end = section_data_start+section.SizeOfRawData 4463 | if section_data_start < len(self.__data__) and section_data_end < len(self.__data__): 4464 | self.__data__ = self.__data__[:section_data_start] + section.get_data() + self.__data__[section_data_end:] 4465 | 4466 | 4467 | def relocate_image(self, new_ImageBase): 4468 | """Apply the relocation information to the image using the provided new image base. 4469 | 4470 | This method will apply the relocation information to the image. Given the new base, 4471 | all the relocations will be processed and both the raw data and the section's data 4472 | will be fixed accordingly. 4473 | The resulting image can be retrieved as well through the method: 4474 | 4475 | get_memory_mapped_image() 4476 | 4477 | In order to get something that would more closely match what could be found in memory 4478 | once the Windows loader finished its work. 
4479 | """ 4480 | 4481 | relocation_difference = new_ImageBase - self.OPTIONAL_HEADER.ImageBase 4482 | 4483 | 4484 | for reloc in self.DIRECTORY_ENTRY_BASERELOC: 4485 | 4486 | virtual_address = reloc.struct.VirtualAddress 4487 | size_of_block = reloc.struct.SizeOfBlock 4488 | 4489 | # We iterate with an index because if the relocation is of type 4490 | # IMAGE_REL_BASED_HIGHADJ we need to also process the next entry 4491 | # at once and skip it for the next iteration 4492 | # 4493 | entry_idx = 0 4494 | while entry_idx>16)&0xffff ) 4512 | 4513 | elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_LOW']: 4514 | # Fix the low 16bits of a relocation 4515 | # 4516 | # Add low 16 bits of relocation_difference to the 16bit value 4517 | # at RVA=entry.rva 4518 | 4519 | self.set_word_at_rva( 4520 | entry.rva, 4521 | ( self.get_word_at_rva(entry.rva) + relocation_difference)&0xffff) 4522 | 4523 | elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_HIGHLOW']: 4524 | # Handle all high and low parts of a 32bit relocation 4525 | # 4526 | # Add relocation_difference to the value at RVA=entry.rva 4527 | 4528 | self.set_dword_at_rva( 4529 | entry.rva, 4530 | self.get_dword_at_rva(entry.rva)+relocation_difference) 4531 | 4532 | elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_HIGHADJ']: 4533 | # Fix the high 16bits of a relocation and adjust 4534 | # 4535 | # Add high 16bits of relocation_difference to the 32bit value 4536 | # composed from the (16bit value at RVA=entry.rva)<<16 plus 4537 | # the 16bit value at the next relocation entry. 4538 | # 4539 | 4540 | # If the next entry is beyond the array's limits, 4541 | # abort... 
the table is corrupt 4542 | # 4543 | if entry_idx == len(reloc.entries): 4544 | break 4545 | 4546 | next_entry = reloc.entries[entry_idx] 4547 | entry_idx += 1 4548 | self.set_word_at_rva( entry.rva, 4549 | ((self.get_word_at_rva(entry.rva)<<16) + next_entry.rva + 4550 | relocation_difference & 0xffff0000) >> 16 ) 4551 | 4552 | elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_DIR64']: 4553 | # Apply the difference to the 64bit value at the offset 4554 | # RVA=entry.rva 4555 | 4556 | self.set_qword_at_rva( 4557 | entry.rva, 4558 | self.get_qword_at_rva(entry.rva) + relocation_difference) 4559 | 4560 | 4561 | def verify_checksum(self): 4562 | 4563 | return self.OPTIONAL_HEADER.CheckSum == self.generate_checksum() 4564 | 4565 | 4566 | def generate_checksum(self): 4567 | 4568 | # This will make sure that the data representing the PE image 4569 | # is updated with any changes that might have been made by 4570 | # assigning values to header fields as those are not automatically 4571 | # updated upon assignment. 4572 | # 4573 | self.__data__ = self.write() 4574 | 4575 | # Get the offset to the CheckSum field in the OptionalHeader 4576 | # 4577 | checksum_offset = self.OPTIONAL_HEADER.__file_offset__ + 0x40 # 64 4578 | 4579 | checksum = 0 4580 | 4581 | # Verify the data is dword-aligned. 
Add padding if needed 4582 | # 4583 | remainder = len(self.__data__) % 4 4584 | data = self.__data__ + ( '\0' * ((4-remainder) * ( remainder != 0 )) ) 4585 | 4586 | for i in range( len( data ) / 4 ): 4587 | 4588 | # Skip the checksum field 4589 | # 4590 | if i == checksum_offset / 4: 4591 | continue 4592 | 4593 | dword = struct.unpack('I', data[ i*4 : i*4+4 ])[0] 4594 | checksum = (checksum & 0xffffffff) + dword + (checksum>>32) 4595 | if checksum > 2**32: 4596 | checksum = (checksum & 0xffffffff) + (checksum >> 32) 4597 | 4598 | checksum = (checksum & 0xffff) + (checksum >> 16) 4599 | checksum = (checksum) + (checksum >> 16) 4600 | checksum = checksum & 0xffff 4601 | 4602 | # The length is the one of the original data, not the padded one 4603 | # 4604 | return checksum + len(self.__data__) 4605 | 4606 | 4607 | def is_exe(self): 4608 | """Check whether the file is a standard executable. 4609 | 4610 | This will return true only if the file has the IMAGE_FILE_EXECUTABLE_IMAGE flag set 4611 | and the IMAGE_FILE_DLL not set and the file does not appear to be a driver either. 4612 | """ 4613 | 4614 | EXE_flag = IMAGE_CHARACTERISTICS['IMAGE_FILE_EXECUTABLE_IMAGE'] 4615 | 4616 | if (not self.is_dll()) and (not self.is_driver()) and ( 4617 | EXE_flag & self.FILE_HEADER.Characteristics) == EXE_flag: 4618 | return True 4619 | 4620 | return False 4621 | 4622 | 4623 | def is_dll(self): 4624 | """Check whether the file is a standard DLL. 4625 | 4626 | This will return true only if the image has the IMAGE_FILE_DLL flag set. 4627 | """ 4628 | 4629 | DLL_flag = IMAGE_CHARACTERISTICS['IMAGE_FILE_DLL'] 4630 | 4631 | if ( DLL_flag & self.FILE_HEADER.Characteristics) == DLL_flag: 4632 | return True 4633 | 4634 | return False 4635 | 4636 | 4637 | def is_driver(self): 4638 | """Check whether the file is a Windows driver. 4639 | 4640 | This will return true only if there are reliable indicators of the image 4641 | being a driver. 
4642 | """ 4643 | 4644 | # Checking that the ImageBase field of the OptionalHeader is above or 4645 | # equal to 0x80000000 (that is, whether it lies in the upper 2GB of 4646 | # the address space, normally belonging to the kernel) is not a 4647 | # reliable enough indicator. For instance, PEs that play the invalid 4648 | # ImageBase trick to get relocated could be incorrectly assumed to be 4649 | # drivers. 4650 | 4651 | # This is not reliable either... 4652 | # 4653 | # if any( (section.Characteristics & SECTION_CHARACTERISTICS['IMAGE_SCN_MEM_NOT_PAGED']) for section in self.sections ): 4654 | # return True 4655 | 4656 | if hasattr(self, 'DIRECTORY_ENTRY_IMPORT'): 4657 | 4658 | # If it imports from "ntoskrnl.exe" or other kernel components it should be a driver 4659 | # 4660 | if set( ('ntoskrnl.exe', 'hal.dll', 'ndis.sys', 'bootvid.dll', 'kdcom.dll' ) ).intersection( [ imp.dll.lower() for imp in self.DIRECTORY_ENTRY_IMPORT ] ): 4661 | return True 4662 | 4663 | return False 4664 | 4665 | 4666 | def get_overlay_data_start_offset(self): 4667 | """Get the offset of data appended to the file and not contained within the area described in the headers.""" 4668 | 4669 | highest_PointerToRawData = 0 4670 | highest_SizeOfRawData = 0 4671 | for section in self.sections: 4672 | 4673 | # If a section seems to fall outside the boundaries of the file we assume it's either 4674 | # because of intentionally misleading values or because the file is truncated 4675 | # In either case we skip it 4676 | if section.PointerToRawData + section.SizeOfRawData > len(self.__data__): 4677 | continue 4678 | 4679 | if section.PointerToRawData + section.SizeOfRawData > highest_PointerToRawData + highest_SizeOfRawData: 4680 | highest_PointerToRawData = section.PointerToRawData 4681 | highest_SizeOfRawData = section.SizeOfRawData 4682 | 4683 | if len(self.__data__) > highest_PointerToRawData + highest_SizeOfRawData: 4684 | return highest_PointerToRawData + highest_SizeOfRawData 4685 | 4686 | 
return None 4687 | 4688 | 4689 | def get_overlay(self): 4690 | """Get the data appended to the file and not contained within the area described in the headers.""" 4691 | 4692 | overlay_data_offset = self.get_overlay_data_start_offset() 4693 | 4694 | if overlay_data_offset is not None: 4695 | return self.__data__[ overlay_data_offset : ] 4696 | 4697 | return None 4698 | 4699 | 4700 | def trim(self): 4701 | """Return just the data defined by the PE headers, removing any overlayed data.""" 4702 | 4703 | overlay_data_offset = self.get_overlay_data_start_offset() 4704 | 4705 | if overlay_data_offset is not None: 4706 | return self.__data__[ : overlay_data_offset ] 4707 | 4708 | return self.__data__[:] 4709 | 4710 | 4711 | # According to http://corkami.blogspot.com/2010/01/parce-que-la-planche-aura-brule.html 4712 | # if PointerToRawData is less than 0x200 it's rounded to zero. Loading the test file 4713 | # in a debugger it's easy to verify that the PointerToRawData value of 1 is rounded 4714 | # to zero. Hence we reproduce the behavior 4715 | # 4716 | # According to the document: 4717 | # [ Microsoft Portable Executable and Common Object File Format Specification ] 4718 | # "The alignment factor (in bytes) that is used to align the raw data of sections in 4719 | # the image file. The value should be a power of 2 between 512 and 64 K, inclusive. 4720 | # The default is 512. If the SectionAlignment is less than the architecture’s page 4721 | # size, then FileAlignment must match SectionAlignment." 4722 | # 4723 | # The following is a hardcoded constant of the Windows loader 4724 | def adjust_FileAlignment( self, val, file_alignment ): 4725 | global FileAlignment_Warning 4726 | if file_alignment > FILE_ALIGNEMNT_HARDCODED_VALUE: 4727 | # If it's not a power of two, report it: 4728 | if not power_of_two(file_alignment) and FileAlignment_Warning is False: 4729 | self.__warnings.append( 4730 | 'If FileAlignment > 0x200 it should be a power of 2.
Value: %x' % ( 4731 | file_alignment) ) 4732 | FileAlignment_Warning = True 4733 | 4734 | if file_alignment < FILE_ALIGNEMNT_HARDCODED_VALUE: 4735 | return val 4736 | return (val / 0x200) * 0x200 4737 | 4738 | 4739 | # According to the document: 4740 | # [ Microsoft Portable Executable and Common Object File Format Specification ] 4741 | # "The alignment (in bytes) of sections when they are loaded into memory. It must be 4742 | # greater than or equal to FileAlignment. The default is the page size for the 4743 | # architecture." 4744 | # 4745 | def adjust_SectionAlignment( self, val, section_alignment, file_alignment ): 4746 | global SectionAlignment_Warning 4747 | if file_alignment < FILE_ALIGNEMNT_HARDCODED_VALUE: 4748 | if file_alignment != section_alignment and SectionAlignment_Warning is False: 4749 | self.__warnings.append( 4750 | 'If FileAlignment(%x) < 0x200 it should equal SectionAlignment(%x)' % ( 4751 | file_alignment, section_alignment) ) 4752 | SectionAlignment_Warning = True 4753 | 4754 | if section_alignment < 0x1000: # page size 4755 | section_alignment = file_alignment 4756 | 4757 | # 0x200 is the minimum valid FileAlignment according to the documentation 4758 | # although ntoskrnl.exe has an alignment of 0x80 in some Windows versions 4759 | # 4760 | #elif section_alignment < 0x80: 4761 | # section_alignment = 0x80 4762 | 4763 | if section_alignment and val % section_alignment: 4764 | return section_alignment * ( val / section_alignment ) 4765 | return val 4766 | 4767 | -------------------------------------------------------------------------------- /modules/unknown_blacklist.txt: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/unknown_blacklist.txt -------------------------------------------------------------------------------- /modules/unknown_regexblacklist.txt: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/unknown_regexblacklist.txt -------------------------------------------------------------------------------- /yaraGenerator.py: -------------------------------------------------------------------------------- 1 | #! /usr/bin/python 2 | # YaraGenerator Will Automatically Build Yara Rules For Malware Families 3 | # As of Yet this Only Works Well With Executables 4 | # Copyright 2013 Chris Clark chris@xenosec.org 5 | # Released under GPL3 Licence 6 | 7 | import re, sys, os, argparse, hashlib, random, email 8 | from datetime import datetime 9 | 10 | #Ensure Import path is in syspath 11 | pathname = os.path.abspath(os.path.dirname(sys.argv[0])) 12 | sys.path.append(pathname + '/modules') 13 | 14 | 15 | #Make sure required imports are present 16 | try: 17 | import pefile 18 | except ImportError: 19 | print "[!] PEfile not installed or present in ./modules directory" 20 | sys.exit(1) 21 | 22 | def getFiles(workingdir): 23 | global hashList 24 | fileDict = {} 25 | hashList = [] 26 | #get hashes 27 | for f in os.listdir(workingdir): 28 | if os.path.isfile(workingdir + f) and not f.startswith("."): 29 | fhash = md5sum(workingdir + f) 30 | fileDict[fhash] = workingdir + f 31 | hashList.append(fhash) 32 | if len(fileDict) == 0: 33 | print "[!]
No Files Present in \"" + workingdir +"\"" 34 | sys.exit(1) 35 | else: 36 | return fileDict 37 | 38 | 39 | #Use PEfile for executables and remove import/api calls from sigs 40 | def exeImportsFuncs(filename, allstrings): 41 | try: 42 | pe = pefile.PE(filename) 43 | importlist = [] 44 | for entry in pe.DIRECTORY_ENTRY_IMPORT: 45 | importlist.append(entry.dll) 46 | for imp in entry.imports: 47 | importlist.append(imp.name) 48 | for imp in importlist: 49 | if imp in allstrings: allstrings.remove(imp) 50 | if len(allstrings) > 0: 51 | return list(set(allstrings)) 52 | else: 53 | print '[!] No Extractable Attributes Present in Hash: '+str(md5sum(filename)) + ' Please Remove it from the Sample Set and Try Again!' 54 | sys.exit(1) 55 | except: 56 | return allstrings 57 | 58 | 59 | #EML File parsing, and comparison based on dictionary entries .... plus regexes looking for domains/links in text/html 60 | def emailParse(filename): 61 | try: 62 | def emailStrings(text): 63 | #same as normal string extract except for " " so each word will be isolated, and nuking <>,. to exclude HTML tags and punctuation 64 | chars = r"A-Za-z0-9/\-:_$%@'()\\\{\};\]\[" 65 | regexp = '[%s]{%d,100}' % (chars, 6) 66 | pattern = re.compile(regexp) 67 | strlist = pattern.findall(text) 68 | return strlist 69 | 70 | uselesskeys = ['DKIM-Signature', 'X-SENDER-REPUTATION', 'References', 'To', 'Delivered-To', 'Received','Message-ID', 'MIME-Version','In-Reply-To', 'Date', 'Content-Type', 'X-Original-To'] 71 | emailfile = open(filename, 'r') 72 | msg = email.message_from_file(emailfile) 73 | emaildict = dict(msg.items()) 74 | if len(emaildict) == 0: 75 | print '[!] This File is not an EML File: '+str(md5sum(filename)) + ' Please Remove it from the Sample Set or Select Proper FileType!'
76 | sys.exit(1) 77 | for uselesskey in uselesskeys: 78 | if uselesskey in emaildict: 79 | del emaildict[uselesskey] 80 | emaillist = [] 81 | for part in msg.walk(): 82 | part_ct = str(part.get_content_type()) 83 | if "plain" in part_ct: 84 | bodyplain = part.get_payload(decode=True) 85 | # emaildict['Body-Plaintxt'] = list(set(emailStrings(bodyplain))) 86 | textlinks = linkSearch(bodyplain) 87 | if textlinks: 88 | emaildict['Body-Links'] = textlinks 89 | if "html" in part_ct: 90 | bodyhtml = part.get_payload(decode=True) 91 | # emaildict['Body-HTML'] = list(set(emailStrings(bodyhtml))) 92 | htmllinks = linkSearch(bodyhtml) 93 | if htmllinks: 94 | emaildict['Body-Links'] = htmllinks 95 | if "application" in part_ct: 96 | if part.get_filename(): 97 | emaildict['attachmentName'] = part.get_filename() 98 | for key, value in emaildict.iteritems(): 99 | if isinstance(value, list): 100 | for subval in value: 101 | emaillist.append(subval) 102 | else: 103 | emaillist.append(value) 104 | return emaillist 105 | except Exception: 106 | print '[!] This File is not an EML File: '+str(md5sum(filename)) + ' Please Remove it from the Sample Set or Select Proper FileType!' 
107 | sys.exit(1) 108 | 109 | def linkSearch(attachment): 110 | urls = list(set(re.compile('(?:ftp|hxxp)[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', re.I).findall(attachment))) 111 | return urls 112 | 113 | #Simple String / ASCII Wide and URL string extraction 114 | def getStrings(filename): 115 | try: 116 | data = open(filename,'rb').read() 117 | chars = r"A-Za-z0-9/\-:.,_$%@'()\\\{\};\]\[<> " 118 | regexp = '[%s]{%d,100}' % (chars, 6) 119 | pattern = re.compile(regexp) 120 | strlist = pattern.findall(data) 121 | #Get Wide Strings 122 | unicode_str = re.compile( ur'(?:[\x20-\x7E][\x00]){6,100}',re.UNICODE ) 123 | unicodelist = unicode_str.findall(data) 124 | allstrings = unicodelist + strlist 125 | #Extract URLs if present 126 | exeurls = linkSearch(data) 127 | if exeurls: 128 | for url in exeurls: 129 | allstrings.append(url) 130 | # use pefile to extract names of imports and function calls and remove them from string list 131 | if len(allstrings) > 0: 132 | return list(set(allstrings)) 133 | else: 134 | print '[!] No Extractable Attributes Present in Hash: '+str(md5sum(filename)) + ' Please Remove it from the Sample Set and Try Again!' 135 | sys.exit(1) 136 | except Exception: 137 | print '[!] No Extractable Attributes Present in Hash: '+str(md5sum(filename)) + ' Please Remove it from the Sample Set and Try Again!' 
138 | sys.exit(1) 139 | 140 | def md5sum(filename): 141 | fh = open(filename, 'rb') 142 | m = hashlib.md5() 143 | while True: 144 | data = fh.read(8192) 145 | if not data: 146 | break 147 | m.update(data) 148 | return m.hexdigest() 149 | 150 | 151 | #find common strings and check against filetype specific blacklists 152 | def findCommonStrings(fileDict, filetype): 153 | baseStringList = random.choice(fileDict.values()) 154 | finalStringList = [] 155 | matchNumber = len(fileDict) 156 | for s in baseStringList: 157 | sNum = 0 158 | for key, value in fileDict.iteritems(): 159 | if s in value: 160 | sNum +=1 161 | if sNum == matchNumber: 162 | finalStringList.append(s) 163 | 164 | #import and use filetype specific blacklist/regexlist to exclude unwanted sig material 165 | #Various utility functions to extract strings/data/info and isolate signature material 166 | with open(pathname +'/modules/'+filetype+'_blacklist.txt') as f: 167 | blacklist = f.read().splitlines() 168 | with open(pathname +'/modules/'+filetype+'_regexblacklist.txt') as f: 169 | regblacklist = f.read().splitlines() 170 | #Match Against Blacklist 171 | for black in blacklist: 172 | if black in finalStringList: finalStringList.remove(black) 173 | #Match Against Regex Blacklist 174 | regmatchlist = [] 175 | for regblack in regblacklist: 176 | for string in finalStringList: 177 | regex = re.compile(regblack) 178 | if regex.search(string): regmatchlist.append(string) 179 | if len(regmatchlist) > 0: 180 | for match in list(set(regmatchlist)): 181 | finalStringList.remove(match) 182 | 183 | return finalStringList 184 | 185 | #Build the actual rule 186 | def buildYara(options, strings, hashes): 187 | date = datetime.now().strftime("%Y-%m-%d") 188 | randStrings = [] 189 | #Ensure we have shared attributes and select twenty 190 | try: 191 | for i in range(20): 192 | randStrings.append(random.choice(strings)) 193 | except IndexError: 194 | print '[!]
No Common Attributes Found For All Samples, Please Be More Selective'
195 |         sys.exit(1)
196 | 
197 |     #Prioritize based on specific filetype
198 |     if options.FileType == 'email':
199 |         for string in strings:
200 |             if "@" in string:
201 |                 randStrings.append(string)
202 |             if "." in string:
203 |                 randStrings.append(string)
204 | 
205 |     #Remove Duplicates
206 |     randStrings = list(set(randStrings))
207 | 
208 |     ruleOutFile = open(options.RuleName + ".yar", "w")
209 |     ruleOutFile.write("rule "+options.RuleName)
210 |     if options.Tags:
211 |         ruleOutFile.write(" : " + options.Tags)
212 |     ruleOutFile.write("\n")
213 |     ruleOutFile.write("{\n")
214 |     ruleOutFile.write("meta:\n")
215 |     ruleOutFile.write("\tauthor = \""+ options.Author + "\"\n")
216 |     ruleOutFile.write("\tdate = \""+ date +"\"\n")
217 |     ruleOutFile.write("\tdescription = \""+ options.Description + "\"\n")
218 |     for h in hashes:
219 |         ruleOutFile.write("\thash"+str(hashes.index(h))+" = \""+ h + "\"\n")
220 |     ruleOutFile.write("\tsample_filetype = \""+ options.FileType + "\"\n")
221 |     ruleOutFile.write("\tyaragenerator = \"https://github.com/Xen0ph0n/YaraGenerator\"\n")
222 |     ruleOutFile.write("strings:\n")
223 |     for s in randStrings:
224 |         if "\x00" in s:
225 |             ruleOutFile.write("\t$string"+str(randStrings.index(s))+" = \""+ s.replace("\\","\\\\").replace('"','\\"').replace("\x00","") +"\" wide\n")
226 |         else:
227 |             ruleOutFile.write("\t$string"+str(randStrings.index(s))+" = \""+ s.replace("\\","\\\\") +"\"\n")
228 |     ruleOutFile.write("condition:\n")
229 |     if options.FileType == 'email':
230 |         ruleOutFile.write("\t any of them\n")
231 |     else:
232 |         ruleOutFile.write("\t"+str(len(randStrings) - 1)+" of them\n")
233 |     ruleOutFile.write("}\n")
234 |     ruleOutFile.close()
235 |     return
236 | 
237 | #Per filetype execution paths
238 | def unknownFile(fileDict):
239 |     #Unknown is the default and will mirror executable excepting the blacklist
240 |     for fhash, path in fileDict.iteritems():
241 |         fileDict[fhash] = getStrings(path)
242 |     finalStringList = findCommonStrings(fileDict, 'unknown')
243 |     return finalStringList
244 | 
245 | def exeFile(fileDict):
246 |     for fhash, path in fileDict.iteritems():
247 |         fileDict[fhash] = exeImportsFuncs(path, getStrings(path))
248 |     finalStringList = findCommonStrings(fileDict, 'exe')
249 |     return finalStringList
250 | 
251 | def pdfFile(fileDict):
252 |     for fhash, path in fileDict.iteritems():
253 |         fileDict[fhash] = getStrings(path)
254 |     finalStringList = findCommonStrings(fileDict, 'pdf')
255 |     return finalStringList
256 | 
257 | def emailFile(fileDict):
258 |     for fhash, path in fileDict.iteritems():
259 |         fileDict[fhash] = emailParse(path)
260 |     finalStringList = findCommonStrings(fileDict, 'email')
261 |     return finalStringList
262 | 
263 | def officeFile(fileDict):
264 |     for fhash, path in fileDict.iteritems():
265 |         fileDict[fhash] = getStrings(path)
266 |     finalStringList = findCommonStrings(fileDict, 'office')
267 |     return finalStringList
268 | 
269 | def jshtmlFile(fileDict):
270 |     for fhash, path in fileDict.iteritems():
271 |         fileDict[fhash] = getStrings(path)
272 |     finalStringList = findCommonStrings(fileDict, 'jshtml')
273 |     return finalStringList
274 | 
275 | #Main
276 | def main():
277 |     filetypeoptions = ['unknown','exe','pdf','email','office','js-html']
278 |     opt = argparse.ArgumentParser(description="YaraGenerator")
279 |     opt.add_argument("InputDirectory", help="Path To Files To Create Yara Rule From")
280 |     opt.add_argument("-r", "--RuleName", required=True, help="Enter A Rule/Alert Name (No Spaces + Must Start with Letter)")
281 |     opt.add_argument("-a", "--Author", default="Anonymous", help="Enter Author Name")
282 |     opt.add_argument("-d", "--Description", default="No Description Provided", help="Provide a useful description of the Yara Rule")
283 |     opt.add_argument("-t", "--Tags", default="", help="Apply Tags to Yara Rule For Easy Reference (AlphaNumeric)")
284 |     opt.add_argument("-v", "--Verbose", default=False, action="store_true", help="Print Finished Rule To Standard Out")
285 |     opt.add_argument("-f", "--FileType", required=True, choices=filetypeoptions, help="Select Sample Set FileType choices are: "+', '.join(filetypeoptions), metavar="")
286 |     if len(sys.argv) <= 3:
287 |         opt.print_help()
288 |         sys.exit(1)
289 |     options = opt.parse_args()
290 |     if " " in options.RuleName or not options.RuleName[0].isalpha():
291 |         print "[!] Rule Name Can Not Contain Spaces or Begin With A Non Alpha Character"
292 |         sys.exit(1)
293 | 
294 |     #Get Filenames and hashes
295 |     fileDict = getFiles(options.InputDirectory)
296 |     print "\n[+] Generating Yara Rule " + options.RuleName + " from files located in: " + options.InputDirectory
297 | 
298 |     #Begin per-filetype processing paths
299 |     if options.FileType == 'exe':
300 |         finalStringList = exeFile(fileDict)
301 |     elif options.FileType == 'pdf':
302 |         finalStringList = pdfFile(fileDict)
303 |     elif options.FileType == 'email':
304 |         finalStringList = emailFile(fileDict)
305 |     elif options.FileType == 'office':
306 |         finalStringList = officeFile(fileDict)
307 |     elif options.FileType == 'js-html':
308 |         finalStringList = jshtmlFile(fileDict)
309 |     else:
310 |         finalStringList = unknownFile(fileDict)
311 | 
312 |     #Build and Write Yara Rule
313 |     global hashList
314 |     buildYara(options, finalStringList, hashList)
315 |     print "\n[+] Yara Rule Generated: "+options.RuleName+".yar\n"
316 |     print " [+] Files Examined: " + str(hashList)
317 |     print " [+] Author Credited: " + options.Author
318 |     print " [+] Rule Description: " + options.Description
319 |     if options.Tags:
320 |         print " [+] Rule Tags: " + options.Tags +"\n"
321 |     if options.Verbose:
322 |         print "[+] Rule Below:\n"
323 |         with open(options.RuleName + ".yar", 'r') as donerule:
324 |             print donerule.read()
325 | 
326 |     print "[+] YaraGenerator (C) 2013 Chris@xenosec.org https://github.com/Xen0ph0n/YaraGenerator"
327 | 
328 | 
329 | if __name__ == "__main__":
330 |     main()
--------------------------------------------------------------------------------
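The string-emission logic in `buildYara` (source lines 223-227) is the subtle part of the rule writer: strings extracted from UTF-16LE data arrive interleaved with NUL bytes, so the NULs are stripped and the yara string is tagged `wide`, while backslashes (and quotes, in the wide branch) are escaped for the rule file. A minimal Python 3 sketch of that logic, with an illustrative function name not taken from the source:

```python
def yara_string_line(index, s):
    # Mirrors buildYara's per-string output: a NUL byte signals UTF-16LE
    # text, so NULs are dropped and the string is tagged "wide"; quotes
    # are escaped only in that branch, matching the original code.
    if "\x00" in s:
        body = s.replace("\\", "\\\\").replace('"', '\\"').replace("\x00", "")
        return '\t$string%d = "%s" wide' % (index, body)
    # Narrow strings: only backslashes are escaped, as in the source.
    return '\t$string%d = "%s"' % (index, s.replace("\\", "\\\\"))

print(yara_string_line(0, "C:\\Windows\\cmd.exe"))
# -> 	$string0 = "C:\\Windows\\cmd.exe"
print(yara_string_line(1, "h\x00t\x00t\x00p\x00"))
# -> 	$string1 = "http" wide
```

Note that because `randStrings` is deduplicated via `set()` before this loop runs, the original's use of `randStrings.index(s)` as the counter is safe, though enumerating would be the more idiomatic way to number the strings.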