├── README.md
├── modules
    ├── email_blacklist.txt
    ├── email_regexblacklist.txt
    ├── exe_blacklist.txt
    ├── exe_regexblacklist.txt
    ├── jshtml_blacklist.txt
    ├── jshtml_regexblacklist.txt
    ├── office_blacklist.txt
    ├── office_regexblacklist.txt
    ├── pdf_blacklist.txt
    ├── pdf_regexblacklist.txt
    ├── pefile.py
    ├── unknown_blacklist.txt
    └── unknown_regexblacklist.txt
└── yaraGenerator.py


/README.md:
--------------------------------------------------------------------------------
  1 | ### Information
  2 | This is a project to build a tool to attempt to allow for quick, simple, and effective yara rule creation to isolate malware families and other malicious objects of interest. This is an experiment and thus far I've had pretty good success with it. It is a work in progress and I welcome forks and feedback!
  3 | 
  4 | To utilize this you must find a few files from a malware family, or if not executables then containing the attribute of interest, you wish to profile, (the more the better, three to four samples seems to be effective for malware, however to isolate exploits in carrier documents it often takes many more). Please note however that this tool will only be as precise as you are in choosing what you are looking for... visit http://yaragenerator.com for a webapplication version of this tool. 
  5 | 
  6 | ### Version and Updates
  7 | 0.6.1 - Added logic for parsing and prioritizing strings/emails/headers from emails (must submit in .eml file in order for python library to parse it properly). Added per filetype string prioritization logic (IE include all email addresses and IP's common across emails before random words from email bodys). Due to targeted parsing, effective signatures can be built from a single email. Also boolean logic for email rules is "all of them" to allow for future variance in delivery methods.
  8 | 
  9 | 0.6 - Refactored all of the code to allow for selectable filetype of samples (-f). This allows for entirely different signature generation for PDFs vs EXEs vs EMails. In addition to dispirate execution paths, each filetype has it's own string blacklist and regexlist to exclude unwanted things such as your gateway, usernames, @yourco.com etc. (Note: No custom per file code exists for anything beyond executables at this point, but the framework is now there)
 10 | 
 11 | 0.5 - Added Regexes in modules/regexblacklist.txt which will remove matches from potential strings included in yara rules also added 30K strings to blacklist. Lowered hit requirment from 100 to 95% to allow more true positives from slight variants (example change of embeded C2 or UA)
 12 | 
 13 | 0.4 - Added PEfile (http://code.google.com/p/pefile/) to extract and remove imports and functions from yara rules, added blacklist.txt to remove unwanted strings
 14 | 
 15 | 0.3 - Added support for Tags, Unicode Wide Strings (Automatically Adds "wide" tag)
 16 | 
 17 | 0.2 - Updated CLI and error handeling, removed hidden files, and ignored subdirectories
 18 | 
 19 | 0.1 - Released, supports regular string extraction
 20 | 
 21 | ### ToDo
 22 | [+] Allow for scanning of benign/baseline files to automatically populate blacklists for various filetypes
 23 | 
 24 | [+] Create custom execution paths leveraging opensource tools for various filetypes (IE email/pdf/office docs ..etc)
 25 | 
 26 | [+] Continue to improve fidelity and flexibility of algos and underlying methodologies to generate signatures
 27 | 
 28 | 
 29 | ### Example
 30 | 
 31 | Usage is as follows with an example of a basic search +  hitting all of
 32 | the switches below:
 33 | ```
 34 | 
 35 | usage: yaraGenerator.py [-h] -r RULENAME -f FILETYPE [-a AUTHOR] [-d DESCRIPTION] [-t TAGS] InputDirectory 
 36 | 
 37 | YaraGenerator
 38 | 
 39 | positional arguments:
 40 |   InputDirectory        Path To Files To Create Yara Rule From
 41 | 
 42 | optional arguments:
 43 |   -h , --help             show this help message and exit
 44 |   -r , --RuleName         Enter A Rule/Alert Name (No Spaces + Must Start with Letter)
 45 |   -a , --Author           Enter Author Name
 46 |   -d , --Description      Provide a useful description of the Yara Rule
 47 |   -t , --Tags             Apply Tags to Yara Rule For Easy Reference (AlphaNumeric)
 48 |   -v , --Verbose          Print Finished Rule To Standard Out
 49 |   -f , --FileType         Select Sample Set FileType choices are: unknown, exe,
 50 |                           pdf, email, office, js-html
 51 | ```
 52 | 
 53 | The blacklist.txt file in the /modules directory allows entry of one string per line, these strings will never appear in a rule generated by YaraGenerator.
 54 | 
 55 | The regexblacklist.txt in the /modules directory allows entry of one Regular Expression per line. * Remember to use ^ and $ for the begining and end of a string if you wish to match exactly* YaraGenerator will disqualify any string which hits on any regex in the list from input into a Yara Rule.
 56 | 
 57 | Example for a Specific Family of APT1 Malware:
 58 | 
 59 | ```
 60 | python yaraGenerator.py ../greencat/ -r Win_Trojan_APT1_GreenCat -a "Chris Clark" -d "APT Trojan Comment Panda" -t "APT" -f "exe"
 61 | 
 62 | [+] Generating Yara Rule Win_Trojan_APT1_GreenCat from files located in: ../greencat/
 63 | 
 64 | [+] Yara Rule Generated: Win_Trojan_APT1_GreenCat.yar
 65 | 
 66 |   [+] Files Examined: ['871cc547feb9dbec0285321068e392b8', '6570163cd34454b3d1476c134d44b9d9', '57e79f7df13c0cb01910d0c688fcd296']
 67 |   [+] Author Credited: Chris Clark
 68 |   [+] Rule Description: APT Trojan Comment Panda
 69 |   [+] Rule Tags: APT
 70 | 
 71 | [+] YaraGenerator (C) 2013 Chris@xenosec.org https://github.com/Xen0ph0n/YaraGenerator
 72 | ```
 73 | Resulting Yara Rules:
 74 | ```
 75 | rule Win_Trojan_APT_APT1_Greencat : APT
 76 | {
 77 | meta:
 78 |   author = "Chris Clark"
 79 |   date = "2013-06-04"
 80 |   description = "APT Trojan Comment Crew Greencat"
 81 |   hash0 = "57e79f7df13c0cb01910d0c688fcd296"
 82 |   hash1 = "871cc547feb9dbec0285321068e392b8"
 83 |   hash2 = "6570163cd34454b3d1476c134d44b9d9"
 84 |   sample_filetype = "exe"
 85 |   yaragenerator = "https://github.com/Xen0ph0n/YaraGenerator"
 86 | strings:
 87 |   $string0 = "Ramdisk"
 88 |   $string1 = "Cache-Control:max-age"
 89 |   $string2 = "YYSSSSS"
 90 |   $string3 = "\\cmd.exe"
 91 |   $string4 = "Translation" wide
 92 |   $string5 = "CD-ROM"
 93 |   $string6 = "Mozilla/5.0"
 94 |   $string7 = "Volume on this computer:"
 95 |   $string8 = "pidrun"
 96 |   $string9 = "3@YAXPAX@Z"
 97 |   $string10 = "SMAgent.exe" wide
 98 |   $string11 = "Shell started successfully"
 99 |   $string12 = "Content-Length: %d"
100 |   $string13 = "t4j SV3"
101 |   $string14 = "Program started"
102 |   $string15 = "Started already,"
103 |   $string16 = "SoundMAX service agent" wide
104 | condition:
105 |   16 of them
106 | }
107 | 
108 | 
109 | ```
110 | ### Results
111 | 
112 | GreenCat Rule:
113 | 
114 | ```
115 | 100% Hits on Test Samples:
116 | 
117 | $ yara -rg Trojan_Win_GreenCat.yar greencat/
118 | Trojan_Win_GreenCat [APT] ../greencat//8bf5a9e8d5bc1f44133c3f118fe8ca1701d9665a72b3893f509367905feb0a00
119 | Trojan_Win_GreenCat [APT] ../greencat//c196cac319e5c55e8169b6ed6930a10359b3db322abe8f00ed8cb83cf0888d3b
120 | Trojan_Win_GreenCat [APT] ../greencat//c23039cf2f859e659e59ec362277321fbcdac680e6d9bc93fc03c8971333c25e
121 | 
122 | 100% True Positives On Other Samples In the APT1 Cadre which were detected as Green Cat By Other Yara Rules:
123 | 
124 | $ yara -r Trojan_Win_GreenCat.yar .Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//1877a5d2f9c415109a8ac323f43be1dc10c546a72ab7207a96c6e6e71a132956
125 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//20ed6218575155517f19d4ce46a9addbf49dcadb8f5d7bd93efdccfe1925c7d0
126 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//4144820d9b31c4d3c54025a4368b32f727077c3ec253753360349a783846747f
127 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//4487b345f63d20c6b91eec8ee86c307911b1f2c3e29f337aa96a4a238bf2e87c
128 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//8bf5a9e8d5bc1f44133c3f118fe8ca1701d9665a72b3893f509367905feb0a00
129 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//c196cac319e5c55e8169b6ed6930a10359b3db322abe8f00ed8cb83cf0888d3b
130 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//c23039cf2f859e659e59ec362277321fbcdac680e6d9bc93fc03c8971333c25e
131 | Win_Trojan_APT1_GreenCat [APT] ../../MalwareSamples/APT1Malware//f76dd93b10fc173eaf901ff1fb00ff8a9e1f31e3bd86e00ff773b244b54292c5
132 | 
133 | 100% True Negatives on clean files:
134 | 
135 | $ yara -r Trojan_Win_GreenCat.yar ../../CleanFiles/
136 | 
137 | ```
138 | 
139 | ### Author & License
140 | 
141 | YaraGenerator is copyrighted by Chris Clark 2013. Contact me at Chris@xenosys.org
142 | 
143 | YaraGenerator is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
144 | YaraGenerator is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
145 | 
146 | You should have received a copy of the GNU General Public License along with YaraGenerator. If not, see http://www.gnu.org/licenses/.
147 | 
148 | 


--------------------------------------------------------------------------------
/modules/email_blacklist.txt:
--------------------------------------------------------------------------------
1 | undisclosed-recipients:;


--------------------------------------------------------------------------------
/modules/email_regexblacklist.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/email_regexblacklist.txt


--------------------------------------------------------------------------------
/modules/exe_regexblacklist.txt:
--------------------------------------------------------------------------------
1 | ^thisisaplaceholder$


--------------------------------------------------------------------------------
/modules/jshtml_blacklist.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/jshtml_blacklist.txt


--------------------------------------------------------------------------------
/modules/jshtml_regexblacklist.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/jshtml_regexblacklist.txt


--------------------------------------------------------------------------------
/modules/office_blacklist.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/office_blacklist.txt


--------------------------------------------------------------------------------
/modules/office_regexblacklist.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/office_regexblacklist.txt


--------------------------------------------------------------------------------
/modules/pdf_blacklist.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/pdf_blacklist.txt


--------------------------------------------------------------------------------
/modules/pdf_regexblacklist.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/pdf_regexblacklist.txt


--------------------------------------------------------------------------------
/modules/pefile.py:
--------------------------------------------------------------------------------
   1 | # -*- coding: Latin-1 -*-
   2 | """pefile, Portable Executable reader module
   3 | 
   4 | 
   5 | All the PE file basic structures are available with their default names
   6 | as attributes of the instance returned.
   7 | 
   8 | Processed elements such as the import table are made available with lowercase
   9 | names, to differentiate them from the upper case basic structure names.
  10 | 
  11 | pefile has been tested against the limits of valid PE headers, that is, malware.
  12 | Lots of packed malware attempt to abuse the format way beyond its standard use.
  13 | To the best of my knowledge most of the abuses are handled gracefully.
  14 | 
  15 | Copyright (c) 2005-2011 Ero Carrera <ero.carrera@gmail.com>
  16 | 
  17 | All rights reserved.
  18 | 
  19 | For detailed copyright information see the file COPYING in
  20 | the root of the distribution archive.
  21 | """
  22 | 
  23 | __revision__ = "$LastChangedRevision: 114 $"
  24 | __author__ = 'Ero Carrera'
  25 | __version__ = '1.2.10-%d' % int( __revision__[21:-2] )
  26 | __contact__ = 'ero.carrera@gmail.com'
  27 | 
  28 | 
  29 | import os
  30 | import struct
  31 | import time
  32 | import math
  33 | import re
  34 | import exceptions
  35 | import string
  36 | import array
  37 | import mmap
  38 | 
  39 | sha1, sha256, sha512, md5 = None, None, None, None
  40 | 
  41 | try:
  42 |     import hashlib
  43 |     sha1 = hashlib.sha1
  44 |     sha256 = hashlib.sha256
  45 |     sha512 = hashlib.sha512
  46 |     md5 = hashlib.md5
  47 | except ImportError:
  48 |     try:
  49 |         import sha
  50 |         sha1 = sha.new
  51 |     except ImportError:
  52 |         pass
  53 |     try:
  54 |         import md5
  55 |         md5 = md5.new
  56 |     except ImportError:
  57 |         pass
  58 | 
  59 | try:
  60 |     enumerate
  61 | except NameError:
  62 |     def enumerate(iter):
  63 |         L = list(iter)
  64 |         return zip(range(0, len(L)), L)
  65 | 
  66 | 
  67 | fast_load = False
  68 | 
  69 | # This will set a maximum length of a string to be retrieved from the file.
  70 | # It's there to prevent loading massive amounts of data from memory mapped
  71 | # files. Strings longer than 1MB should be rather rare.
  72 | MAX_STRING_LENGTH = 0x100000 # 2^20 
  73 | 
  74 | IMAGE_DOS_SIGNATURE             = 0x5A4D
  75 | IMAGE_DOSZM_SIGNATURE           = 0x4D5A
  76 | IMAGE_NE_SIGNATURE              = 0x454E
  77 | IMAGE_LE_SIGNATURE              = 0x454C
  78 | IMAGE_LX_SIGNATURE              = 0x584C
  79 | 
  80 | IMAGE_NT_SIGNATURE              = 0x00004550
  81 | IMAGE_NUMBEROF_DIRECTORY_ENTRIES= 16
  82 | IMAGE_ORDINAL_FLAG              = 0x80000000L
  83 | IMAGE_ORDINAL_FLAG64            = 0x8000000000000000L
  84 | OPTIONAL_HEADER_MAGIC_PE        = 0x10b
  85 | OPTIONAL_HEADER_MAGIC_PE_PLUS   = 0x20b
  86 | 
  87 | 
  88 | directory_entry_types = [
  89 |     ('IMAGE_DIRECTORY_ENTRY_EXPORT',        0),
  90 |     ('IMAGE_DIRECTORY_ENTRY_IMPORT',        1),
  91 |     ('IMAGE_DIRECTORY_ENTRY_RESOURCE',      2),
  92 |     ('IMAGE_DIRECTORY_ENTRY_EXCEPTION',     3),
  93 |     ('IMAGE_DIRECTORY_ENTRY_SECURITY',      4),
  94 |     ('IMAGE_DIRECTORY_ENTRY_BASERELOC',     5),
  95 |     ('IMAGE_DIRECTORY_ENTRY_DEBUG',         6),
  96 |     ('IMAGE_DIRECTORY_ENTRY_COPYRIGHT',     7),
  97 |     ('IMAGE_DIRECTORY_ENTRY_GLOBALPTR',     8),
  98 |     ('IMAGE_DIRECTORY_ENTRY_TLS',           9),
  99 |     ('IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG',   10),
 100 |     ('IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT',  11),
 101 |     ('IMAGE_DIRECTORY_ENTRY_IAT',           12),
 102 |     ('IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT',  13),
 103 |     ('IMAGE_DIRECTORY_ENTRY_COM_DESCRIPTOR',14),
 104 |     ('IMAGE_DIRECTORY_ENTRY_RESERVED',      15) ]
 105 | 
 106 | DIRECTORY_ENTRY = dict([(e[1], e[0]) for e in directory_entry_types]+directory_entry_types)
 107 | 
 108 | 
 109 | image_characteristics = [
 110 |     ('IMAGE_FILE_RELOCS_STRIPPED',          0x0001),
 111 |     ('IMAGE_FILE_EXECUTABLE_IMAGE',         0x0002),
 112 |     ('IMAGE_FILE_LINE_NUMS_STRIPPED',       0x0004),
 113 |     ('IMAGE_FILE_LOCAL_SYMS_STRIPPED',      0x0008),
 114 |     ('IMAGE_FILE_AGGRESIVE_WS_TRIM',        0x0010),
 115 |     ('IMAGE_FILE_LARGE_ADDRESS_AWARE',      0x0020),
 116 |     ('IMAGE_FILE_16BIT_MACHINE',            0x0040),
 117 |     ('IMAGE_FILE_BYTES_REVERSED_LO',        0x0080),
 118 |     ('IMAGE_FILE_32BIT_MACHINE',            0x0100),
 119 |     ('IMAGE_FILE_DEBUG_STRIPPED',           0x0200),
 120 |     ('IMAGE_FILE_REMOVABLE_RUN_FROM_SWAP',  0x0400),
 121 |     ('IMAGE_FILE_NET_RUN_FROM_SWAP',        0x0800),
 122 |     ('IMAGE_FILE_SYSTEM',                   0x1000),
 123 |     ('IMAGE_FILE_DLL',                      0x2000),
 124 |     ('IMAGE_FILE_UP_SYSTEM_ONLY',           0x4000),
 125 |     ('IMAGE_FILE_BYTES_REVERSED_HI',        0x8000) ]
 126 | 
 127 | IMAGE_CHARACTERISTICS = dict([(e[1], e[0]) for e in
 128 |     image_characteristics]+image_characteristics)
 129 |     
 130 | 
 131 | section_characteristics = [
 132 |     ('IMAGE_SCN_CNT_CODE',                  0x00000020),
 133 |     ('IMAGE_SCN_CNT_INITIALIZED_DATA',      0x00000040),
 134 |     ('IMAGE_SCN_CNT_UNINITIALIZED_DATA',    0x00000080),
 135 |     ('IMAGE_SCN_LNK_OTHER',                 0x00000100),
 136 |     ('IMAGE_SCN_LNK_INFO',                  0x00000200),
 137 |     ('IMAGE_SCN_LNK_REMOVE',                0x00000800),
 138 |     ('IMAGE_SCN_LNK_COMDAT',                0x00001000),
 139 |     ('IMAGE_SCN_MEM_FARDATA',               0x00008000),
 140 |     ('IMAGE_SCN_MEM_PURGEABLE',             0x00020000),
 141 |     ('IMAGE_SCN_MEM_16BIT',                 0x00020000),
 142 |     ('IMAGE_SCN_MEM_LOCKED',                0x00040000),
 143 |     ('IMAGE_SCN_MEM_PRELOAD',               0x00080000),
 144 |     ('IMAGE_SCN_ALIGN_1BYTES',              0x00100000),
 145 |     ('IMAGE_SCN_ALIGN_2BYTES',              0x00200000),
 146 |     ('IMAGE_SCN_ALIGN_4BYTES',              0x00300000),
 147 |     ('IMAGE_SCN_ALIGN_8BYTES',              0x00400000),
 148 |     ('IMAGE_SCN_ALIGN_16BYTES',             0x00500000),
 149 |     ('IMAGE_SCN_ALIGN_32BYTES',             0x00600000),
 150 |     ('IMAGE_SCN_ALIGN_64BYTES',             0x00700000),
 151 |     ('IMAGE_SCN_ALIGN_128BYTES',            0x00800000),
 152 |     ('IMAGE_SCN_ALIGN_256BYTES',            0x00900000),
 153 |     ('IMAGE_SCN_ALIGN_512BYTES',            0x00A00000),
 154 |     ('IMAGE_SCN_ALIGN_1024BYTES',           0x00B00000),
 155 |     ('IMAGE_SCN_ALIGN_2048BYTES',           0x00C00000),
 156 |     ('IMAGE_SCN_ALIGN_4096BYTES',           0x00D00000),
 157 |     ('IMAGE_SCN_ALIGN_8192BYTES',           0x00E00000),
 158 |     ('IMAGE_SCN_ALIGN_MASK',                0x00F00000),
 159 |     ('IMAGE_SCN_LNK_NRELOC_OVFL',           0x01000000),
 160 |     ('IMAGE_SCN_MEM_DISCARDABLE',           0x02000000),
 161 |     ('IMAGE_SCN_MEM_NOT_CACHED',            0x04000000),
 162 |     ('IMAGE_SCN_MEM_NOT_PAGED',             0x08000000),
 163 |     ('IMAGE_SCN_MEM_SHARED',                0x10000000),
 164 |     ('IMAGE_SCN_MEM_EXECUTE',               0x20000000),
 165 |     ('IMAGE_SCN_MEM_READ',                  0x40000000),
 166 |     ('IMAGE_SCN_MEM_WRITE',                 0x80000000L) ]
 167 | 
 168 | SECTION_CHARACTERISTICS = dict([(e[1], e[0]) for e in
 169 |     section_characteristics]+section_characteristics)
 170 | 
 171 | 
 172 | debug_types = [
 173 |     ('IMAGE_DEBUG_TYPE_UNKNOWN',        0),
 174 |     ('IMAGE_DEBUG_TYPE_COFF',           1),
 175 |     ('IMAGE_DEBUG_TYPE_CODEVIEW',       2),
 176 |     ('IMAGE_DEBUG_TYPE_FPO',            3),
 177 |     ('IMAGE_DEBUG_TYPE_MISC',           4),
 178 |     ('IMAGE_DEBUG_TYPE_EXCEPTION',      5),
 179 |     ('IMAGE_DEBUG_TYPE_FIXUP',          6),
 180 |     ('IMAGE_DEBUG_TYPE_OMAP_TO_SRC',    7),
 181 |     ('IMAGE_DEBUG_TYPE_OMAP_FROM_SRC',  8),
 182 |     ('IMAGE_DEBUG_TYPE_BORLAND',        9),
 183 |     ('IMAGE_DEBUG_TYPE_RESERVED10',     10) ]
 184 | 
 185 | DEBUG_TYPE = dict([(e[1], e[0]) for e in debug_types]+debug_types)
 186 | 
 187 | 
 188 | subsystem_types = [
 189 |     ('IMAGE_SUBSYSTEM_UNKNOWN',     0),
 190 |     ('IMAGE_SUBSYSTEM_NATIVE',      1),
 191 |     ('IMAGE_SUBSYSTEM_WINDOWS_GUI', 2),
 192 |     ('IMAGE_SUBSYSTEM_WINDOWS_CUI', 3),
 193 |     ('IMAGE_SUBSYSTEM_OS2_CUI',     5),
 194 |     ('IMAGE_SUBSYSTEM_POSIX_CUI',   7),
 195 |     ('IMAGE_SUBSYSTEM_WINDOWS_CE_GUI',  9),
 196 |     ('IMAGE_SUBSYSTEM_EFI_APPLICATION', 10),
 197 |     ('IMAGE_SUBSYSTEM_EFI_BOOT_SERVICE_DRIVER', 11),
 198 |     ('IMAGE_SUBSYSTEM_EFI_RUNTIME_DRIVER',      12),
 199 |     ('IMAGE_SUBSYSTEM_EFI_ROM',     13),
 200 |     ('IMAGE_SUBSYSTEM_XBOX',        14)]
 201 | 
 202 | SUBSYSTEM_TYPE = dict([(e[1], e[0]) for e in subsystem_types]+subsystem_types)
 203 | 
 204 | 
 205 | machine_types = [
 206 |     ('IMAGE_FILE_MACHINE_UNKNOWN',  0),
 207 |     ('IMAGE_FILE_MACHINE_AM33',     0x1d3),
 208 |     ('IMAGE_FILE_MACHINE_AMD64',    0x8664),
 209 |     ('IMAGE_FILE_MACHINE_ARM',      0x1c0),
 210 |     ('IMAGE_FILE_MACHINE_EBC',      0xebc),
 211 |     ('IMAGE_FILE_MACHINE_I386',     0x14c),
 212 |     ('IMAGE_FILE_MACHINE_IA64',     0x200),
 213 |     ('IMAGE_FILE_MACHINE_MR32',     0x9041),
 214 |     ('IMAGE_FILE_MACHINE_MIPS16',   0x266),
 215 |     ('IMAGE_FILE_MACHINE_MIPSFPU',  0x366),
 216 |     ('IMAGE_FILE_MACHINE_MIPSFPU16',0x466),
 217 |     ('IMAGE_FILE_MACHINE_POWERPC',  0x1f0),
 218 |     ('IMAGE_FILE_MACHINE_POWERPCFP',0x1f1),
 219 |     ('IMAGE_FILE_MACHINE_R4000',    0x166),
 220 |     ('IMAGE_FILE_MACHINE_SH3',      0x1a2),
 221 |     ('IMAGE_FILE_MACHINE_SH3DSP',   0x1a3),
 222 |     ('IMAGE_FILE_MACHINE_SH4',      0x1a6),
 223 |     ('IMAGE_FILE_MACHINE_SH5',      0x1a8),
 224 |     ('IMAGE_FILE_MACHINE_THUMB',    0x1c2),
 225 |     ('IMAGE_FILE_MACHINE_WCEMIPSV2',0x169),
 226 |  ]
 227 | 
 228 | MACHINE_TYPE = dict([(e[1], e[0]) for e in machine_types]+machine_types)
 229 | 
 230 | 
 231 | relocation_types = [
 232 |     ('IMAGE_REL_BASED_ABSOLUTE',        0),
 233 |     ('IMAGE_REL_BASED_HIGH',            1),
 234 |     ('IMAGE_REL_BASED_LOW',             2),
 235 |     ('IMAGE_REL_BASED_HIGHLOW',         3),
 236 |     ('IMAGE_REL_BASED_HIGHADJ',         4),
 237 |     ('IMAGE_REL_BASED_MIPS_JMPADDR',    5),
 238 |     ('IMAGE_REL_BASED_SECTION',         6),
 239 |     ('IMAGE_REL_BASED_REL',             7),
 240 |     ('IMAGE_REL_BASED_MIPS_JMPADDR16',  9),
 241 |     ('IMAGE_REL_BASED_IA64_IMM64',      9),
 242 |     ('IMAGE_REL_BASED_DIR64',           10),
 243 |     ('IMAGE_REL_BASED_HIGH3ADJ',        11) ]
 244 | 
 245 | RELOCATION_TYPE = dict([(e[1], e[0]) for e in relocation_types]+relocation_types)
 246 | 
 247 | 
 248 | dll_characteristics = [
 249 |     ('IMAGE_DLL_CHARACTERISTICS_RESERVED_0x0001', 0x0001),
 250 |     ('IMAGE_DLL_CHARACTERISTICS_RESERVED_0x0002', 0x0002),
 251 |     ('IMAGE_DLL_CHARACTERISTICS_RESERVED_0x0004', 0x0004),
 252 |     ('IMAGE_DLL_CHARACTERISTICS_RESERVED_0x0008', 0x0008),
 253 |     ('IMAGE_DLL_CHARACTERISTICS_DYNAMIC_BASE',      0x0040),
 254 |     ('IMAGE_DLL_CHARACTERISTICS_FORCE_INTEGRITY',   0x0080),
 255 |     ('IMAGE_DLL_CHARACTERISTICS_NX_COMPAT',         0x0100),
 256 |     ('IMAGE_DLL_CHARACTERISTICS_NO_ISOLATION',      0x0200),
 257 |     ('IMAGE_DLL_CHARACTERISTICS_NO_SEH',    0x0400),
 258 |     ('IMAGE_DLL_CHARACTERISTICS_NO_BIND',   0x0800),
 259 |     ('IMAGE_DLL_CHARACTERISTICS_RESERVED_0x1000', 0x1000),
 260 |     ('IMAGE_DLL_CHARACTERISTICS_WDM_DRIVER',    0x2000),
 261 |     ('IMAGE_DLL_CHARACTERISTICS_TERMINAL_SERVER_AWARE', 0x8000) ]
 262 | 
 263 | DLL_CHARACTERISTICS = dict([(e[1], e[0]) for e in dll_characteristics]+dll_characteristics)
 264 | 
 265 | 
 266 | # Resource types
 267 | resource_type = [
 268 |     ('RT_CURSOR',          1),
 269 |     ('RT_BITMAP',          2),
 270 |     ('RT_ICON',            3),
 271 |     ('RT_MENU',            4),
 272 |     ('RT_DIALOG',          5),
 273 |     ('RT_STRING',          6),
 274 |     ('RT_FONTDIR',         7),
 275 |     ('RT_FONT',            8),
 276 |     ('RT_ACCELERATOR',     9),
 277 |     ('RT_RCDATA',          10),
 278 |     ('RT_MESSAGETABLE',    11),
 279 |     ('RT_GROUP_CURSOR',    12),
 280 |     ('RT_GROUP_ICON',      14),
 281 |     ('RT_VERSION',         16),
 282 |     ('RT_DLGINCLUDE',      17),
 283 |     ('RT_PLUGPLAY',        19),
 284 |     ('RT_VXD',             20),
 285 |     ('RT_ANICURSOR',       21),
 286 |     ('RT_ANIICON',         22),
 287 |     ('RT_HTML',            23),
 288 |     ('RT_MANIFEST',        24) ]
 289 | 
 290 | RESOURCE_TYPE = dict([(e[1], e[0]) for e in resource_type]+resource_type)
 291 |     
 292 | 
 293 | # Language definitions
 294 | lang = [
 295 |  ('LANG_NEUTRAL',       0x00),
 296 |  ('LANG_INVARIANT',     0x7f),
 297 |  ('LANG_AFRIKAANS',     0x36),
 298 |  ('LANG_ALBANIAN',      0x1c),
 299 |  ('LANG_ARABIC',        0x01),
 300 |  ('LANG_ARMENIAN',      0x2b),
 301 |  ('LANG_ASSAMESE',      0x4d),
 302 |  ('LANG_AZERI',         0x2c),
 303 |  ('LANG_BASQUE',        0x2d),
 304 |  ('LANG_BELARUSIAN',    0x23),
 305 |  ('LANG_BENGALI',       0x45),
 306 |  ('LANG_BULGARIAN',     0x02),
 307 |  ('LANG_CATALAN',       0x03),
 308 |  ('LANG_CHINESE',       0x04),
 309 |  ('LANG_CROATIAN',      0x1a),
 310 |  ('LANG_CZECH',         0x05),
 311 |  ('LANG_DANISH',        0x06),
 312 |  ('LANG_DIVEHI',        0x65),
 313 |  ('LANG_DUTCH',         0x13),
 314 |  ('LANG_ENGLISH',       0x09),
 315 |  ('LANG_ESTONIAN',      0x25),
 316 |  ('LANG_FAEROESE',      0x38),
 317 |  ('LANG_FARSI',         0x29),
 318 |  ('LANG_FINNISH',       0x0b),
 319 |  ('LANG_FRENCH',        0x0c),
 320 |  ('LANG_GALICIAN',      0x56),
 321 |  ('LANG_GEORGIAN',      0x37),
 322 |  ('LANG_GERMAN',        0x07),
 323 |  ('LANG_GREEK',         0x08),
 324 |  ('LANG_GUJARATI',      0x47),
 325 |  ('LANG_HEBREW',        0x0d),
 326 |  ('LANG_HINDI',         0x39),
 327 |  ('LANG_HUNGARIAN',     0x0e),
 328 |  ('LANG_ICELANDIC',     0x0f),
 329 |  ('LANG_INDONESIAN',    0x21),
 330 |  ('LANG_ITALIAN',       0x10),
 331 |  ('LANG_JAPANESE',      0x11),
 332 |  ('LANG_KANNADA',       0x4b),
 333 |  ('LANG_KASHMIRI',      0x60),
 334 |  ('LANG_KAZAK',         0x3f),
 335 |  ('LANG_KONKANI',       0x57),
 336 |  ('LANG_KOREAN',        0x12),
 337 |  ('LANG_KYRGYZ',        0x40),
 338 |  ('LANG_LATVIAN',       0x26),
 339 |  ('LANG_LITHUANIAN',    0x27),
 340 |  ('LANG_MACEDONIAN',    0x2f),
 341 |  ('LANG_MALAY',         0x3e),
 342 |  ('LANG_MALAYALAM',     0x4c),
 343 |  ('LANG_MANIPURI',      0x58),
 344 |  ('LANG_MARATHI',       0x4e),
 345 |  ('LANG_MONGOLIAN',     0x50),
 346 |  ('LANG_NEPALI',        0x61),
 347 |  ('LANG_NORWEGIAN',     0x14),
 348 |  ('LANG_ORIYA',         0x48),
 349 |  ('LANG_POLISH',        0x15),
 350 |  ('LANG_PORTUGUESE',    0x16),
 351 |  ('LANG_PUNJABI',       0x46),
 352 |  ('LANG_ROMANIAN',      0x18),
 353 |  ('LANG_RUSSIAN',       0x19),
 354 |  ('LANG_SANSKRIT',      0x4f),
 355 |  ('LANG_SERBIAN',       0x1a),
 356 |  ('LANG_SINDHI',        0x59),
 357 |  ('LANG_SLOVAK',        0x1b),
 358 |  ('LANG_SLOVENIAN',     0x24),
 359 |  ('LANG_SPANISH',       0x0a),
 360 |  ('LANG_SWAHILI',       0x41),
 361 |  ('LANG_SWEDISH',       0x1d),
 362 |  ('LANG_SYRIAC',        0x5a),
 363 |  ('LANG_TAMIL',         0x49),
 364 |  ('LANG_TATAR',         0x44),
 365 |  ('LANG_TELUGU',        0x4a),
 366 |  ('LANG_THAI',          0x1e),
 367 |  ('LANG_TURKISH',       0x1f),
 368 |  ('LANG_UKRAINIAN',     0x22),
 369 |  ('LANG_URDU',          0x20),
 370 |  ('LANG_UZBEK',         0x43),
 371 |  ('LANG_VIETNAMESE',    0x2a),
 372 |  ('LANG_GAELIC',        0x3c),
 373 |  ('LANG_MALTESE',       0x3a),
 374 |  ('LANG_MAORI',         0x28),
 375 |  ('LANG_RHAETO_ROMANCE',0x17),
 376 |  ('LANG_SAAMI',         0x3b),
 377 |  ('LANG_SORBIAN',       0x2e),
 378 |  ('LANG_SUTU',          0x30),
 379 |  ('LANG_TSONGA',        0x31),
 380 |  ('LANG_TSWANA',        0x32),
 381 |  ('LANG_VENDA',         0x33),
 382 |  ('LANG_XHOSA',         0x34),
 383 |  ('LANG_ZULU',          0x35),
 384 |  ('LANG_ESPERANTO',     0x8f),
 385 |  ('LANG_WALON',         0x90),
 386 |  ('LANG_CORNISH',       0x91),
 387 |  ('LANG_WELSH',         0x92),
 388 |  ('LANG_BRETON',        0x93) ]
 389 | 
 390 | LANG = dict(lang+[(e[1], e[0]) for e in lang])
 391 | 
 392 | 
 393 | # Sublanguage definitions
 394 | sublang =  [
 395 |  ('SUBLANG_NEUTRAL',                        0x00),
 396 |  ('SUBLANG_DEFAULT',                        0x01),
 397 |  ('SUBLANG_SYS_DEFAULT',                    0x02),
 398 |  ('SUBLANG_ARABIC_SAUDI_ARABIA',            0x01),
 399 |  ('SUBLANG_ARABIC_IRAQ',                    0x02),
 400 |  ('SUBLANG_ARABIC_EGYPT',                   0x03),
 401 |  ('SUBLANG_ARABIC_LIBYA',                   0x04),
 402 |  ('SUBLANG_ARABIC_ALGERIA',                 0x05),
 403 |  ('SUBLANG_ARABIC_MOROCCO',                 0x06),
 404 |  ('SUBLANG_ARABIC_TUNISIA',                 0x07),
 405 |  ('SUBLANG_ARABIC_OMAN',                    0x08),
 406 |  ('SUBLANG_ARABIC_YEMEN',                   0x09),
 407 |  ('SUBLANG_ARABIC_SYRIA',                   0x0a),
 408 |  ('SUBLANG_ARABIC_JORDAN',                  0x0b),
 409 |  ('SUBLANG_ARABIC_LEBANON',                 0x0c),
 410 |  ('SUBLANG_ARABIC_KUWAIT',                  0x0d),
 411 |  ('SUBLANG_ARABIC_UAE',                     0x0e),
 412 |  ('SUBLANG_ARABIC_BAHRAIN',                 0x0f),
 413 |  ('SUBLANG_ARABIC_QATAR',                   0x10),
 414 |  ('SUBLANG_AZERI_LATIN',                    0x01),
 415 |  ('SUBLANG_AZERI_CYRILLIC',                 0x02),
 416 |  ('SUBLANG_CHINESE_TRADITIONAL',            0x01),
 417 |  ('SUBLANG_CHINESE_SIMPLIFIED',             0x02),
 418 |  ('SUBLANG_CHINESE_HONGKONG',               0x03),
 419 |  ('SUBLANG_CHINESE_SINGAPORE',              0x04),
 420 |  ('SUBLANG_CHINESE_MACAU',                  0x05),
 421 |  ('SUBLANG_DUTCH',                          0x01),
 422 |  ('SUBLANG_DUTCH_BELGIAN',                  0x02),
 423 |  ('SUBLANG_ENGLISH_US',                     0x01),
 424 |  ('SUBLANG_ENGLISH_UK',                     0x02),
 425 |  ('SUBLANG_ENGLISH_AUS',                    0x03),
 426 |  ('SUBLANG_ENGLISH_CAN',                    0x04),
 427 |  ('SUBLANG_ENGLISH_NZ',                     0x05),
 428 |  ('SUBLANG_ENGLISH_EIRE',                   0x06),
 429 |  ('SUBLANG_ENGLISH_SOUTH_AFRICA',           0x07),
 430 |  ('SUBLANG_ENGLISH_JAMAICA',                0x08),
 431 |  ('SUBLANG_ENGLISH_CARIBBEAN',              0x09),
 432 |  ('SUBLANG_ENGLISH_BELIZE',                 0x0a),
 433 |  ('SUBLANG_ENGLISH_TRINIDAD',               0x0b),
 434 |  ('SUBLANG_ENGLISH_ZIMBABWE',               0x0c),
 435 |  ('SUBLANG_ENGLISH_PHILIPPINES',            0x0d),
 436 |  ('SUBLANG_FRENCH',                         0x01),
 437 |  ('SUBLANG_FRENCH_BELGIAN',                 0x02),
 438 |  ('SUBLANG_FRENCH_CANADIAN',                0x03),
 439 |  ('SUBLANG_FRENCH_SWISS',                   0x04),
 440 |  ('SUBLANG_FRENCH_LUXEMBOURG',              0x05),
 441 |  ('SUBLANG_FRENCH_MONACO',                  0x06),
 442 |  ('SUBLANG_GERMAN',                         0x01),
 443 |  ('SUBLANG_GERMAN_SWISS',                   0x02),
 444 |  ('SUBLANG_GERMAN_AUSTRIAN',                0x03),
 445 |  ('SUBLANG_GERMAN_LUXEMBOURG',              0x04),
 446 |  ('SUBLANG_GERMAN_LIECHTENSTEIN',           0x05),
 447 |  ('SUBLANG_ITALIAN',                        0x01),
 448 |  ('SUBLANG_ITALIAN_SWISS',                  0x02),
 449 |  ('SUBLANG_KASHMIRI_SASIA',                 0x02),
 450 |  ('SUBLANG_KASHMIRI_INDIA',                 0x02),
 451 |  ('SUBLANG_KOREAN',                         0x01),
 452 |  ('SUBLANG_LITHUANIAN',                     0x01),
 453 |  ('SUBLANG_MALAY_MALAYSIA',                 0x01),
 454 |  ('SUBLANG_MALAY_BRUNEI_DARUSSALAM',        0x02),
 455 |  ('SUBLANG_NEPALI_INDIA',                   0x02),
 456 |  ('SUBLANG_NORWEGIAN_BOKMAL',               0x01),
 457 |  ('SUBLANG_NORWEGIAN_NYNORSK',              0x02),
 458 |  ('SUBLANG_PORTUGUESE',                     0x02),
 459 |  ('SUBLANG_PORTUGUESE_BRAZILIAN',           0x01),
 460 |  ('SUBLANG_SERBIAN_LATIN',                  0x02),
 461 |  ('SUBLANG_SERBIAN_CYRILLIC',               0x03),
 462 |  ('SUBLANG_SPANISH',                        0x01),
 463 |  ('SUBLANG_SPANISH_MEXICAN',                0x02),
 464 |  ('SUBLANG_SPANISH_MODERN',                 0x03),
 465 |  ('SUBLANG_SPANISH_GUATEMALA',              0x04),
 466 |  ('SUBLANG_SPANISH_COSTA_RICA',             0x05),
 467 |  ('SUBLANG_SPANISH_PANAMA',                 0x06),
 468 |  ('SUBLANG_SPANISH_DOMINICAN_REPUBLIC',     0x07),
 469 |  ('SUBLANG_SPANISH_VENEZUELA',              0x08),
 470 |  ('SUBLANG_SPANISH_COLOMBIA',               0x09),
 471 |  ('SUBLANG_SPANISH_PERU',                   0x0a),
 472 |  ('SUBLANG_SPANISH_ARGENTINA',              0x0b),
 473 |  ('SUBLANG_SPANISH_ECUADOR',                0x0c),
 474 |  ('SUBLANG_SPANISH_CHILE',                  0x0d),
 475 |  ('SUBLANG_SPANISH_URUGUAY',                0x0e),
 476 |  ('SUBLANG_SPANISH_PARAGUAY',               0x0f),
 477 |  ('SUBLANG_SPANISH_BOLIVIA',                0x10),
 478 |  ('SUBLANG_SPANISH_EL_SALVADOR',            0x11),
 479 |  ('SUBLANG_SPANISH_HONDURAS',               0x12),
 480 |  ('SUBLANG_SPANISH_NICARAGUA',              0x13),
 481 |  ('SUBLANG_SPANISH_PUERTO_RICO',            0x14),
 482 |  ('SUBLANG_SWEDISH',                        0x01),
 483 |  ('SUBLANG_SWEDISH_FINLAND',                0x02),
 484 |  ('SUBLANG_URDU_PAKISTAN',                  0x01),
 485 |  ('SUBLANG_URDU_INDIA',                     0x02),
 486 |  ('SUBLANG_UZBEK_LATIN',                    0x01),
 487 |  ('SUBLANG_UZBEK_CYRILLIC',                 0x02),
 488 |  ('SUBLANG_DUTCH_SURINAM',                  0x03),
 489 |  ('SUBLANG_ROMANIAN',                       0x01),
 490 |  ('SUBLANG_ROMANIAN_MOLDAVIA',              0x02),
 491 |  ('SUBLANG_RUSSIAN',                        0x01),
 492 |  ('SUBLANG_RUSSIAN_MOLDAVIA',               0x02),
 493 |  ('SUBLANG_CROATIAN',                       0x01),
 494 |  ('SUBLANG_LITHUANIAN_CLASSIC',             0x02),
 495 |  ('SUBLANG_GAELIC',                         0x01),
 496 |  ('SUBLANG_GAELIC_SCOTTISH',                0x02),
 497 |  ('SUBLANG_GAELIC_MANX',                    0x03) ]
 498 | 
 499 | SUBLANG = dict(sublang+[(e[1], e[0]) for e in sublang])
 500 | 
 501 | # Initialize the dictionary with all the name->value pairs
 502 | SUBLANG = dict( sublang )
 503 | # Now add all the value->name information, handling duplicates appropriately
 504 | for sublang_name, sublang_value in sublang:
 505 |     if SUBLANG.has_key( sublang_value ):
 506 |         SUBLANG[ sublang_value ].append( sublang_name )
 507 |     else:
 508 |         SUBLANG[ sublang_value ] = [ sublang_name ]
 509 |         
 510 | # Resolve a sublang name given the main lang name
 511 | #
 512 | def get_sublang_name_for_lang( lang_value, sublang_value ):
 513 |     lang_name = LANG.get(lang_value, '*unknown*')
 514 |     for sublang_name in SUBLANG.get(sublang_value, list()):
 515 |         # if the main language is a substring of sublang's name, then 
 516 |         # return that
 517 |         if lang_name in sublang_name:
 518 |             return sublang_name
 519 |     # otherwise return the first sublang name
 520 |     return SUBLANG.get(sublang_value, ['*unknown*'])[0]
 521 | 
 522 | 
 523 | # Ange Albertini's code to process resources' strings
 524 | #
 525 | def parse_strings(data, counter, l):
 526 |     i = 0
 527 |     error_count = 0
 528 |     while i < len(data):
 529 |         
 530 |         data_slice = data[i:i + 2]
 531 |         if len(data_slice) < 2:
 532 |             break
 533 |         
 534 |         len_ = struct.unpack("<h", data_slice)[0]
 535 |         i += 2
 536 |         if len_ != 0 and 0 <= len_*2 <= len(data):
 537 |             try:
 538 |                 l[counter] = data[i: i + len_ * 2].decode('utf-16')
 539 |             except UnicodeDecodeError:
 540 |                 error_count += 1
 541 |                 pass
 542 |             if error_count >= 3:
 543 |                 break
 544 |             i += len_ * 2
 545 |         counter += 1
 546 | 
 547 | 
 548 | def retrieve_flags(flag_dict, flag_filter):
 549 |     """Read the flags from a dictionary and return them in a usable form.
 550 |     
 551 |     Will return a list of (flag, value) for all flags in "flag_dict"
 552 |     matching the filter "flag_filter".
 553 |     """
 554 |     
 555 |     return [(f[0], f[1]) for f in flag_dict.items() if
 556 |             isinstance(f[0], str) and f[0].startswith(flag_filter)]
 557 | 
 558 | 
 559 | def set_flags(obj, flag_field, flags):
 560 |     """Will process the flags and set attributes in the object accordingly.
 561 | 
 562 |     The object "obj" will gain attributes named after the flags provided in
 563 |     "flags" and valued True/False, matching the results of applying each
 564 |     flag value from "flags" to flag_field.
 565 |     """
 566 | 
 567 |     for flag in flags:
 568 |         if flag[1] & flag_field:
 569 |             #setattr(obj, flag[0], True)
 570 |             obj.__dict__[flag[0]] = True
 571 |         else:
 572 |             #setattr(obj, flag[0], False)
 573 |             obj.__dict__[flag[0]] = False
 574 | 
 575 | 
 576 | def power_of_two(val):
 577 |     return val != 0 and (val & (val-1)) == 0
 578 | 
 579 | 
 580 | FILE_ALIGNEMNT_HARDCODED_VALUE = 0x200
 581 | FileAlignment_Warning = False # We only want to print the warning once
 582 | SectionAlignment_Warning = False # We only want to print the warning once
 583 | 
 584 | 
 585 | 
 586 | class UnicodeStringWrapperPostProcessor:
 587 |     """This class attempts to help the process of identifying strings
 588 |     that might be plain Unicode or Pascal. A list of strings will be
 589 |     wrapped on it with the hope the overlappings will help make the
 590 |     decision about their type."""
 591 |     
 592 |     def __init__(self, pe, rva_ptr):
 593 |         self.pe = pe
 594 |         self.rva_ptr = rva_ptr
 595 |         self.string = None
 596 |         
 597 |     
 598 |     def get_rva(self):
 599 |         """Get the RVA of the string."""
 600 |         
 601 |         return self.rva_ptr
 602 |         
 603 |     
 604 |     def __str__(self):
 605 |         """Return the escaped ASCII representation of the string."""
 606 |         
 607 |         def convert_char(char):
 608 |             if char in string.printable:
 609 |                 return char
 610 |             else:
 611 |                 return r'\x%02x' % ord(char)
 612 |         
 613 |         if self.string:
 614 |             return ''.join([convert_char(c) for c in self.string])
 615 |         
 616 |         return ''
 617 |         
 618 |     
 619 |     def invalidate(self):
 620 |         """Make this instance None, to express it's no known string type."""
 621 |         
 622 |         self = None
 623 |         
 624 |     
 625 |     def render_pascal_16(self):
 626 |         
 627 |         self.string = self.pe.get_string_u_at_rva(
 628 |             self.rva_ptr+2,
 629 |             max_length=self.__get_pascal_16_length())
 630 |         
 631 |     
 632 |     def ask_pascal_16(self, next_rva_ptr):
 633 |         """The next RVA is taken to be the one immediately following this one.
 634 |         
 635 |         Such RVA could indicate the natural end of the string and will be checked
 636 |         with the possible length contained in the first word.
 637 |         """
 638 |         
 639 |         length = self.__get_pascal_16_length()
 640 |         
 641 |         if length == (next_rva_ptr - (self.rva_ptr+2)) / 2:
 642 |             self.length = length
 643 |             return True
 644 |         
 645 |         return False
 646 |         
 647 |     
 648 |     def __get_pascal_16_length(self):
 649 |         
 650 |         return self.__get_word_value_at_rva(self.rva_ptr)
 651 |         
 652 |     
 653 |     def __get_word_value_at_rva(self, rva):
 654 |         
 655 |         try:
 656 |             data = self.pe.get_data(self.rva_ptr, 2)
 657 |         except PEFormatError, e:
 658 |             return False
 659 |         
 660 |         if len(data)<2:
 661 |             return False
 662 |         
 663 |         return struct.unpack('<H', data)[0]
 664 |         
 665 |     
 666 |     #def render_pascal_8(self):
 667 |     #    """"""
 668 |         
 669 |     
 670 |     def ask_unicode_16(self, next_rva_ptr):
 671 |         """The next RVA is taken to be the one immediately following this one.
 672 |         
 673 |         Such RVA could indicate the natural end of the string and will be checked
 674 |         to see if there's a Unicode NULL character there.
 675 |         """
 676 |         
 677 |         if self.__get_word_value_at_rva(next_rva_ptr-2) == 0:
 678 |             self.length = next_rva_ptr - self.rva_ptr
 679 |             return True
 680 |         
 681 |         return False
 682 |         
 683 |     
 684 |     def render_unicode_16(self):
 685 |         """"""
 686 |         
 687 |         self.string = self.pe.get_string_u_at_rva(self.rva_ptr)
 688 |     
 689 | 
 690 | class PEFormatError(Exception):
 691 |     """Generic PE format error exception."""
 692 |     
 693 |     def __init__(self, value):
 694 |         self.value = value
 695 |     
 696 |     def __str__(self):
 697 |         return repr(self.value)
 698 | 
 699 | 
 700 | class Dump:
 701 |     """Convenience class for dumping the PE information."""
 702 |     
 703 |     def __init__(self):
 704 |         self.text = list()
 705 |         
 706 |     
 707 |     def add_lines(self, txt, indent=0):
 708 |         """Adds a list of lines.
 709 |         
 710 |         The list can be indented with the optional argument 'indent'.
 711 |         """
 712 |         for line in txt:
 713 |             self.add_line(line, indent)
 714 |             
 715 |     
 716 |     def add_line(self, txt, indent=0):
 717 |         """Adds a line.
 718 |         
 719 |         The line can be indented with the optional argument 'indent'.
 720 |         """
 721 |         
 722 |         self.add(txt+'\n', indent)
 723 |         
 724 |     
 725 |     def add(self, txt, indent=0):
 726 |         """Adds some text, no newline will be appended.
 727 |         
 728 |         The text can be indented with the optional argument 'indent'.
 729 |         """
 730 |         
 731 |         if isinstance(txt, unicode):
 732 |             try:
 733 |                 txt = str(txt)
 734 |             except UnicodeEncodeError:
 735 |                 s = []
 736 |                 for c in txt:
 737 |                     try:
 738 |                         s.append(str(c))
 739 |                     except UnicodeEncodeError:
 740 |                         s.append(repr(c))
 741 |                 
 742 |                 txt = ''.join(s)
 743 |         
 744 |         self.text.append( ' '*indent + txt )
 745 |         
 746 |     
 747 |     def add_header(self, txt):
 748 |         """Adds a header element."""
 749 |         
 750 |         self.add_line('-'*10+txt+'-'*10+'\n')
 751 |         
 752 |     
 753 |     def add_newline(self):
 754 |         """Adds a newline."""
 755 |         
 756 |         self.text.append( '\n' )
 757 |         
 758 |     
 759 |     def get_text(self):
 760 |         """Get the text in its current state."""
 761 |         
 762 |         return ''.join( self.text )
 763 | 
 764 | 
 765 | STRUCT_SIZEOF_TYPES = {
 766 |     'x': 1, 'c': 1, 'b': 1, 'B': 1, 
 767 |     'h': 2, 'H': 2, 
 768 |     'i': 4, 'I': 4, 'l': 4, 'L': 4, 'f': 4,
 769 |     'q': 8, 'Q': 8, 'd': 8,
 770 |     's': 1 }
 771 | 
 772 | class Structure:
 773 |     """Prepare structure object to extract members from data.
 774 |     
 775 |     Format is a list containing definitions for the elements
 776 |     of the structure.
 777 |     """
 778 |     
 779 |     
 780 |     def __init__(self, format, name=None, file_offset=None):
 781 |         # Format is forced little endian, for big endian non Intel platforms
 782 |         self.__format__ = '<'
 783 |         self.__keys__ = []
 784 |         #self.values = {}
 785 |         self.__format_length__ = 0
 786 |         self.__field_offsets__ = dict()
 787 |         self.__set_format__(format[1])
 788 |         self.__all_zeroes__ = False
 789 |         self.__unpacked_data_elms__ = None
 790 |         self.__file_offset__ = file_offset
 791 |         if name:
 792 |             self.name = name
 793 |         else:
 794 |             self.name = format[0]
 795 |             
 796 |     
 797 |     def __get_format__(self):
 798 |         return self.__format__
 799 |         
 800 |     def get_field_absolute_offset(self, field_name):
 801 |         """Return the offset within the field for the requested field in the structure."""
 802 |         return self.__file_offset__ + self.__field_offsets__[field_name]
 803 | 
 804 |     def get_field_relative_offset(self, field_name):
 805 |         """Return the offset within the structure for the requested field."""
 806 |         return self.__field_offsets__[field_name]
 807 |     
 808 |     def get_file_offset(self):
 809 |         return self.__file_offset__
 810 |     
 811 |     def set_file_offset(self, offset):
 812 |         self.__file_offset__ = offset
 813 |     
 814 |     def all_zeroes(self):
 815 |         """Returns true is the unpacked data is all zeroes."""
 816 |         
 817 |         return self.__all_zeroes__
 818 |                 
 819 |     def sizeof_type(self, t):
 820 |         count = 1
 821 |         _t = t
 822 |         if t[0] in string.digits:
 823 |             # extract the count
 824 |             count = int( ''.join([d for d in t if d in string.digits]) )
 825 |             _t = ''.join([d for d in t if d not in string.digits])
 826 |         return STRUCT_SIZEOF_TYPES[_t] * count
 827 |     
 828 |     def __set_format__(self, format):
 829 |         
 830 |         offset = 0
 831 |         for elm in format:
 832 |             if ',' in elm:
 833 |                 elm_type, elm_name = elm.split(',', 1)
 834 |                 self.__format__ += elm_type
 835 |                 
 836 |                 elm_names = elm_name.split(',')
 837 |                 names = []
 838 |                 for elm_name in elm_names:
 839 |                     if elm_name in self.__keys__:
 840 |                         search_list = [x[:len(elm_name)] for x in self.__keys__]
 841 |                         occ_count = search_list.count(elm_name)
 842 |                         elm_name = elm_name+'_'+str(occ_count)
 843 |                     names.append(elm_name)
 844 |                     self.__field_offsets__[elm_name] = offset
 845 | 
 846 |                 offset += self.sizeof_type(elm_type)
 847 | 
 848 |                 # Some PE header structures have unions on them, so a certain
 849 |                 # value might have different names, so each key has a list of
 850 |                 # all the possible members referring to the data.
 851 |                 self.__keys__.append(names)
 852 |         
 853 |         self.__format_length__ = struct.calcsize(self.__format__)
 854 |         
 855 |     
 856 |     def sizeof(self):
 857 |         """Return size of the structure."""
 858 |         
 859 |         return self.__format_length__
 860 |         
 861 |     
 862 |     def __unpack__(self, data):
 863 |         
 864 |         if len(data) > self.__format_length__:
 865 |             data = data[:self.__format_length__]
 866 |         
 867 |         # OC Patch:
 868 |         # Some malware have incorrect header lengths.
 869 |         # Fail gracefully if this occurs
 870 |         # Buggy malware: a29b0118af8b7408444df81701ad5a7f
 871 |         #
 872 |         elif len(data) < self.__format_length__:
 873 |             raise PEFormatError('Data length less than expected header length.')
 874 |             
 875 |         
 876 |         if data.count(chr(0)) == len(data):
 877 |             self.__all_zeroes__ = True
 878 |         
 879 |         self.__unpacked_data_elms__ = struct.unpack(self.__format__, data)
 880 |         for i in xrange(len(self.__unpacked_data_elms__)):
 881 |             for key in self.__keys__[i]:
 882 |                 #self.values[key] = self.__unpacked_data_elms__[i]
 883 |                 setattr(self, key, self.__unpacked_data_elms__[i])
 884 |     
 885 |     
 886 |     def __pack__(self):
 887 |         
 888 |         new_values = []
 889 |         
 890 |         for i in xrange(len(self.__unpacked_data_elms__)):
 891 |             
 892 |             for key in self.__keys__[i]:
 893 |                 new_val = getattr(self, key)
 894 |                 old_val = self.__unpacked_data_elms__[i]
 895 |                 
 896 |                 # In the case of Unions, when the first changed value
 897 |                 # is picked the loop is exited
 898 |                 if new_val != old_val:
 899 |                     break
 900 |             
 901 |             new_values.append(new_val)
 902 |         
 903 |         return struct.pack(self.__format__, *new_values)
 904 |                 
 905 |     
 906 |     def __str__(self):
 907 |         return '\n'.join( self.dump() )
 908 |     
 909 |     def __repr__(self):
 910 |         return '<Structure: %s>' % (' '.join( [' '.join(s.split()) for s in self.dump()] ))
 911 |         
 912 |     
 913 |     def dump(self, indentation=0):
 914 |         """Returns a string representation of the structure."""
 915 |         
 916 |         dump = []
 917 |         
 918 |         dump.append('[%s]' % self.name)
 919 |         
 920 |         # Refer to the __set_format__ method for an explanation
 921 |         # of the following construct.
 922 |         for keys in self.__keys__:
 923 |             for key in keys:
 924 |                 
 925 |                 val = getattr(self, key)
 926 |                 if isinstance(val, int) or isinstance(val, long):
 927 |                     val_str = '0x%-8X' % (val)
 928 |                     if key == 'TimeDateStamp' or key == 'dwTimeStamp':
 929 |                         try:
 930 |                             val_str += ' [%s UTC]' % time.asctime(time.gmtime(val))
 931 |                         except exceptions.ValueError, e:
 932 |                             val_str += ' [INVALID TIME]'
 933 |                 else:
 934 |                     val_str = ''.join(filter(lambda c:c != '\0', str(val)))
 935 |                 
 936 |                 dump.append('0x%-8X 0x%-3X %-30s %s' % (
 937 |                     self.__field_offsets__[key] + self.__file_offset__, 
 938 |                     self.__field_offsets__[key], key+':', val_str))
 939 |         
 940 |         return dump
 941 | 
 942 | 
 943 | 
 944 | class SectionStructure(Structure):
 945 |     """Convenience section handling class."""
 946 |     
 947 |     def __init__(self, *argl, **argd):
 948 |         if 'pe' in argd:
 949 |             self.pe = argd['pe']
 950 |             del argd['pe']
 951 |             
 952 |         Structure.__init__(self, *argl, **argd)
 953 |         
 954 |     def get_data(self, start=None, length=None):
 955 |         """Get data chunk from a section.
 956 |         
 957 |         Allows to query data from the section by passing the
 958 |         addresses where the PE file would be loaded by default.
 959 |         It is then possible to retrieve code and data by its real
 960 |         addresses as it would be if loaded.
 961 |         """
 962 |         
 963 |         PointerToRawData_adj = self.pe.adjust_FileAlignment( self.PointerToRawData,
 964 |             self.pe.OPTIONAL_HEADER.FileAlignment )
 965 |         VirtualAddress_adj = self.pe.adjust_SectionAlignment( self.VirtualAddress, 
 966 |             self.pe.OPTIONAL_HEADER.SectionAlignment, self.pe.OPTIONAL_HEADER.FileAlignment )
 967 | 
 968 |         if start is None:
 969 |             offset = PointerToRawData_adj
 970 |         else:
 971 |             offset = ( start - VirtualAddress_adj ) + PointerToRawData_adj
 972 |         
 973 |         if length is not None:
 974 |             end = offset + length
 975 |         else:
 976 |             end = offset + self.SizeOfRawData
 977 |             
 978 |         # PointerToRawData is not adjusted here as we might want to read any possible extra bytes
 979 |         # that might get cut off by aligning the start (and hence cutting something off the end)
 980 |         #
 981 |         if end > self.PointerToRawData + self.SizeOfRawData:
 982 |             end = self.PointerToRawData + self.SizeOfRawData
 983 |         
 984 |         return self.pe.__data__[offset:end]
 985 |     
 986 |     
 987 |     def __setattr__(self, name, val):
 988 |     
 989 |         if name == 'Characteristics':
 990 |             section_flags = retrieve_flags(SECTION_CHARACTERISTICS, 'IMAGE_SCN_')
 991 |             
 992 |             # Set the section's flags according the the Characteristics member
 993 |             set_flags(self, val, section_flags)
 994 |             
 995 |         elif 'IMAGE_SCN_' in name and hasattr(self, name):
 996 |             if val:
 997 |                 self.__dict__['Characteristics'] |= SECTION_CHARACTERISTICS[name]
 998 |             else:
 999 |                 self.__dict__['Characteristics'] ^= SECTION_CHARACTERISTICS[name]
1000 |                 
1001 |         self.__dict__[name] = val
1002 |     
1003 |     
1004 |     def get_rva_from_offset(self, offset):
1005 |         return offset - self.pe.adjust_FileAlignment( self.PointerToRawData,
1006 |             self.pe.OPTIONAL_HEADER.FileAlignment ) + self.pe.adjust_SectionAlignment( self.VirtualAddress, 
1007 |             self.pe.OPTIONAL_HEADER.SectionAlignment, self.pe.OPTIONAL_HEADER.FileAlignment )
1008 |     
1009 |     
1010 |     def get_offset_from_rva(self, rva):
1011 |         return (rva - 
1012 |             self.pe.adjust_SectionAlignment( 
1013 |                 self.VirtualAddress, 
1014 |                 self.pe.OPTIONAL_HEADER.SectionAlignment, 
1015 |                 self.pe.OPTIONAL_HEADER.FileAlignment )
1016 |             ) + self.pe.adjust_FileAlignment( 
1017 |                 self.PointerToRawData,
1018 |                 self.pe.OPTIONAL_HEADER.FileAlignment )
1019 |     
1020 |     
1021 |     def contains_offset(self, offset):
1022 |         """Check whether the section contains the file offset provided."""
1023 |         
1024 |         if self.PointerToRawData is None:
1025 |            # bss and other sections containing only uninitialized data must have 0
1026 |            # and do not take space in the file
1027 |            return False
1028 |         return ( self.pe.adjust_FileAlignment( self.PointerToRawData, 
1029 |                 self.pe.OPTIONAL_HEADER.FileAlignment ) <= 
1030 |                     offset < 
1031 |                         self.pe.adjust_FileAlignment( self.PointerToRawData,
1032 |                             self.pe.OPTIONAL_HEADER.FileAlignment ) + 
1033 |                                 self.SizeOfRawData )
1034 |     
1035 |     
1036 |     def contains_rva(self, rva):
1037 |         """Check whether the section contains the address provided."""
1038 |         
1039 |         # Check if the SizeOfRawData is realistic. If it's bigger than the size of
1040 |         # the whole PE file minus the start address of the section it could be
1041 |         # either truncated or the SizeOfRawData contain a misleading value.
1042 |         # In either of those cases we take the VirtualSize
1043 |         #
1044 |         if len(self.pe.__data__) - self.pe.adjust_FileAlignment( self.PointerToRawData,
1045 |             self.pe.OPTIONAL_HEADER.FileAlignment ) < self.SizeOfRawData:
1046 |             # PECOFF documentation v8 says:
1047 |             # VirtualSize: The total size of the section when loaded into memory.
1048 |             # If this value is greater than SizeOfRawData, the section is zero-padded.
1049 |             # This field is valid only for executable images and should be set to zero
1050 |             # for object files.
1051 |             #
1052 |             size = self.Misc_VirtualSize
1053 |         else:
1054 |             size = max(self.SizeOfRawData, self.Misc_VirtualSize)
1055 |         
1056 |         VirtualAddress_adj = self.pe.adjust_SectionAlignment( self.VirtualAddress, 
1057 |             self.pe.OPTIONAL_HEADER.SectionAlignment, self.pe.OPTIONAL_HEADER.FileAlignment )
1058 |         
1059 |         return VirtualAddress_adj <= rva < VirtualAddress_adj + size
1060 |     
1061 |     
1062 |     def contains(self, rva):
1063 |         #print "DEPRECATION WARNING: you should use contains_rva() instead of contains()"
1064 |         return self.contains_rva(rva)
1065 |     
1066 |     
1067 |     #def set_data(self, data):
1068 |     #    """Set the data belonging to the section."""
1069 |     #    
1070 |     #    self.data = data
1071 |         
1072 |     
1073 |     def get_entropy(self):
1074 |         """Calculate and return the entropy for the section."""
1075 |         
1076 |         return self.entropy_H( self.get_data() )
1077 |     
1078 |     
1079 |     def get_hash_sha1(self):
1080 |         """Get the SHA-1 hex-digest of the section's data."""
1081 |         
1082 |         if sha1 is not None:
1083 |             return sha1( self.get_data() ).hexdigest()
1084 |     
1085 |     
1086 |     def get_hash_sha256(self):
1087 |         """Get the SHA-256 hex-digest of the section's data."""
1088 |         
1089 |         if sha256 is not None:
1090 |             return sha256( self.get_data() ).hexdigest()
1091 |     
1092 |     
1093 |     def get_hash_sha512(self):
1094 |         """Get the SHA-512 hex-digest of the section's data."""
1095 |         
1096 |         if sha512 is not None:
1097 |             return sha512( self.get_data() ).hexdigest()
1098 |     
1099 |     
1100 |     def get_hash_md5(self):
1101 |         """Get the MD5 hex-digest of the section's data."""
1102 |         
1103 |         if md5 is not None:
1104 |             return md5( self.get_data() ).hexdigest()
1105 |     
1106 |     
1107 |     def entropy_H(self, data):
1108 |         """Calculate the entropy of a chunk of data."""
1109 |         
1110 |         if len(data) == 0:
1111 |             return 0.0
1112 |         
1113 |         occurences = array.array('L', [0]*256)
1114 |         
1115 |         for x in data:
1116 |             occurences[ord(x)] += 1
1117 |         
1118 |         entropy = 0
1119 |         for x in occurences:
1120 |             if x:
1121 |                 p_x = float(x) / len(data)
1122 |                 entropy -= p_x*math.log(p_x, 2)
1123 |         
1124 |         return entropy
1125 | 
1126 | 
1127 | 
1128 | class DataContainer:
1129 |     """Generic data container."""
1130 |     
1131 |     def __init__(self, **args):
1132 |         for key, value in args.items():
1133 |             setattr(self, key, value)
1134 | 
1135 | 
1136 | 
1137 | class ImportDescData(DataContainer):
1138 |     """Holds import descriptor information.
1139 |     
1140 |     dll:        name of the imported DLL
1141 |     imports:    list of imported symbols (ImportData instances)
1142 |     struct:     IMAGE_IMPORT_DESCRIPTOR structure
1143 |     """
1144 | 
1145 | class ImportData(DataContainer):
1146 |     """Holds imported symbol's information.
1147 |     
1148 |     ordinal:    Ordinal of the symbol
1149 |     name:       Name of the symbol
1150 |     bound:      If the symbol is bound, this contains
1151 |                 the address.
1152 |     """
1153 | 
1154 |     
1155 |     def __setattr__(self, name, val):
1156 | 
1157 |         # If the instance doesn't yet have an ordinal attribute
1158 |         # it's not fully initialized so can't do any of the
1159 |         # following
1160 |         #
1161 |         if hasattr(self, 'ordinal') and hasattr(self, 'bound') and hasattr(self, 'name'):
1162 |         
1163 |             if name == 'ordinal':
1164 | 
1165 |                 if self.pe.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE:
1166 |                     ordinal_flag = IMAGE_ORDINAL_FLAG
1167 |                 elif self.pe.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS:
1168 |                     ordinal_flag = IMAGE_ORDINAL_FLAG64
1169 | 
1170 |                 # Set the ordinal and flag the entry as imporing by ordinal
1171 |                 self.struct_table.Ordinal = ordinal_flag | (val & 0xffff)
1172 |                 self.struct_table.AddressOfData = self.struct_table.Ordinal
1173 |                 self.struct_table.Function = self.struct_table.Ordinal
1174 |                 self.struct_table.ForwarderString = self.struct_table.Ordinal
1175 |             elif name == 'bound':
1176 |                 if self.struct_iat is not None:
1177 |                     self.struct_iat.AddressOfData = val
1178 |                     self.struct_iat.AddressOfData = self.struct_iat.AddressOfData
1179 |                     self.struct_iat.Function = self.struct_iat.AddressOfData
1180 |                     self.struct_iat.ForwarderString = self.struct_iat.AddressOfData
1181 |             elif name == 'address':
1182 |                 self.struct_table.AddressOfData = val
1183 |                 self.struct_table.Ordinal = self.struct_table.AddressOfData
1184 |                 self.struct_table.Function = self.struct_table.AddressOfData
1185 |                 self.struct_table.ForwarderString = self.struct_table.AddressOfData
1186 |             elif name == 'name':
1187 |                 # Make sure we reset the entry in case the import had been set to import by ordinal
1188 |                 if self.name_offset:
1189 | 
1190 |                     name_rva = self.pe.get_rva_from_offset( self.name_offset )
1191 |                     self.pe.set_dword_at_offset( self.ordinal_offset, (0<<31) | name_rva )
1192 |                 
1193 |                     # Complain if the length of the new name is longer than the existing one
1194 |                     if len(val) > len(self.name):
1195 |                         #raise Exception('The export name provided is longer than the existing one.')
1196 |                         pass
1197 |                     self.pe.set_bytes_at_offset( self.name_offset, val )
1198 | 
1199 |         self.__dict__[name] = val
1200 |         
1201 |         
1202 | class ExportDirData(DataContainer):
1203 |     """Holds export directory information.
1204 |     
1205 |     struct:     IMAGE_EXPORT_DIRECTORY structure
1206 |     symbols:    list of exported symbols (ExportData instances)
1207 | """
1208 | 
1209 | class ExportData(DataContainer):
1210 |     """Holds exported symbols' information.
1211 |     
1212 |     ordinal:    ordinal of the symbol
1213 |     address:    address of the symbol
1214 |     name:       name of the symbol (None if the symbol is
1215 |                 exported by ordinal only)
1216 |     forwarder:  if the symbol is forwarded it will
1217 |                 contain the name of the target symbol,
1218 |                 None otherwise.
1219 |     """
1220 | 
1221 |     def __setattr__(self, name, val):
1222 | 
1223 |         # If the instance doesn't yet have an ordinal attribute
1224 |         # it's not fully initialized so can't do any of the
1225 |         # following
1226 |         #
1227 |         if hasattr(self, 'ordinal') and hasattr(self, 'address') and hasattr(self, 'forwarder') and hasattr(self, 'name'):
1228 |         
1229 |             if name == 'ordinal':
1230 |                 self.pe.set_word_at_offset( self.ordinal_offset, val )
1231 |             elif name == 'address':
1232 |                 self.pe.set_dword_at_offset( self.address_offset, val )
1233 |             elif name == 'name':
1234 |                 # Complain if the length of the new name is longer than the existing one
1235 |                 if len(val) > len(self.name):
1236 |                     #raise Exception('The export name provided is longer than the existing one.')
1237 |                     pass
1238 |                 self.pe.set_bytes_at_offset( self.name_offset, val )
1239 |             elif name == 'forwarder':
1240 |                 # Complain if the length of the new name is longer than the existing one
1241 |                 if len(val) > len(self.forwarder):
1242 |                     #raise Exception('The forwarder name provided is longer than the existing one.')
1243 |                     pass
1244 |                 self.pe.set_bytes_at_offset( self.forwarder_offset, val )
1245 | 
1246 |         self.__dict__[name] = val
1247 | 
1248 | 
1249 | class ResourceDirData(DataContainer):
1250 |     """Holds resource directory information.
1251 |     
1252 |     struct:     IMAGE_RESOURCE_DIRECTORY structure
1253 |     entries:    list of entries (ResourceDirEntryData instances)
1254 |     """
1255 | 
1256 | class ResourceDirEntryData(DataContainer):
1257 |     """Holds resource directory entry data.
1258 |     
1259 |     struct:     IMAGE_RESOURCE_DIRECTORY_ENTRY structure
1260 |     name:       If the resource is identified by name this
1261 |                 attribute will contain the name string. None
1262 |                 otherwise. If identified by id, the id is
1263 |                 available at 'struct.Id'
1264 |     id:         the id, also in struct.Id
1265 |     directory:  If this entry has a lower level directory
1266 |                 this attribute will point to the
1267 |                 ResourceDirData instance representing it.
1268 |     data:       If this entry has no further lower directories
1269 |                 and points to the actual resource data, this
1270 |                 attribute will reference the corresponding
1271 |                 ResourceDataEntryData instance.
1272 |     (Either of the 'directory' or 'data' attribute will exist,
1273 |     but not both.)
1274 |     """
1275 | 
1276 | class ResourceDataEntryData(DataContainer):
1277 |     """Holds resource data entry information.
1278 |     
1279 |     struct:     IMAGE_RESOURCE_DATA_ENTRY structure
1280 |     lang:       Primary language ID
1281 |     sublang:    Sublanguage ID
1282 |     """
1283 | 
1284 | class DebugData(DataContainer):
1285 |     """Holds debug information.
1286 |     
1287 |     struct:     IMAGE_DEBUG_DIRECTORY structure
1288 |     """
1289 | 
1290 | class BaseRelocationData(DataContainer):
1291 |     """Holds base relocation information.
1292 |     
1293 |     struct:     IMAGE_BASE_RELOCATION structure
1294 |     entries:    list of relocation data (RelocationData instances)
1295 |     """
1296 | 
1297 | class RelocationData(DataContainer):
1298 |     """Holds relocation information.
1299 |     
1300 |     type:       Type of relocation
1301 |                 The type string is can be obtained by
1302 |                 RELOCATION_TYPE[type]
1303 |     rva:        RVA of the relocation
1304 |     """
1305 |     def __setattr__(self, name, val):
1306 | 
1307 |         # If the instance doesn't yet have a struct attribute
1308 |         # it's not fully initialized so can't do any of the
1309 |         # following
1310 |         #
1311 |         if hasattr(self, 'struct'):
1312 |             # Get the word containing the type and data
1313 |             #
1314 |             word = self.struct.Data
1315 |         
1316 |             if name == 'type':
1317 |                 word = (val << 12) | (word & 0xfff)
1318 |             elif name == 'rva':
1319 |                 offset = val-self.base_rva
1320 |                 if offset < 0:
1321 |                     offset = 0
1322 |                 word = ( word & 0xf000) | ( offset & 0xfff)
1323 | 
1324 |             # Store the modified data
1325 |             #
1326 |             self.struct.Data = word
1327 | 
1328 |         self.__dict__[name] = val
1329 | 
1330 | class TlsData(DataContainer):
1331 |     """Holds TLS information.
1332 |     
1333 |     struct:     IMAGE_TLS_DIRECTORY structure
1334 |     """
1335 | 
1336 | class BoundImportDescData(DataContainer):
1337 |     """Holds bound import descriptor data.
1338 |     
1339 |     This directory entry will provide with information on the
1340 |     DLLs this PE files has been bound to (if bound at all).
1341 |     The structure will contain the name and timestamp of the
1342 |     DLL at the time of binding so that the loader can know
1343 |     whether it differs from the one currently present in the
1344 |     system and must, therefore, re-bind the PE's imports.
1345 |     
1346 |     struct:     IMAGE_BOUND_IMPORT_DESCRIPTOR structure
1347 |     name:       DLL name
1348 |     entries:    list of entries (BoundImportRefData instances)
1349 |                 the entries will exist if this DLL has forwarded
1350 |                 symbols. If so, the destination DLL will have an
1351 |                 entry in this list.
1352 |     """
1353 | 
1354 | class LoadConfigData(DataContainer):
1355 |     """Holds Load Config data.
1356 |     
1357 |     struct:     IMAGE_LOAD_CONFIG_DIRECTORY structure
1358 |     name:       dll name
1359 |     """
1360 | 
1361 | class BoundImportRefData(DataContainer):
1362 |     """Holds bound import forwarder reference data.
1363 |     
1364 |     Contains the same information as the bound descriptor but
1365 |     for forwarded DLLs, if any.
1366 |     
1367 |     struct:     IMAGE_BOUND_FORWARDER_REF structure
1368 |     name:       dll name
1369 |     """
1370 | 
1371 | 
1372 | # Valid FAT32 8.3 short filename characters according to:
1373 | #  http://en.wikipedia.org/wiki/8.3_filename
1374 | # This will help decide whether DLL ASCII names are likely
1375 | # to be valid of otherwise corruted data
1376 | #
1377 | # The flename length is not checked because the DLLs filename
1378 | # can be longer that the 8.3
1379 | allowed_filename = string.lowercase + string.uppercase + string.digits + "!#$%&'()-@^_`{}~+,.;=[]" + ''.join( [chr(i) for i in range(128, 256)] )
1380 | def is_valid_dos_filename(s):
1381 |     if s is None or not isinstance(s, str):
1382 |         return False
1383 |     for c in s:
1384 |         if c not in allowed_filename:
1385 |             return False
1386 |     return True
1387 | 
1388 | 
1389 | # Check if a imported name uses the valid accepted characters expected in mangled
1390 | # function names. If the symbol's characters don't fall within this charset
1391 | # we will assume the name is invalid
1392 | #
1393 | allowed_function_name = string.lowercase + string.uppercase + string.digits + '_?@$()'
1394 | def is_valid_function_name(s):
1395 |     if s is None or not isinstance(s, str):
1396 |         return False
1397 |     for c in s:
1398 |         if c not in allowed_function_name:
1399 |             return False
1400 |     return True
1401 | 
1402 | 
1403 | 
1404 | class PE:
1405 |     """A Portable Executable representation.
1406 |     
1407 |     This class provides access to most of the information in a PE file.
1408 |     
1409 |     It expects to be supplied the name of the file to load or PE data
1410 |     to process and an optional argument 'fast_load' (False by default)
1411 |     which controls whether to load all the directories information,
1412 |     which can be quite time consuming.
1413 |     
1414 |     pe = pefile.PE('module.dll')
1415 |     pe = pefile.PE(name='module.dll')
1416 |     
1417 |     would load 'module.dll' and process it. If the data would be already
1418 |     available in a buffer the same could be achieved with:
1419 |     
1420 |     pe = pefile.PE(data=module_dll_data)
1421 |     
1422 |     The "fast_load" can be set to a default by setting its value in the
1423 |     module itself by means,for instance, of a "pefile.fast_load = True".
1424 |     That will make all the subsequent instances not to load the
1425 |     whole PE structure. The "full_load" method can be used to parse
1426 |     the missing data at a later stage.
1427 |     
1428 |     Basic headers information will be available in the attributes:
1429 |     
1430 |     DOS_HEADER
1431 |     NT_HEADERS
1432 |     FILE_HEADER
1433 |     OPTIONAL_HEADER
1434 |     
1435 |     All of them will contain among their attributes the members of the
1436 |     corresponding structures as defined in WINNT.H
1437 |     
1438 |     The raw data corresponding to the header (from the beginning of the
1439 |     file up to the start of the first section) will be available in the
1440 |     instance's attribute 'header' as a string.
1441 |     
1442 |     The sections will be available as a list in the 'sections' attribute.
1443 |     Each entry will contain as attributes all the structure's members.
1444 |     
1445 |     Directory entries will be available as attributes (if they exist):
1446 |     (no other entries are processed at this point)
1447 |     
1448 |     DIRECTORY_ENTRY_IMPORT (list of ImportDescData instances)
1449 |     DIRECTORY_ENTRY_EXPORT (ExportDirData instance)
1450 |     DIRECTORY_ENTRY_RESOURCE (ResourceDirData instance)
1451 |     DIRECTORY_ENTRY_DEBUG (list of DebugData instances)
1452 |     DIRECTORY_ENTRY_BASERELOC (list of BaseRelocationData instances)
1453 |     DIRECTORY_ENTRY_TLS
1454 |     DIRECTORY_ENTRY_BOUND_IMPORT (list of BoundImportData instances)
1455 |     
1456 |     The following dictionary attributes provide ways of mapping different
1457 |     constants. They will accept the numeric value and return the string
1458 |     representation and the opposite, feed in the string and get the
1459 |     numeric constant:
1460 |     
1461 |     DIRECTORY_ENTRY
1462 |     IMAGE_CHARACTERISTICS
1463 |     SECTION_CHARACTERISTICS
1464 |     DEBUG_TYPE
1465 |     SUBSYSTEM_TYPE
1466 |     MACHINE_TYPE
1467 |     RELOCATION_TYPE
1468 |     RESOURCE_TYPE
1469 |     LANG
1470 |     SUBLANG
1471 |     """
1472 |     
1473 |     #
1474 |     # Format specifications for PE structures.
1475 |     #
1476 |     
1477 |     __IMAGE_DOS_HEADER_format__ = ('IMAGE_DOS_HEADER',
1478 |         ('H,e_magic', 'H,e_cblp', 'H,e_cp',
1479 |         'H,e_crlc', 'H,e_cparhdr', 'H,e_minalloc',
1480 |         'H,e_maxalloc', 'H,e_ss', 'H,e_sp', 'H,e_csum',
1481 |         'H,e_ip', 'H,e_cs', 'H,e_lfarlc', 'H,e_ovno', '8s,e_res',
1482 |         'H,e_oemid', 'H,e_oeminfo', '20s,e_res2',
1483 |         'I,e_lfanew'))
1484 |     
1485 |     __IMAGE_FILE_HEADER_format__ = ('IMAGE_FILE_HEADER',
1486 |         ('H,Machine', 'H,NumberOfSections',
1487 |         'I,TimeDateStamp', 'I,PointerToSymbolTable',
1488 |         'I,NumberOfSymbols', 'H,SizeOfOptionalHeader',
1489 |         'H,Characteristics'))
1490 |     
1491 |     __IMAGE_DATA_DIRECTORY_format__ = ('IMAGE_DATA_DIRECTORY',
1492 |         ('I,VirtualAddress', 'I,Size'))
1493 |     
1494 |     
1495 |     __IMAGE_OPTIONAL_HEADER_format__ = ('IMAGE_OPTIONAL_HEADER',
1496 |         ('H,Magic', 'B,MajorLinkerVersion',
1497 |         'B,MinorLinkerVersion', 'I,SizeOfCode',
1498 |         'I,SizeOfInitializedData', 'I,SizeOfUninitializedData',
1499 |         'I,AddressOfEntryPoint', 'I,BaseOfCode', 'I,BaseOfData',
1500 |         'I,ImageBase', 'I,SectionAlignment', 'I,FileAlignment',
1501 |         'H,MajorOperatingSystemVersion', 'H,MinorOperatingSystemVersion',
1502 |         'H,MajorImageVersion', 'H,MinorImageVersion',
1503 |         'H,MajorSubsystemVersion', 'H,MinorSubsystemVersion',
1504 |         'I,Reserved1', 'I,SizeOfImage', 'I,SizeOfHeaders',
1505 |         'I,CheckSum', 'H,Subsystem', 'H,DllCharacteristics',
1506 |         'I,SizeOfStackReserve', 'I,SizeOfStackCommit',
1507 |         'I,SizeOfHeapReserve', 'I,SizeOfHeapCommit',
1508 |         'I,LoaderFlags', 'I,NumberOfRvaAndSizes' ))
1509 |     
1510 |     
1511 |     __IMAGE_OPTIONAL_HEADER64_format__ = ('IMAGE_OPTIONAL_HEADER64',
1512 |         ('H,Magic', 'B,MajorLinkerVersion',
1513 |         'B,MinorLinkerVersion', 'I,SizeOfCode',
1514 |         'I,SizeOfInitializedData', 'I,SizeOfUninitializedData',
1515 |         'I,AddressOfEntryPoint', 'I,BaseOfCode',
1516 |         'Q,ImageBase', 'I,SectionAlignment', 'I,FileAlignment',
1517 |         'H,MajorOperatingSystemVersion', 'H,MinorOperatingSystemVersion',
1518 |         'H,MajorImageVersion', 'H,MinorImageVersion',
1519 |         'H,MajorSubsystemVersion', 'H,MinorSubsystemVersion',
1520 |         'I,Reserved1', 'I,SizeOfImage', 'I,SizeOfHeaders',
1521 |         'I,CheckSum', 'H,Subsystem', 'H,DllCharacteristics',
1522 |         'Q,SizeOfStackReserve', 'Q,SizeOfStackCommit',
1523 |         'Q,SizeOfHeapReserve', 'Q,SizeOfHeapCommit',
1524 |         'I,LoaderFlags', 'I,NumberOfRvaAndSizes' ))
1525 |         
1526 |     
1527 |     __IMAGE_NT_HEADERS_format__ = ('IMAGE_NT_HEADERS', ('I,Signature',))
1528 |     
1529 |     __IMAGE_SECTION_HEADER_format__ = ('IMAGE_SECTION_HEADER',
1530 |         ('8s,Name', 'I,Misc,Misc_PhysicalAddress,Misc_VirtualSize',
1531 |         'I,VirtualAddress', 'I,SizeOfRawData', 'I,PointerToRawData',
1532 |         'I,PointerToRelocations', 'I,PointerToLinenumbers',
1533 |         'H,NumberOfRelocations', 'H,NumberOfLinenumbers',
1534 |         'I,Characteristics'))
1535 |     
1536 |     __IMAGE_DELAY_IMPORT_DESCRIPTOR_format__ = ('IMAGE_DELAY_IMPORT_DESCRIPTOR',
1537 |         ('I,grAttrs', 'I,szName', 'I,phmod', 'I,pIAT', 'I,pINT',
1538 |         'I,pBoundIAT', 'I,pUnloadIAT', 'I,dwTimeStamp'))
1539 |     
1540 |     __IMAGE_IMPORT_DESCRIPTOR_format__ =  ('IMAGE_IMPORT_DESCRIPTOR',
1541 |         ('I,OriginalFirstThunk,Characteristics',
1542 |         'I,TimeDateStamp', 'I,ForwarderChain', 'I,Name', 'I,FirstThunk'))
1543 |     
1544 |     __IMAGE_EXPORT_DIRECTORY_format__ =  ('IMAGE_EXPORT_DIRECTORY',
1545 |         ('I,Characteristics',
1546 |         'I,TimeDateStamp', 'H,MajorVersion', 'H,MinorVersion', 'I,Name',
1547 |         'I,Base', 'I,NumberOfFunctions', 'I,NumberOfNames',
1548 |         'I,AddressOfFunctions', 'I,AddressOfNames', 'I,AddressOfNameOrdinals'))
1549 |     
1550 |     __IMAGE_RESOURCE_DIRECTORY_format__ = ('IMAGE_RESOURCE_DIRECTORY',
1551 |         ('I,Characteristics',
1552 |         'I,TimeDateStamp', 'H,MajorVersion', 'H,MinorVersion',
1553 |         'H,NumberOfNamedEntries', 'H,NumberOfIdEntries'))
1554 |     
1555 |     __IMAGE_RESOURCE_DIRECTORY_ENTRY_format__ = ('IMAGE_RESOURCE_DIRECTORY_ENTRY',
1556 |         ('I,Name',
1557 |         'I,OffsetToData'))
1558 |     
1559 |     __IMAGE_RESOURCE_DATA_ENTRY_format__ = ('IMAGE_RESOURCE_DATA_ENTRY',
1560 |         ('I,OffsetToData', 'I,Size', 'I,CodePage', 'I,Reserved'))
1561 |     
1562 |     __VS_VERSIONINFO_format__ = ( 'VS_VERSIONINFO',
1563 |         ('H,Length', 'H,ValueLength', 'H,Type' ))
1564 |     
1565 |     __VS_FIXEDFILEINFO_format__ = ( 'VS_FIXEDFILEINFO',
1566 |         ('I,Signature', 'I,StrucVersion', 'I,FileVersionMS', 'I,FileVersionLS',
1567 |          'I,ProductVersionMS', 'I,ProductVersionLS', 'I,FileFlagsMask', 'I,FileFlags',
1568 |          'I,FileOS', 'I,FileType', 'I,FileSubtype', 'I,FileDateMS', 'I,FileDateLS'))
1569 |     
1570 |     __StringFileInfo_format__ = ( 'StringFileInfo',
1571 |         ('H,Length', 'H,ValueLength', 'H,Type' ))
1572 |     
1573 |     __StringTable_format__ = ( 'StringTable',
1574 |         ('H,Length', 'H,ValueLength', 'H,Type' ))
1575 |     
1576 |     __String_format__ = ( 'String',
1577 |         ('H,Length', 'H,ValueLength', 'H,Type' ))
1578 |     
1579 |     __Var_format__ = ( 'Var', ('H,Length', 'H,ValueLength', 'H,Type' ))
1580 |     
1581 |     __IMAGE_THUNK_DATA_format__ = ('IMAGE_THUNK_DATA',
1582 |         ('I,ForwarderString,Function,Ordinal,AddressOfData',))
1583 |     
1584 |     __IMAGE_THUNK_DATA64_format__ = ('IMAGE_THUNK_DATA',
1585 |         ('Q,ForwarderString,Function,Ordinal,AddressOfData',))
1586 |     
1587 |     __IMAGE_DEBUG_DIRECTORY_format__ = ('IMAGE_DEBUG_DIRECTORY',
1588 |         ('I,Characteristics', 'I,TimeDateStamp', 'H,MajorVersion',
1589 |         'H,MinorVersion', 'I,Type', 'I,SizeOfData', 'I,AddressOfRawData',
1590 |         'I,PointerToRawData'))
1591 |     
1592 |     __IMAGE_BASE_RELOCATION_format__ = ('IMAGE_BASE_RELOCATION',
1593 |         ('I,VirtualAddress', 'I,SizeOfBlock') )
1594 |     
1595 |     __IMAGE_BASE_RELOCATION_ENTRY_format__ = ('IMAGE_BASE_RELOCATION_ENTRY',
1596 |         ('H,Data',) )
1597 | 
1598 |     __IMAGE_TLS_DIRECTORY_format__ = ('IMAGE_TLS_DIRECTORY',
1599 |         ('I,StartAddressOfRawData', 'I,EndAddressOfRawData',
1600 |         'I,AddressOfIndex', 'I,AddressOfCallBacks',
1601 |         'I,SizeOfZeroFill', 'I,Characteristics' ) )
1602 |     
1603 |     __IMAGE_TLS_DIRECTORY64_format__ = ('IMAGE_TLS_DIRECTORY',
1604 |         ('Q,StartAddressOfRawData', 'Q,EndAddressOfRawData',
1605 |         'Q,AddressOfIndex', 'Q,AddressOfCallBacks',
1606 |         'I,SizeOfZeroFill', 'I,Characteristics' ) )
1607 |     
1608 |     __IMAGE_LOAD_CONFIG_DIRECTORY_format__ = ('IMAGE_LOAD_CONFIG_DIRECTORY',
1609 |         ('I,Size', 'I,TimeDateStamp',
1610 |         'H,MajorVersion', 'H,MinorVersion',
1611 |         'I,GlobalFlagsClear', 'I,GlobalFlagsSet',
1612 |         'I,CriticalSectionDefaultTimeout',
1613 |         'I,DeCommitFreeBlockThreshold',
1614 |         'I,DeCommitTotalFreeThreshold',
1615 |         'I,LockPrefixTable',
1616 |         'I,MaximumAllocationSize',
1617 |         'I,VirtualMemoryThreshold',
1618 |         'I,ProcessHeapFlags',
1619 |         'I,ProcessAffinityMask',
1620 |         'H,CSDVersion', 'H,Reserved1',
1621 |         'I,EditList', 'I,SecurityCookie',
1622 |         'I,SEHandlerTable', 'I,SEHandlerCount' ) )
1623 |         
1624 |     __IMAGE_LOAD_CONFIG_DIRECTORY64_format__ = ('IMAGE_LOAD_CONFIG_DIRECTORY',
1625 |         ('I,Size', 'I,TimeDateStamp',
1626 |       'H,MajorVersion', 'H,MinorVersion',
1627 |       'I,GlobalFlagsClear', 'I,GlobalFlagsSet',
1628 |       'I,CriticalSectionDefaultTimeout',
1629 |       'Q,DeCommitFreeBlockThreshold',
1630 |       'Q,DeCommitTotalFreeThreshold',
1631 |       'Q,LockPrefixTable',
1632 |       'Q,MaximumAllocationSize',
1633 |       'Q,VirtualMemoryThreshold',
1634 |       'Q,ProcessAffinityMask',
1635 |       'I,ProcessHeapFlags',
1636 |       'H,CSDVersion', 'H,Reserved1',
1637 |       'Q,EditList', 'Q,SecurityCookie',
1638 |       'Q,SEHandlerTable', 'Q,SEHandlerCount' ) )
1639 |     
1640 |     __IMAGE_BOUND_IMPORT_DESCRIPTOR_format__ = ('IMAGE_BOUND_IMPORT_DESCRIPTOR',
1641 |         ('I,TimeDateStamp', 'H,OffsetModuleName', 'H,NumberOfModuleForwarderRefs'))
1642 |     
1643 |     __IMAGE_BOUND_FORWARDER_REF_format__ = ('IMAGE_BOUND_FORWARDER_REF',
1644 |         ('I,TimeDateStamp', 'H,OffsetModuleName', 'H,Reserved') )
1645 |     
1646 |     
1647 |     def __init__(self, name=None, data=None, fast_load=None):
1648 |         
1649 |         self.sections = []
1650 |         
1651 |         self.__warnings = []
1652 |         
1653 |         self.PE_TYPE = None
1654 |         
1655 |         if  not name and not data:
1656 |             return
1657 |         
1658 |         # This list will keep track of all the structures created.
1659 |         # That will allow for an easy iteration through the list
1660 |         # in order to save the modifications made
1661 |         self.__structures__ = []
1662 |         self.__from_file = None
1663 |         
1664 |         if not fast_load:
1665 |             fast_load = globals()['fast_load']
1666 |         try:
1667 |             self.__parse__(name, data, fast_load)
1668 |         except:
1669 |             self.close()
1670 |             raise
1671 |             
1672 |             
1673 |     def close(self):
1674 |         if ( self.__from_file is True and hasattr(self, '__data__') and 
1675 |             ((isinstance(mmap.mmap, type) and isinstance(self.__data__, mmap.mmap)) or
1676 |            'mmap.mmap' in repr(type(self.__data__))) ):
1677 |                 self.__data__.close()
1678 |     
1679 | 
1680 |     def __unpack_data__(self, format, data, file_offset):
1681 |         """Apply structure format to raw data.
1682 |         
1683 |         Returns and unpacked structure object if successful, None otherwise.
1684 |         """
1685 |         
1686 |         structure = Structure(format, file_offset=file_offset)
1687 |         
1688 |         try:
1689 |             structure.__unpack__(data)
1690 |         except PEFormatError, err:
1691 |             self.__warnings.append(
1692 |                 'Corrupt header "%s" at file offset %d. Exception: %s' % (
1693 |                     format[0], file_offset, str(err))  )
1694 |             return None
1695 |         
1696 |         self.__structures__.append(structure)
1697 |         
1698 |         return structure
1699 |         
1700 |     
1701 |     def __parse__(self, fname, data, fast_load):
1702 |         """Parse a Portable Executable file.
1703 |         
1704 |         Loads a PE file, parsing all its structures and making them available
1705 |         through the instance's attributes.
1706 |         """
1707 |         
1708 |         if fname:
1709 |             stat = os.stat(fname)
1710 |             if stat.st_size == 0:
1711 |                 raise PEFormatError('The file is empty')
1712 |             try:
1713 |                 fd = file(fname, 'rb')
1714 |                 self.fileno = fd.fileno()
1715 |                 self.__data__ = mmap.mmap(self.fileno, 0, access=mmap.ACCESS_READ)
1716 |                 self.__from_file = True
1717 |             finally:
1718 |                 fd.close()
1719 |         elif data:
1720 |             self.__data__ = data
1721 |             self.__from_file = False
1722 |         
1723 |         dos_header_data = self.__data__[:64]
1724 |         if len(dos_header_data) != 64:
1725 |             raise PEFormatError('Unable to read the DOS Header, possibly a truncated file.')
1726 |             
1727 |         self.DOS_HEADER = self.__unpack_data__(
1728 |             self.__IMAGE_DOS_HEADER_format__,
1729 |             dos_header_data, file_offset=0)
1730 |         
1731 |         if self.DOS_HEADER.e_magic == IMAGE_DOSZM_SIGNATURE:
1732 |             raise PEFormatError('Probably a ZM Executable (not a PE file).')
1733 |         if not self.DOS_HEADER or self.DOS_HEADER.e_magic != IMAGE_DOS_SIGNATURE:
1734 |             raise PEFormatError('DOS Header magic not found.')
1735 |         
1736 |         # OC Patch:
1737 |         # Check for sane value in e_lfanew
1738 |         #
1739 |         if self.DOS_HEADER.e_lfanew > len(self.__data__):
1740 |             raise PEFormatError('Invalid e_lfanew value, probably not a PE file')
1741 |         
1742 |         nt_headers_offset = self.DOS_HEADER.e_lfanew
1743 |         
1744 |         self.NT_HEADERS = self.__unpack_data__(
1745 |             self.__IMAGE_NT_HEADERS_format__,
1746 |             self.__data__[nt_headers_offset:nt_headers_offset+8],
1747 |             file_offset = nt_headers_offset)
1748 |         
1749 |         # We better check the signature right here, before the file screws
1750 |         # around with sections:
1751 |         # OC Patch:
1752 |         # Some malware will cause the Signature value to not exist at all
1753 |         if not self.NT_HEADERS or not self.NT_HEADERS.Signature:
1754 |             raise PEFormatError('NT Headers not found.')
1755 | 
1756 |         if (0xFFFF & self.NT_HEADERS.Signature) == IMAGE_NE_SIGNATURE:
1757 |             raise PEFormatError('Invalid NT Headers signature. Probably a NE file')
1758 |         if (0xFFFF & self.NT_HEADERS.Signature) == IMAGE_LE_SIGNATURE:
1759 |             raise PEFormatError('Invalid NT Headers signature. Probably a LE file')
1760 |         if (0xFFFF & self.NT_HEADERS.Signature) == IMAGE_LX_SIGNATURE:
1761 |             raise PEFormatError('Invalid NT Headers signature. Probably a LX file')
1762 |         if self.NT_HEADERS.Signature != IMAGE_NT_SIGNATURE:
1763 |             raise PEFormatError('Invalid NT Headers signature.')
1764 |         
1765 |         self.FILE_HEADER = self.__unpack_data__(
1766 |             self.__IMAGE_FILE_HEADER_format__,
1767 |             self.__data__[nt_headers_offset+4:nt_headers_offset+4+32],
1768 |             file_offset = nt_headers_offset+4)
1769 |         image_flags = retrieve_flags(IMAGE_CHARACTERISTICS, 'IMAGE_FILE_')
1770 |         
1771 |         if not self.FILE_HEADER:
1772 |             raise PEFormatError('File Header missing')
1773 |         
1774 |         # Set the image's flags according the the Characteristics member
1775 |         set_flags(self.FILE_HEADER, self.FILE_HEADER.Characteristics, image_flags)
1776 |         
1777 |         optional_header_offset =    \
1778 |             nt_headers_offset+4+self.FILE_HEADER.sizeof()
1779 |         
1780 |         # Note: location of sections can be controlled from PE header:
1781 |         sections_offset = optional_header_offset + self.FILE_HEADER.SizeOfOptionalHeader
1782 |         
1783 |         self.OPTIONAL_HEADER = self.__unpack_data__(
1784 |             self.__IMAGE_OPTIONAL_HEADER_format__,
1785 |             self.__data__[optional_header_offset:],
1786 |             file_offset = optional_header_offset)
1787 |         
1788 |         # According to solardesigner's findings for his
1789 |         # Tiny PE project, the optional header does not
1790 |         # need fields beyond "Subsystem" in order to be
1791 |         # loadable by the Windows loader (given that zeroes
1792 |         # are acceptable values and the header is loaded
1793 |         # in a zeroed memory page)
1794 |         # If trying to parse a full Optional Header fails
1795 |         # we try to parse it again with some 0 padding
1796 |         #
1797 |         MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE = 69
1798 |         
1799 |         if ( self.OPTIONAL_HEADER is None and
1800 |             len(self.__data__[optional_header_offset:optional_header_offset+0x200])
1801 |                 >= MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE ):
1802 |             
1803 |             # Add enough zeroes to make up for the unused fields
1804 |             #
1805 |             padding_length = 128
1806 |             
1807 |             # Create padding
1808 |             #
1809 |             padded_data = self.__data__[optional_header_offset:optional_header_offset+0x200] + (
1810 |                 '\0' * padding_length)
1811 |             
1812 |             self.OPTIONAL_HEADER = self.__unpack_data__(
1813 |                 self.__IMAGE_OPTIONAL_HEADER_format__,
1814 |                 padded_data,
1815 |                 file_offset = optional_header_offset)
1816 |             
1817 |         
1818 |         # Check the Magic in the OPTIONAL_HEADER and set the PE file
1819 |         # type accordingly
1820 |         #
1821 |         if self.OPTIONAL_HEADER is not None:
1822 |             
1823 |             if self.OPTIONAL_HEADER.Magic == OPTIONAL_HEADER_MAGIC_PE:
1824 |                 
1825 |                 self.PE_TYPE = OPTIONAL_HEADER_MAGIC_PE
1826 |             
1827 |             elif self.OPTIONAL_HEADER.Magic == OPTIONAL_HEADER_MAGIC_PE_PLUS:
1828 |                 
1829 |                 self.PE_TYPE = OPTIONAL_HEADER_MAGIC_PE_PLUS
1830 |                 
1831 |                 self.OPTIONAL_HEADER = self.__unpack_data__(
1832 |                     self.__IMAGE_OPTIONAL_HEADER64_format__,
1833 |                     self.__data__[optional_header_offset:optional_header_offset+0x200],
1834 |                     file_offset = optional_header_offset)
1835 |                 
1836 |                 # Again, as explained above, we try to parse
1837 |                 # a reduced form of the Optional Header which
1838 |                 # is still valid despite not including all
1839 |                 # structure members
1840 |                 #
1841 |                 MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE = 69+4
1842 |                 
1843 |                 if ( self.OPTIONAL_HEADER is None and
1844 |                     len(self.__data__[optional_header_offset:optional_header_offset+0x200])
1845 |                         >= MINIMUM_VALID_OPTIONAL_HEADER_RAW_SIZE ):
1846 |                     
1847 |                     padding_length = 128
1848 |                     padded_data = self.__data__[optional_header_offset:optional_header_offset+0x200] + (
1849 |                         '\0' * padding_length)
1850 |                     self.OPTIONAL_HEADER = self.__unpack_data__(
1851 |                         self.__IMAGE_OPTIONAL_HEADER64_format__,
1852 |                         padded_data,
1853 |                         file_offset = optional_header_offset)
1854 |         
1855 |         
1856 |         if not self.FILE_HEADER:
1857 |             raise PEFormatError('File Header missing')
1858 |         
1859 |         
1860 |         # OC Patch:
1861 |         # Die gracefully if there is no OPTIONAL_HEADER field
1862 |         # 975440f5ad5e2e4a92c4d9a5f22f75c1
1863 |         if self.PE_TYPE is None or self.OPTIONAL_HEADER is None:
1864 |             raise PEFormatError("No Optional Header found, invalid PE32 or PE32+ file")
1865 |         
1866 |         dll_characteristics_flags = retrieve_flags(DLL_CHARACTERISTICS, 'IMAGE_DLL_CHARACTERISTICS_')
1867 |         
1868 |         # Set the Dll Characteristics flags according the the DllCharacteristics member
1869 |         set_flags(
1870 |             self.OPTIONAL_HEADER,
1871 |             self.OPTIONAL_HEADER.DllCharacteristics,
1872 |             dll_characteristics_flags)
1873 |         
1874 |         
1875 |         self.OPTIONAL_HEADER.DATA_DIRECTORY = []
1876 |         #offset = (optional_header_offset + self.FILE_HEADER.SizeOfOptionalHeader)
1877 |         offset = (optional_header_offset + self.OPTIONAL_HEADER.sizeof())
1878 |         
1879 |         
1880 |         self.NT_HEADERS.FILE_HEADER = self.FILE_HEADER
1881 |         self.NT_HEADERS.OPTIONAL_HEADER = self.OPTIONAL_HEADER
1882 |         
1883 |         
1884 |         # The NumberOfRvaAndSizes is sanitized to stay within
1885 |         # reasonable limits so can be casted to an int
1886 |         #
1887 |         if self.OPTIONAL_HEADER.NumberOfRvaAndSizes > 0x10:
1888 |             self.__warnings.append(
1889 |                 'Suspicious NumberOfRvaAndSizes in the Optional Header. ' +
1890 |                 'Normal values are never larger than 0x10, the value is: 0x%x' %
1891 |                 self.OPTIONAL_HEADER.NumberOfRvaAndSizes )
1892 |         
1893 |         MAX_ASSUMED_VALID_NUMBER_OF_RVA_AND_SIZES = 0x100
1894 |         for i in xrange(int(0x7fffffffL & self.OPTIONAL_HEADER.NumberOfRvaAndSizes)):
1895 |             
1896 |             if len(self.__data__) - offset == 0:
1897 |                 break
1898 |             
1899 |             if len(self.__data__) - offset < 8:
1900 |                 data = self.__data__[offset:] + '\0'*8
1901 |             else:
1902 |                 data = self.__data__[offset:offset+MAX_ASSUMED_VALID_NUMBER_OF_RVA_AND_SIZES]
1903 |             
1904 |             dir_entry = self.__unpack_data__(
1905 |                 self.__IMAGE_DATA_DIRECTORY_format__,
1906 |                 data,
1907 |                 file_offset = offset)
1908 |             
1909 |             if dir_entry is None:
1910 |                 break
1911 |             
1912 |             # Would fail if missing an entry
1913 |             # 1d4937b2fa4d84ad1bce0309857e70ca offending sample
1914 |             try:
1915 |                 dir_entry.name = DIRECTORY_ENTRY[i]
1916 |             except (KeyError, AttributeError):
1917 |                 break
1918 |             
1919 |             offset += dir_entry.sizeof()
1920 |             
1921 |             self.OPTIONAL_HEADER.DATA_DIRECTORY.append(dir_entry)
1922 |             
1923 |             # If the offset goes outside the optional header,
1924 |             # the loop is broken, regardless of how many directories
1925 |             # NumberOfRvaAndSizes says there are
1926 |             #
1927 |             # We assume a normally sized optional header, hence that we do
1928 |             # a sizeof() instead of reading SizeOfOptionalHeader.
1929 |             # Then we add a default number of directories times their size,
1930 |             # if we go beyond that, we assume the number of directories
1931 |             # is wrong and stop processing
1932 |             if offset >= (optional_header_offset +
1933 |                 self.OPTIONAL_HEADER.sizeof() + 8*16) :
1934 |                 
1935 |                 break
1936 |                         
1937 |         
1938 |         offset = self.parse_sections(sections_offset)
1939 |         
1940 |         # OC Patch:
1941 |         # There could be a problem if there are no raw data sections
1942 |         # greater than 0
1943 |         # fc91013eb72529da005110a3403541b6 example
1944 |         # Should this throw an exception in the minimum header offset
1945 |         # can't be found?
1946 |         #
1947 |         rawDataPointers = [
1948 |             self.adjust_FileAlignment( s.PointerToRawData,
1949 |                 self.OPTIONAL_HEADER.FileAlignment ) 
1950 |             for s in self.sections if s.PointerToRawData>0 ]
1951 |         
1952 |         if len(rawDataPointers) > 0:
1953 |             lowest_section_offset = min(rawDataPointers)
1954 |         else:
1955 |             lowest_section_offset = None
1956 |         
1957 |         if not lowest_section_offset or lowest_section_offset < offset:
1958 |             self.header = self.__data__[:offset]
1959 |         else:
1960 |             self.header = self.__data__[:lowest_section_offset]
1961 |         
1962 |         
1963 |         # Check whether the entry point lies within a section
1964 |         #
1965 |         if self.get_section_by_rva(self.OPTIONAL_HEADER.AddressOfEntryPoint) is not None:
1966 |             
1967 |             # Check whether the entry point lies within the file
1968 |             #
1969 |             ep_offset = self.get_offset_from_rva(self.OPTIONAL_HEADER.AddressOfEntryPoint)
1970 |             if ep_offset > len(self.__data__):
1971 |                 
1972 |                 self.__warnings.append(
1973 |                     'Possibly corrupt file. AddressOfEntryPoint lies outside the file. ' +
1974 |                     'AddressOfEntryPoint: 0x%x' %
1975 |                     self.OPTIONAL_HEADER.AddressOfEntryPoint )
1976 |         
1977 |         else:
1978 |             
1979 |             self.__warnings.append(
1980 |                 'AddressOfEntryPoint lies outside the sections\' boundaries. ' +
1981 |                 'AddressOfEntryPoint: 0x%x' %
1982 |                 self.OPTIONAL_HEADER.AddressOfEntryPoint )
1983 |         
1984 |         
1985 |         if not fast_load:
1986 |             self.parse_data_directories()
1987 |             
1988 |             class RichHeader:
1989 |                 pass
1990 |             rich_header = self.parse_rich_header()
1991 |             if rich_header:
1992 |                 self.RICH_HEADER = RichHeader()
1993 |                 self.RICH_HEADER.checksum = rich_header.get('checksum', None)
1994 |                 self.RICH_HEADER.values = rich_header.get('values', None)
1995 |             else:
1996 |                 self.RICH_HEADER = None
1997 | 
1998 | 
1999 |     def parse_rich_header(self):
2000 |         """Parses the rich header
2001 |         see http://www.ntcore.com/files/richsign.htm for more information
2002 |         
2003 |         Structure:
2004 |         00 DanS ^ checksum, checksum, checksum, checksum
2005 |         10 Symbol RVA ^ checksum, Symbol size ^ checksum...
2006 |         ...
2007 |         XX Rich, checksum, 0, 0,...
2008 |         """
2009 |         
2010 |         # Rich Header constants
2011 |         #
2012 |         DANS = 0x536E6144 # 'DanS' as dword
2013 |         RICH = 0x68636952 # 'Rich' as dword
2014 |         
2015 |         # Read a block of data
2016 |         #
2017 |         try:
2018 |             data = list(struct.unpack("<32I", self.get_data(0x80, 0x80)))
2019 |         except:
2020 |             # In the cases where there's not enough data to contain the Rich header
2021 |             # we abort its parsing
2022 |             return None
2023 | 
2024 |         # the checksum should be present 3 times after the DanS signature
2025 |         #
2026 |         checksum = data[1]
2027 |         if (data[0] ^ checksum != DANS
2028 |             or data[2] != checksum
2029 |             or data[3] != checksum):
2030 |             return None
2031 | 
2032 |         result = {"checksum": checksum}
2033 |         headervalues = []
2034 |         result ["values"] = headervalues
2035 | 
2036 |         data = data[4:] 
2037 |         for i in xrange(len(data) / 2):
2038 |             
2039 |             # Stop until the Rich footer signature is found
2040 |             #
2041 |             if data[2 * i] == RICH:
2042 |                 
2043 |                 # it should be followed by the checksum
2044 |                 #
2045 |                 if data[2 * i + 1] != checksum:
2046 |                     self.__warnings.append('Rich Header corrupted')
2047 |                 break
2048 |             
2049 |             # header values come by pairs
2050 |             #
2051 |             headervalues += [data[2 * i] ^ checksum, data[2 * i + 1] ^ checksum]
2052 |         return result
2053 |         
2054 |     
2055 |     def get_warnings(self):
2056 |         """Return the list of warnings.
2057 |         
2058 |         Non-critical problems found when parsing the PE file are
2059 |         appended to a list of warnings. This method returns the
2060 |         full list.
2061 |         """
2062 |         
2063 |         return self.__warnings
2064 |         
2065 |     
2066 |     def show_warnings(self):
2067 |         """Print the list of warnings.
2068 |         
2069 |         Non-critical problems found when parsing the PE file are
2070 |         appended to a list of warnings. This method prints the
2071 |         full list to standard output.
2072 |         """
2073 |         
2074 |         for warning in self.__warnings:
2075 |             print '>', warning
2076 |     
2077 |     
2078 |     def full_load(self):
2079 |         """Process the data directories.
2080 |         
2081 |         This method will load the data directories which might not have
2082 |         been loaded if the "fast_load" option was used.
2083 |         """
2084 |         
2085 |         self.parse_data_directories()
2086 |         
2087 |     
2088 |     def write(self, filename=None):
2089 |         """Write the PE file.
2090 |         
2091 |         This function will process all headers and components
2092 |         of the PE file and include all changes made (by just
2093 |         assigning to attributes in the PE objects) and write
2094 |         the changes back to a file whose name is provided as
2095 |         an argument. The filename is optional, if not 
2096 |         provided the data will be returned as a 'str' object.
2097 |         """
2098 |         
2099 |         file_data = list(self.__data__)
2100 |         for structure in self.__structures__:
2101 |             
2102 |             struct_data = list(structure.__pack__())
2103 |             offset = structure.get_file_offset()
2104 |             
2105 |             file_data[offset:offset+len(struct_data)] = struct_data
2106 |         
2107 |         if hasattr(self, 'VS_VERSIONINFO'):
2108 |             if hasattr(self, 'FileInfo'):
2109 |                 for entry in self.FileInfo:
2110 |                     if hasattr(entry, 'StringTable'):
2111 |                         for st_entry in entry.StringTable:
2112 |                             for key, entry in st_entry.entries.items():
2113 |                                 
2114 |                                 offsets = st_entry.entries_offsets[key]
2115 |                                 lengths = st_entry.entries_lengths[key]
2116 |                                 
2117 |                                 if len( entry ) > lengths[1]:
2118 |                                     
2119 |                                     l = list()
2120 |                                     for idx, c in enumerate(entry):
2121 |                                         if ord(c) > 256:
2122 |                                             l.extend( [ chr(ord(c) & 0xff), chr( (ord(c) & 0xff00) >>8) ]  )
2123 |                                         else:
2124 |                                             l.extend( [chr( ord(c) ), '\0'] )
2125 |                                     
2126 |                                     file_data[
2127 |                                         offsets[1] : offsets[1] + lengths[1]*2 ] = l
2128 |                                 
2129 |                                 else:
2130 |                                     
2131 |                                     l = list()
2132 |                                     for idx, c in enumerate(entry):
2133 |                                         if ord(c) > 256:
2134 |                                             l.extend( [ chr(ord(c) & 0xff), chr( (ord(c) & 0xff00) >>8) ]  )
2135 |                                         else:
2136 |                                             l.extend( [chr( ord(c) ), '\0'] )
2137 | 
2138 |                                     file_data[
2139 |                                         offsets[1] : offsets[1] + len(entry)*2 ] = l
2140 |                                     
2141 |                                     remainder = lengths[1] - len(entry)
2142 |                                     file_data[
2143 |                                         offsets[1] + len(entry)*2 :
2144 |                                         offsets[1] + lengths[1]*2 ] = [
2145 |                                             u'\0' ] * remainder*2
2146 |                 
2147 |         new_file_data = ''.join( [ chr(ord(c)) for c in file_data] )
2148 |         
2149 |         if filename:
2150 |             f = file(filename, 'wb+')
2151 |             f.write(new_file_data)
2152 |             f.close()
2153 |         else:
2154 |             return new_file_data
2155 |     
2156 |     
2157 |     def parse_sections(self, offset):
2158 |         """Fetch the PE file sections.
2159 |         
2160 |         The sections will be readily available in the "sections" attribute.
2161 |         Its attributes will contain all the section information plus "data"
2162 |         a buffer containing the section's data.
2163 |         
2164 |         The "Characteristics" member will be processed and attributes
2165 |         representing the section characteristics (with the 'IMAGE_SCN_'
2166 |         string trimmed from the constant's names) will be added to the
2167 |         section instance.
2168 |         
2169 |         Refer to the SectionStructure class for additional info.
2170 |         """
2171 |         
2172 |         self.sections = []
2173 |         
2174 |         for i in xrange(self.FILE_HEADER.NumberOfSections):
2175 |             section = SectionStructure( self.__IMAGE_SECTION_HEADER_format__, pe=self )
2176 |             if not section:
2177 |                 break
2178 |             section_offset = offset + section.sizeof() * i
2179 |             section.set_file_offset(section_offset)
2180 |             section.__unpack__(self.__data__[section_offset : section_offset + section.sizeof()])
2181 |             self.__structures__.append(section)
2182 |             
2183 |             if section.SizeOfRawData > len(self.__data__):
2184 |                 self.__warnings.append(
2185 |                     ('Error parsing section %d. ' % i) +
2186 |                     'SizeOfRawData is larger than file.')
2187 |             
2188 |             if self.adjust_FileAlignment( section.PointerToRawData,
2189 |                 self.OPTIONAL_HEADER.FileAlignment ) > len(self.__data__):
2190 |                 
2191 |                 self.__warnings.append(
2192 |                     ('Error parsing section %d. ' % i) +
2193 |                     'PointerToRawData points beyond the end of the file.')
2194 |             
2195 |             if section.Misc_VirtualSize > 0x10000000:
2196 |                 self.__warnings.append(
2197 |                     ('Suspicious value found parsing section %d. ' % i) +
2198 |                     'VirtualSize is extremely large > 256MiB.')
2199 |             
2200 |             if self.adjust_SectionAlignment( section.VirtualAddress, 
2201 |                 self.OPTIONAL_HEADER.SectionAlignment, self.OPTIONAL_HEADER.FileAlignment ) > 0x10000000:
2202 |                 self.__warnings.append(
2203 |                     ('Suspicious value found parsing section %d. ' % i) +
2204 |                     'VirtualAddress is beyond 0x10000000.')
2205 |             
2206 |             #
2207 |             # Some packer used a non-aligned PointerToRawData in the sections,
2208 |             # which causes several common tools not to load the section data
2209 |             # properly as they blindly read from the indicated offset.
2210 |             # It seems that Windows will round the offset down to the largest
2211 |             # offset multiple of FileAlignment which is smaller than
2212 |             # PointerToRawData. The following code will do the same.
2213 |             #
2214 |             
2215 |             #alignment = self.OPTIONAL_HEADER.FileAlignment
2216 |             #self.update_section_data(section)
2217 |             
2218 |             if ( self.OPTIONAL_HEADER.FileAlignment != 0 and
2219 |                 ( section.PointerToRawData % self.OPTIONAL_HEADER.FileAlignment) != 0):
2220 |                 self.__warnings.append(
2221 |                     ('Error parsing section %d. ' % i) +
2222 |                     'PointerToRawData should normally be ' +
2223 |                     'a multiple of FileAlignment, this might imply the file ' +
2224 |                     'is trying to confuse tools which parse this incorrectly')
2225 |             
2226 |             
2227 |             section_flags = retrieve_flags(SECTION_CHARACTERISTICS, 'IMAGE_SCN_')
2228 |             
2229 |             # Set the section's flags according the the Characteristics member
2230 |             set_flags(section, section.Characteristics, section_flags)
2231 |             
2232 |             if ( section.__dict__.get('IMAGE_SCN_MEM_WRITE', False)  and
2233 |                 section.__dict__.get('IMAGE_SCN_MEM_EXECUTE', False) ):
2234 |                 
2235 |                 if section.Name == 'PAGE' and self.is_driver():
2236 |                     # Drivers can have a PAGE section with those flags set without
2237 |                     # implying that it is malicious
2238 |                     pass
2239 |                 else:
2240 |                     self.__warnings.append(
2241 |                         ('Suspicious flags set for section %d. ' % i) +
2242 |                         'Both IMAGE_SCN_MEM_WRITE and IMAGE_SCN_MEM_EXECUTE are set. ' +
2243 |                         'This might indicate a packed executable.')
2244 |             
2245 |             self.sections.append(section)
2246 |         
2247 |         if self.FILE_HEADER.NumberOfSections > 0 and self.sections:
2248 |             return offset + self.sections[0].sizeof()*self.FILE_HEADER.NumberOfSections
2249 |         else:
2250 |             return offset
2251 |         
2252 |     
2253 |     
2254 |     def parse_data_directories(self, directories=None):
2255 |         """Parse and process the PE file's data directories.
2256 |         
2257 |         If the optional argument 'directories' is given, only
2258 |         the directories at the specified indices will be parsed.
2259 |         Such functionality allows parsing of areas of interest
2260 |         without the burden of having to parse all others.
2261 |         The directories can then be specified as:
2262 |         
2263 |         For export/import only:
2264 |         
2265 |           directories = [ 0, 1 ]
2266 |           
2267 |         or (more verbosely):
2268 |         
2269 |           directories = [ DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_IMPORT'], 
2270 |             DIRECTORY_ENTRY['IMAGE_DIRECTORY_ENTRY_EXPORT'] ]
2271 |             
2272 |         If 'directories' is a list, the ones that are processed will be removed,
2273 |         leaving only the ones that are not present in the image.
2274 |         """
2275 |         
2276 |         directory_parsing = (
2277 |             ('IMAGE_DIRECTORY_ENTRY_IMPORT', self.parse_import_directory),
2278 |             ('IMAGE_DIRECTORY_ENTRY_EXPORT', self.parse_export_directory),
2279 |             ('IMAGE_DIRECTORY_ENTRY_RESOURCE', self.parse_resources_directory),
2280 |             ('IMAGE_DIRECTORY_ENTRY_DEBUG', self.parse_debug_directory),
2281 |             ('IMAGE_DIRECTORY_ENTRY_BASERELOC', self.parse_relocations_directory),
2282 |             ('IMAGE_DIRECTORY_ENTRY_TLS', self.parse_directory_tls),
2283 |             ('IMAGE_DIRECTORY_ENTRY_LOAD_CONFIG', self.parse_directory_load_config),
2284 |             ('IMAGE_DIRECTORY_ENTRY_DELAY_IMPORT', self.parse_delay_import_directory),
2285 |             ('IMAGE_DIRECTORY_ENTRY_BOUND_IMPORT', self.parse_directory_bound_imports) )
2286 |             
2287 |         if directories is not None:
2288 |             if not isinstance(directories, (tuple, list)):
2289 |                 directories = [directories]
2290 |         
2291 |         for entry in directory_parsing:
2292 |             # OC Patch:
2293 |             #
2294 |             try:
2295 |                 directory_index = DIRECTORY_ENTRY[entry[0]]
2296 |                 dir_entry = self.OPTIONAL_HEADER.DATA_DIRECTORY[directory_index]
2297 |             except IndexError:
2298 |                 break
2299 |                 
2300 |             # Only process all the directories if no individual ones have
2301 |             # been chosen
2302 |             #
2303 |             if directories is None or directory_index in directories:
2304 |                 
2305 |                 if dir_entry.VirtualAddress:
2306 |                     value = entry[1](dir_entry.VirtualAddress, dir_entry.Size)
2307 |                     if value:
2308 |                         setattr(self, entry[0][6:], value)
2309 |                         
2310 |             if (directories is not None) and isinstance(directories, list) and (entry[0] in directories):
2311 |                 directories.remove(directory_index)
2312 |                 
2313 |         
2314 |     
2315 |     def parse_directory_bound_imports(self, rva, size):
2316 |         """"""
2317 |         
2318 |         bnd_descr = Structure(self.__IMAGE_BOUND_IMPORT_DESCRIPTOR_format__)
2319 |         bnd_descr_size = bnd_descr.sizeof()
2320 |         start = rva
2321 |         
2322 |         bound_imports = []
2323 |         while True:
2324 |             
2325 |             bnd_descr = self.__unpack_data__(
2326 |                 self.__IMAGE_BOUND_IMPORT_DESCRIPTOR_format__,
2327 |                    self.__data__[rva:rva+bnd_descr_size],
2328 |                    file_offset = rva)
2329 |             if bnd_descr is None:
2330 |                 # If can't parse directory then silently return.
2331 |                 # This directory does not necessarily have to be valid to
2332 |                 # still have a valid PE file
2333 |                 
2334 |                 self.__warnings.append(
2335 |                     'The Bound Imports directory exists but can\'t be parsed.')
2336 |                 
2337 |                 return
2338 |             
2339 |             if bnd_descr.all_zeroes():
2340 |                 break
2341 |             
2342 |             rva += bnd_descr.sizeof()
2343 |             
2344 |             forwarder_refs = []
2345 |             for idx in xrange(bnd_descr.NumberOfModuleForwarderRefs):
2346 |                 # Both structures IMAGE_BOUND_IMPORT_DESCRIPTOR and
2347 |                 # IMAGE_BOUND_FORWARDER_REF have the same size.
2348 |                 bnd_frwd_ref = self.__unpack_data__(
2349 |                     self.__IMAGE_BOUND_FORWARDER_REF_format__,
2350 |                     self.__data__[rva:rva+bnd_descr_size],
2351 |                     file_offset = rva)
2352 |                 # OC Patch:
2353 |                 if not bnd_frwd_ref:
2354 |                     raise PEFormatError(
2355 |                         "IMAGE_BOUND_FORWARDER_REF cannot be read")
2356 |                 rva += bnd_frwd_ref.sizeof()
2357 |                 
2358 |                 offset = start+bnd_frwd_ref.OffsetModuleName
2359 |                 name_str =  self.get_string_from_data(
2360 |                     0, self.__data__[offset : offset + MAX_STRING_LENGTH])
2361 |                 
2362 |                 if not name_str:
2363 |                     break
2364 |                 forwarder_refs.append(BoundImportRefData(
2365 |                     struct = bnd_frwd_ref,
2366 |                     name = name_str))
2367 |                     
2368 |             offset = start+bnd_descr.OffsetModuleName
2369 |             name_str = self.get_string_from_data(
2370 |                 0, self.__data__[offset : offset + MAX_STRING_LENGTH])
2371 |             
2372 |             if not name_str:
2373 |                 break
2374 |             bound_imports.append(
2375 |                 BoundImportDescData(
2376 |                     struct = bnd_descr,
2377 |                     name = name_str,
2378 |                     entries = forwarder_refs))
2379 |         
2380 |         return bound_imports
2381 |         
2382 |     
2383 |     def parse_directory_tls(self, rva, size):
2384 |         """"""
2385 |         
2386 |         if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE:
2387 |             format = self.__IMAGE_TLS_DIRECTORY_format__
2388 |         
2389 |         elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS:
2390 |             format = self.__IMAGE_TLS_DIRECTORY64_format__
2391 |         
2392 |         try:
2393 |             tls_struct = self.__unpack_data__(
2394 |                 format,
2395 |                 self.get_data( rva, Structure(format).sizeof() ),
2396 |                 file_offset = self.get_offset_from_rva(rva))
2397 |         except PEFormatError:
2398 |             self.__warnings.append(
2399 |                 'Invalid TLS information. Can\'t read ' +
2400 |                 'data at RVA: 0x%x' % rva)
2401 |             tls_struct = None
2402 |         
2403 |         if not tls_struct:
2404 |             return None
2405 |         
2406 |         return TlsData( struct = tls_struct )
2407 |     
2408 |     
2409 |     def parse_directory_load_config(self, rva, size):
2410 |         """"""
2411 |         
2412 |         if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE:
2413 |             format = self.__IMAGE_LOAD_CONFIG_DIRECTORY_format__
2414 |         
2415 |         elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS:
2416 |             format = self.__IMAGE_LOAD_CONFIG_DIRECTORY64_format__
2417 |         
2418 |         try:
2419 |             load_config_struct = self.__unpack_data__(
2420 |                 format,
2421 |                 self.get_data( rva, Structure(format).sizeof() ),
2422 |                 file_offset = self.get_offset_from_rva(rva))
2423 |         except PEFormatError:
2424 |             self.__warnings.append(
2425 |                 'Invalid LOAD_CONFIG information. Can\'t read ' +
2426 |                 'data at RVA: 0x%x' % rva)
2427 |             load_config_struct = None
2428 | 
2429 |         if not load_config_struct:
2430 |             return None
2431 |         
2432 |         return LoadConfigData( struct = load_config_struct )
2433 | 
2434 | 
2435 |     def parse_relocations_directory(self, rva, size):
2436 |         """"""
2437 |         
2438 |         rlc_size = Structure(self.__IMAGE_BASE_RELOCATION_format__).sizeof()
2439 |         end = rva+size
2440 |         
2441 |         relocations = []
2442 |         while rva < end:
2443 |             
2444 |             # OC Patch:
2445 |             # Malware that has bad RVA entries will cause an error.
2446 |             # Just continue on after an exception
2447 |             #
2448 |             try:
2449 |                 rlc = self.__unpack_data__(
2450 |                     self.__IMAGE_BASE_RELOCATION_format__,
2451 |                     self.get_data(rva, rlc_size),
2452 |                     file_offset = self.get_offset_from_rva(rva) )
2453 |             except PEFormatError:
2454 |                 self.__warnings.append(
2455 |                     'Invalid relocation information. Can\'t read ' +
2456 |                     'data at RVA: 0x%x' % rva)
2457 |                 rlc = None
2458 |             
2459 |             if not rlc:
2460 |                 break
2461 |             
2462 |             # rlc.VirtualAddress must lie within the Image
2463 |             if rlc.VirtualAddress > self.OPTIONAL_HEADER.SizeOfImage:
2464 |                 self.__warnings.append(
2465 |                     'Invalid relocation information. VirtualAddress outside' +
2466 |                     ' of Image: 0x%x' % rlc.VirtualAddress)
2467 |                 break
2468 |             
2469 |             # rlc.SizeOfBlock must be less or equal than the size of the image
2470 |             # (It's a rather loose sanity test)
2471 |             if rlc.SizeOfBlock > self.OPTIONAL_HEADER.SizeOfImage:
2472 |                 self.__warnings.append(
2473 |                     'Invalid relocation information. SizeOfBlock too large' +
2474 |                     ': %d' % rlc.SizeOfBlock)
2475 |                 break
2476 |             
2477 |             reloc_entries = self.parse_relocations(
2478 |                 rva+rlc_size, rlc.VirtualAddress, rlc.SizeOfBlock-rlc_size )
2479 |             
2480 |             relocations.append(
2481 |                 BaseRelocationData(
2482 |                     struct = rlc,
2483 |                     entries = reloc_entries))
2484 |             
2485 |             if not rlc.SizeOfBlock:
2486 |                 break
2487 |             rva += rlc.SizeOfBlock
2488 |         
2489 |         return relocations
2490 |         
2491 |     
2492 |     def parse_relocations(self, data_rva, rva, size):
2493 |         """"""
2494 | 
2495 |         data = self.get_data(data_rva, size)
2496 |         file_offset = self.get_offset_from_rva(data_rva)
2497 |         
2498 |         entries = []
2499 |         for idx in xrange( len(data) / 2 ):
2500 |             
2501 |             entry = self.__unpack_data__(
2502 |                 self.__IMAGE_BASE_RELOCATION_ENTRY_format__,
2503 |                 data[idx*2:(idx+1)*2],
2504 |                 file_offset = file_offset )
2505 |                 
2506 |             if not entry:
2507 |                 break
2508 |             word = entry.Data
2509 |             
2510 |             reloc_type = (word>>12)
2511 |             reloc_offset = (word & 0x0fff)
2512 |             entries.append(
2513 |                 RelocationData(
2514 |                     struct = entry,
2515 |                     type = reloc_type,
2516 |                     base_rva = rva,
2517 |                     rva = reloc_offset+rva))
2518 |             file_offset += entry.sizeof()
2519 |         
2520 |         return entries
2521 |         
2522 |     
2523 |     def parse_debug_directory(self, rva, size):
2524 |         """"""
2525 |         
2526 |         dbg_size = Structure(self.__IMAGE_DEBUG_DIRECTORY_format__).sizeof()
2527 |         
2528 |         debug = []
2529 |         for idx in xrange(size/dbg_size):
2530 |             try:
2531 |                 data = self.get_data(rva+dbg_size*idx, dbg_size)
2532 |             except PEFormatError, e:
2533 |                 self.__warnings.append(
2534 |                     'Invalid debug information. Can\'t read ' +
2535 |                     'data at RVA: 0x%x' % rva)
2536 |                 return None
2537 |             
2538 |             dbg = self.__unpack_data__(
2539 |                 self.__IMAGE_DEBUG_DIRECTORY_format__,
2540 |                 data, file_offset = self.get_offset_from_rva(rva+dbg_size*idx))
2541 |             
2542 |             if not dbg:
2543 |                 return None
2544 |             
2545 |             debug.append(
2546 |                 DebugData(
2547 |                     struct = dbg))
2548 |         
2549 |         return debug
2550 |                         
2551 |     
2552 |     def parse_resources_directory(self, rva, size=0, base_rva = None, level = 0, dirs=None):
2553 |         """Parse the resources directory.
2554 |         
2555 |         Given the RVA of the resources directory, it will process all
2556 |         its entries.
2557 |         
2558 |         The root will have the corresponding member of its structure,
2559 |         IMAGE_RESOURCE_DIRECTORY plus 'entries', a list of all the
2560 |         entries in the directory.
2561 |         
2562 |         Those entries will have, correspondingly, all the structure's
2563 |         members (IMAGE_RESOURCE_DIRECTORY_ENTRY) and an additional one,
2564 |         "directory", pointing to the IMAGE_RESOURCE_DIRECTORY structure
2565 |         representing upper layers of the tree. This one will also have
2566 |         an 'entries' attribute, pointing to the 3rd, and last, level.
2567 |         Another directory with more entries. Those last entries will
2568 |         have a new attribute (both 'leaf' or 'data_entry' can be used to
2569 |         access it). This structure finally points to the resource data.
2570 |         All the members of this structure, IMAGE_RESOURCE_DATA_ENTRY,
2571 |         are available as its attributes.
2572 |         """
2573 |         
2574 |         # OC Patch:
2575 |         if dirs is None:
2576 |             dirs = [rva]
2577 |         
2578 |         if base_rva is None:
2579 |             base_rva = rva
2580 |         
2581 |         resources_section = self.get_section_by_rva(rva)
2582 |         
2583 |         try:
2584 |             # If the RVA is invalid all would blow up. Some EXEs seem to be
2585 |             # specially nasty and have an invalid RVA.
2586 |             data = self.get_data(rva, Structure(self.__IMAGE_RESOURCE_DIRECTORY_format__).sizeof() )
2587 |         except PEFormatError, e:
2588 |             self.__warnings.append(
2589 |                 'Invalid resources directory. Can\'t read ' +
2590 |                 'directory data at RVA: 0x%x' % rva)
2591 |             return None
2592 |         
2593 |         # Get the resource directory structure, that is, the header
2594 |         # of the table preceding the actual entries
2595 |         #
2596 |         resource_dir = self.__unpack_data__(
2597 |             self.__IMAGE_RESOURCE_DIRECTORY_format__, data,
2598 |             file_offset = self.get_offset_from_rva(rva) )
2599 |         if resource_dir is None:
2600 |             # If can't parse resources directory then silently return.
2601 |             # This directory does not necessarily have to be valid to
2602 |             # still have a valid PE file
2603 |             self.__warnings.append(
2604 |                 'Invalid resources directory. Can\'t parse ' +
2605 |                 'directory data at RVA: 0x%x' % rva)
2606 |             return None
2607 |         
2608 |         dir_entries = []
2609 |         
2610 |         # Advance the RVA to the positon immediately following the directory
2611 |         # table header and pointing to the first entry in the table
2612 |         #
2613 |         rva += resource_dir.sizeof()
2614 |         
2615 |         number_of_entries = (
2616 |             resource_dir.NumberOfNamedEntries +
2617 |             resource_dir.NumberOfIdEntries )
2618 |         
2619 |         # Set a hard limit on the maximum resonable number of entries
2620 |         MAX_ALLOWED_ENTRIES = 4096
2621 |         if number_of_entries > MAX_ALLOWED_ENTRIES:
2622 |             self.__warnings.append(
2623 |                 'Error parsing the resources directory, '
2624 |                 'The directory contains %d entries (>%s)' %
2625 |                 (number_of_entries, MAX_ALLOWED_ENTRIES) )
2626 |             return None
2627 |         
2628 |         strings_to_postprocess = list()
2629 |         
2630 |         for idx in xrange(number_of_entries):
2631 |             
2632 |             
2633 |             res = self.parse_resource_entry(rva)
2634 |             if res is None:
2635 |                 self.__warnings.append(
2636 |                     'Error parsing the resources directory, '
2637 |                     'Entry %d is invalid, RVA = 0x%x. ' %
2638 |                     (idx, rva) )
2639 |                 break
2640 |             
2641 |             entry_name = None
2642 |             entry_id = None
2643 |             
2644 |             # If all named entries have been processed, only Id ones
2645 |             # remain
2646 |             
2647 |             if idx >= resource_dir.NumberOfNamedEntries:
2648 |                 entry_id = res.Name
2649 |             else:
2650 |                 ustr_offset = base_rva+res.NameOffset
2651 |                 try:
2652 |                     #entry_name = self.get_string_u_at_rva(ustr_offset, max_length=16)
2653 |                     entry_name = UnicodeStringWrapperPostProcessor(self, ustr_offset)
2654 |                     strings_to_postprocess.append(entry_name)
2655 |                 
2656 |                 except PEFormatError, excp:
2657 |                     self.__warnings.append(
2658 |                         'Error parsing the resources directory, '
2659 |                         'attempting to read entry name. '
2660 |                         'Can\'t read unicode string at offset 0x%x' %
2661 |                         (ustr_offset) )
2662 |                 
2663 |             
2664 |             if res.DataIsDirectory:
2665 |                 # OC Patch:
2666 |                 #
2667 |                 # One trick malware can do is to recursively reference
2668 |                 # the next directory. This causes hilarity to ensue when
2669 |                 # trying to parse everything correctly.
2670 |                 # If the original RVA given to this function is equal to
2671 |                 # the next one to parse, we assume that it's a trick.
2672 |                 # Instead of raising a PEFormatError this would skip some
2673 |                 # reasonable data so we just break.
2674 |                 #
2675 |                 # 9ee4d0a0caf095314fd7041a3e4404dc is the offending sample
2676 |                 if (base_rva + res.OffsetToDirectory) in dirs:
2677 |                     
2678 |                     break
2679 |                 
2680 |                 else:
2681 |                     entry_directory = self.parse_resources_directory(
2682 |                         base_rva+res.OffsetToDirectory,
2683 |                         size-(rva-base_rva), # size
2684 |                         base_rva=base_rva, level = level+1,
2685 |                         dirs=dirs + [base_rva + res.OffsetToDirectory])
2686 |                 
2687 |                 if not entry_directory:
2688 |                     break
2689 |                     
2690 |                 # Ange Albertini's code to process resources' strings
2691 |                 #
2692 |                 strings = None
2693 |                 if entry_id == RESOURCE_TYPE['RT_STRING']:
2694 |                     strings = dict()
2695 |                     for resource_id in entry_directory.entries:
2696 |                         if hasattr(resource_id, 'directory'):
2697 |                             
2698 |                             resource_strings = dict()
2699 |                             
2700 |                             for resource_lang in resource_id.directory.entries:
2701 |                                 
2702 |                                 
2703 |                                 if (resource_lang is None or not hasattr(resource_lang, 'data') or 
2704 |                                     resource_lang.data.struct.Size is None or resource_id.id is None):
2705 |                                     continue
2706 |                                     
2707 |                                 string_entry_rva = resource_lang.data.struct.OffsetToData
2708 |                                 string_entry_size = resource_lang.data.struct.Size
2709 |                                 string_entry_id = resource_id.id
2710 |                                 
2711 |                                 string_entry_data = self.get_data(string_entry_rva, string_entry_size)
2712 |                                 parse_strings( string_entry_data, (int(string_entry_id) - 1) * 16, resource_strings )
2713 |                                 strings.update(resource_strings)
2714 |                                 
2715 |                             resource_id.directory.strings = resource_strings
2716 |                                 
2717 |                 dir_entries.append(
2718 |                     ResourceDirEntryData(
2719 |                         struct = res,
2720 |                         name = entry_name,
2721 |                         id = entry_id,
2722 |                         directory = entry_directory))
2723 |             
2724 |             else:
2725 |                 struct = self.parse_resource_data_entry(
2726 |                     base_rva + res.OffsetToDirectory)
2727 |                 
2728 |                 if struct:
2729 |                     entry_data = ResourceDataEntryData(
2730 |                         struct = struct,
2731 |                         lang = res.Name & 0x3ff,
2732 |                         sublang = res.Name >> 10 )
2733 |                     
2734 |                     dir_entries.append(
2735 |                         ResourceDirEntryData(
2736 |                             struct = res,
2737 |                             name = entry_name,
2738 |                             id = entry_id,
2739 |                             data = entry_data))
2740 |                 
2741 |                 else:
2742 |                     break
2743 |                 
2744 |             
2745 |             
2746 |             # Check if this entry contains version information
2747 |             #
2748 |             if level == 0 and res.Id == RESOURCE_TYPE['RT_VERSION']:
2749 |                 if len(dir_entries)>0:
2750 |                     last_entry = dir_entries[-1]
2751 |                 
2752 |                 rt_version_struct = None
2753 |                 try:
2754 |                     rt_version_struct = last_entry.directory.entries[0].directory.entries[0].data.struct
2755 |                 except:
2756 |                     # Maybe a malformed directory structure...?
2757 |                     # Lets ignore it
2758 |                     pass
2759 |                 
2760 |                 if rt_version_struct is not None:
2761 |                     self.parse_version_information(rt_version_struct)
2762 |             
2763 |             rva += res.sizeof()
2764 |                     
2765 |         
2766 |         string_rvas = [s.get_rva() for s in strings_to_postprocess]
2767 |         string_rvas.sort()
2768 |         
2769 |         for idx, s in enumerate(strings_to_postprocess):
2770 |             s.render_pascal_16()
2771 |             
2772 |         
2773 |         resource_directory_data = ResourceDirData(
2774 |             struct = resource_dir,
2775 |             entries = dir_entries)
2776 |         
2777 |         return resource_directory_data
2778 |         
2779 |     
2780 |     def parse_resource_data_entry(self, rva):
2781 |         """Parse a data entry from the resources directory."""
2782 |         
2783 |         try:
2784 |             # If the RVA is invalid all would blow up. Some EXEs seem to be
2785 |             # specially nasty and have an invalid RVA.
2786 |             data = self.get_data(rva, Structure(self.__IMAGE_RESOURCE_DATA_ENTRY_format__).sizeof() )
2787 |         except PEFormatError, excp:
2788 |             self.__warnings.append(
2789 |                 'Error parsing a resource directory data entry, ' +
2790 |                 'the RVA is invalid: 0x%x' % ( rva ) )
2791 |             return None
2792 |         
2793 |         data_entry = self.__unpack_data__(
2794 |             self.__IMAGE_RESOURCE_DATA_ENTRY_format__, data,
2795 |             file_offset = self.get_offset_from_rva(rva) )
2796 |         
2797 |         return data_entry
2798 |         
2799 |     
2800 |     def parse_resource_entry(self, rva):
2801 |         """Parse a directory entry from the resources directory."""
2802 | 
2803 |         try:
2804 |             data = self.get_data( rva, Structure(self.__IMAGE_RESOURCE_DIRECTORY_ENTRY_format__).sizeof() )
2805 |         except PEFormatError, excp:
2806 |             # A warning will be added by the caller if this method returns None
2807 |             return None
2808 | 
2809 |         resource = self.__unpack_data__(
2810 |             self.__IMAGE_RESOURCE_DIRECTORY_ENTRY_format__, data,
2811 |             file_offset = self.get_offset_from_rva(rva) )
2812 |         
2813 |         if resource is None:
2814 |             return None
2815 |         
2816 |         #resource.NameIsString = (resource.Name & 0x80000000L) >> 31
2817 |         resource.NameOffset = resource.Name & 0x7FFFFFFFL
2818 |         
2819 |         resource.__pad = resource.Name & 0xFFFF0000L
2820 |         resource.Id = resource.Name & 0x0000FFFFL
2821 |         
2822 |         resource.DataIsDirectory = (resource.OffsetToData & 0x80000000L) >> 31
2823 |         resource.OffsetToDirectory = resource.OffsetToData & 0x7FFFFFFFL
2824 |         
2825 |         return resource
2826 |         
2827 |     
2828 |     def parse_version_information(self, version_struct):
2829 |         """Parse version information structure.
2830 |         
2831 |         The date will be made available in three attributes of the PE object.
2832 |         
2833 |         VS_VERSIONINFO     will contain the first three fields of the main structure:
2834 |             'Length', 'ValueLength', and 'Type'
2835 |         
2836 |         VS_FIXEDFILEINFO    will hold the rest of the fields, accessible as sub-attributes:
2837 |             'Signature', 'StrucVersion', 'FileVersionMS', 'FileVersionLS',
2838 |             'ProductVersionMS', 'ProductVersionLS', 'FileFlagsMask', 'FileFlags',
2839 |             'FileOS', 'FileType', 'FileSubtype', 'FileDateMS', 'FileDateLS'
2840 |         
2841 |         FileInfo    is a list of all StringFileInfo and VarFileInfo structures.
2842 |         
2843 |         StringFileInfo structures will have a list as an attribute named 'StringTable'
2844 |         containing all the StringTable structures. Each of those structures contains a
2845 |         dictionary 'entries' with all the key/value version information string pairs.
2846 |         
2847 |         VarFileInfo structures will have a list as an attribute named 'Var' containing
2848 |         all Var structures. Each Var structure will have a dictionary as an attribute
2849 |         named 'entry' which will contain the name and value of the Var.
2850 |         """
2851 |         
2852 |         
2853 |         # Retrieve the data for the version info resource
2854 |         #
2855 |         start_offset = self.get_offset_from_rva( version_struct.OffsetToData )
2856 |         raw_data = self.__data__[ start_offset : start_offset+version_struct.Size ]
2857 |         
2858 |         
2859 |         # Map the main structure and the subsequent string
2860 |         #
2861 |         versioninfo_struct = self.__unpack_data__(
2862 |             self.__VS_VERSIONINFO_format__, raw_data,
2863 |             file_offset = start_offset )
2864 |         
2865 |         if versioninfo_struct is None:
2866 |             return
2867 |         
2868 |         ustr_offset = version_struct.OffsetToData + versioninfo_struct.sizeof()
2869 |         try:
2870 |             versioninfo_string = self.get_string_u_at_rva( ustr_offset )
2871 |         except PEFormatError, excp:
2872 |             self.__warnings.append(
2873 |                 'Error parsing the version information, ' +
2874 |                 'attempting to read VS_VERSION_INFO string. Can\'t ' +
2875 |                 'read unicode string at offset 0x%x' % (
2876 |                 ustr_offset ) )
2877 |             
2878 |             versioninfo_string = None
2879 |         
2880 |         # If the structure does not contain the expected name, it's assumed to be invalid
2881 |         #
2882 |         if versioninfo_string != u'VS_VERSION_INFO':
2883 |             
2884 |             self.__warnings.append('Invalid VS_VERSION_INFO block')
2885 |             return
2886 |         
2887 |         
2888 |         # Set the PE object's VS_VERSIONINFO to this one
2889 |         #
2890 |         self.VS_VERSIONINFO = versioninfo_struct
2891 |         
2892 |         # The the Key attribute to point to the unicode string identifying the structure
2893 |         #
2894 |         self.VS_VERSIONINFO.Key = versioninfo_string
2895 |         
2896 |         
2897 |         # Process the fixed version information, get the offset and structure
2898 |         #
2899 |         fixedfileinfo_offset = self.dword_align(
2900 |             versioninfo_struct.sizeof() + 2 * (len(versioninfo_string) + 1),
2901 |             version_struct.OffsetToData)
2902 |         fixedfileinfo_struct = self.__unpack_data__(
2903 |             self.__VS_FIXEDFILEINFO_format__,
2904 |             raw_data[fixedfileinfo_offset:],
2905 |             file_offset = start_offset+fixedfileinfo_offset )
2906 |         
2907 |         if not fixedfileinfo_struct:
2908 |             return
2909 |         
2910 |         # Set the PE object's VS_FIXEDFILEINFO to this one
2911 |         #
2912 |         self.VS_FIXEDFILEINFO = fixedfileinfo_struct
2913 |         
2914 |         
2915 |         # Start parsing all the StringFileInfo and VarFileInfo structures
2916 |         #
2917 |         
2918 |         # Get the first one
2919 |         #
2920 |         stringfileinfo_offset = self.dword_align(
2921 |             fixedfileinfo_offset + fixedfileinfo_struct.sizeof(),
2922 |             version_struct.OffsetToData)
2923 |         original_stringfileinfo_offset = stringfileinfo_offset
2924 |         
2925 |         
2926 |         # Set the PE object's attribute that will contain them all.
2927 |         #
2928 |         self.FileInfo = list()
2929 |         
2930 |         
2931 |         while True:
2932 |             
2933 |             # Process the StringFileInfo/VarFileInfo struct
2934 |             #
2935 |             stringfileinfo_struct = self.__unpack_data__(
2936 |                 self.__StringFileInfo_format__,
2937 |                 raw_data[stringfileinfo_offset:],
2938 |                 file_offset = start_offset+stringfileinfo_offset )
2939 | 
2940 |             if stringfileinfo_struct is None:
2941 |                 self.__warnings.append(
2942 |                     'Error parsing StringFileInfo/VarFileInfo struct' )
2943 |                 return None
2944 |             
2945 |             # Get the subsequent string defining the structure.
2946 |             #
2947 |             ustr_offset = ( version_struct.OffsetToData +
2948 |                 stringfileinfo_offset + versioninfo_struct.sizeof() )
2949 |             try:
2950 |                 stringfileinfo_string = self.get_string_u_at_rva( ustr_offset )
2951 |             except PEFormatError, excp:
2952 |                 self.__warnings.append(
2953 |                     'Error parsing the version information, ' +
2954 |                     'attempting to read StringFileInfo string. Can\'t ' +
2955 |                     'read unicode string at offset 0x%x' %  ( ustr_offset ) )
2956 |                 break
2957 |             
2958 |             # Set such string as the Key attribute
2959 |             #
2960 |             stringfileinfo_struct.Key = stringfileinfo_string
2961 |             
2962 |             
2963 |             # Append the structure to the PE object's list
2964 |             #
2965 |             self.FileInfo.append(stringfileinfo_struct)
2966 |             
2967 |             
2968 |             # Parse a StringFileInfo entry
2969 |             #
2970 |             if stringfileinfo_string and stringfileinfo_string.startswith(u'StringFileInfo'):
2971 |                 
2972 |                 if stringfileinfo_struct.Type in (0,1) and stringfileinfo_struct.ValueLength == 0:
2973 |                     
2974 |                     stringtable_offset = self.dword_align(
2975 |                         stringfileinfo_offset + stringfileinfo_struct.sizeof() +
2976 |                             2*(len(stringfileinfo_string)+1),
2977 |                         version_struct.OffsetToData)
2978 |                     
2979 |                     stringfileinfo_struct.StringTable = list()
2980 |                     
2981 |                     # Process the String Table entries
2982 |                     #
2983 |                     while True:
2984 |                         
2985 |                         stringtable_struct = self.__unpack_data__(
2986 |                             self.__StringTable_format__,
2987 |                             raw_data[stringtable_offset:],
2988 |                             file_offset = start_offset+stringtable_offset )
2989 |                         
2990 |                         if not stringtable_struct:
2991 |                             break
2992 |                         
2993 |                         ustr_offset = ( version_struct.OffsetToData + stringtable_offset +
2994 |                             stringtable_struct.sizeof() )
2995 |                         try:
2996 |                             stringtable_string = self.get_string_u_at_rva( ustr_offset )
2997 |                         except PEFormatError, excp:
2998 |                             self.__warnings.append(
2999 |                                 'Error parsing the version information, ' +
3000 |                                 'attempting to read StringTable string. Can\'t ' +
3001 |                                 'read unicode string at offset 0x%x' % ( ustr_offset ) )
3002 |                             break
3003 |                         
3004 |                         stringtable_struct.LangID = stringtable_string
3005 |                         stringtable_struct.entries = dict()
3006 |                         stringtable_struct.entries_offsets = dict()
3007 |                         stringtable_struct.entries_lengths = dict()
3008 |                         stringfileinfo_struct.StringTable.append(stringtable_struct)
3009 |                         
3010 |                         entry_offset = self.dword_align(
3011 |                             stringtable_offset + stringtable_struct.sizeof() +
3012 |                                 2*(len(stringtable_string)+1),
3013 |                             version_struct.OffsetToData)
3014 |                         
3015 |                         # Process all entries in the string table
3016 |                         #
3017 |                         
3018 |                         while entry_offset < stringtable_offset + stringtable_struct.Length:
3019 |                             
3020 |                             string_struct = self.__unpack_data__(
3021 |                                 self.__String_format__, raw_data[entry_offset:],
3022 |                                 file_offset = start_offset+entry_offset )
3023 |                             
3024 |                             if not string_struct:
3025 |                                 break
3026 |                             
3027 |                             ustr_offset = ( version_struct.OffsetToData + entry_offset +
3028 |                                 string_struct.sizeof() )
3029 |                             try:
3030 |                                 key = self.get_string_u_at_rva( ustr_offset )
3031 |                                 key_offset = self.get_offset_from_rva( ustr_offset )
3032 |                             except PEFormatError, excp:
3033 |                                 self.__warnings.append(
3034 |                                     'Error parsing the version information, ' +
3035 |                                     'attempting to read StringTable Key string. Can\'t ' +
3036 |                                     'read unicode string at offset 0x%x' % ( ustr_offset ) )
3037 |                                 break
3038 |                             
3039 |                             value_offset = self.dword_align(
3040 |                                 2*(len(key)+1) + entry_offset + string_struct.sizeof(),
3041 |                                 version_struct.OffsetToData)
3042 |                             
3043 |                             ustr_offset = version_struct.OffsetToData + value_offset
3044 |                             try:
3045 |                                 value = self.get_string_u_at_rva( ustr_offset,
3046 |                                     max_length = string_struct.ValueLength )
3047 |                                 value_offset = self.get_offset_from_rva( ustr_offset )
3048 |                             except PEFormatError, excp:
3049 |                                 self.__warnings.append(
3050 |                                     'Error parsing the version information, ' +
3051 |                                     'attempting to read StringTable Value string. ' +
3052 |                                     'Can\'t read unicode string at offset 0x%x' % (
3053 |                                     ustr_offset ) )
3054 |                                 break
3055 |                             
3056 |                             if string_struct.Length == 0:
3057 |                                 entry_offset = stringtable_offset + stringtable_struct.Length
3058 |                             else:
3059 |                                 entry_offset = self.dword_align(
3060 |                                     string_struct.Length+entry_offset, version_struct.OffsetToData)
3061 |                             
3062 |                             key_as_char = []
3063 |                             for c in key:
3064 |                                 if ord(c) >= 0x80:
3065 |                                     key_as_char.append('\\x%02x' % ord(c))
3066 |                                 else:
3067 |                                     key_as_char.append(c)
3068 |                             
3069 |                             key_as_char = ''.join(key_as_char)
3070 | 
3071 |                             setattr(stringtable_struct, key_as_char, value)
3072 |                             stringtable_struct.entries[key] = value
3073 |                             stringtable_struct.entries_offsets[key] = (key_offset, value_offset)
3074 |                             stringtable_struct.entries_lengths[key] = (len(key), len(value))
3075 |                         
3076 |                         
3077 |                         new_stringtable_offset = self.dword_align(
3078 |                             stringtable_struct.Length + stringtable_offset,
3079 |                             version_struct.OffsetToData)
3080 |                             
3081 |                         # check if the entry is crafted in a way that would lead to an infinite
3082 |                         # loop and break if so
3083 |                         #
3084 |                         if new_stringtable_offset == stringtable_offset:
3085 |                             break
3086 |                         stringtable_offset = new_stringtable_offset
3087 |                             
3088 |                         if stringtable_offset >= stringfileinfo_struct.Length:
3089 |                             break
3090 |             
3091 |             # Parse a VarFileInfo entry
3092 |             #
3093 |             elif stringfileinfo_string and stringfileinfo_string.startswith( u'VarFileInfo' ):
3094 |                 
3095 |                 varfileinfo_struct = stringfileinfo_struct
3096 |                 varfileinfo_struct.name = 'VarFileInfo'
3097 |                 
3098 |                 if varfileinfo_struct.Type in (0, 1) and varfileinfo_struct.ValueLength == 0:
3099 |                     
3100 |                     var_offset = self.dword_align(
3101 |                         stringfileinfo_offset + varfileinfo_struct.sizeof() +
3102 |                             2*(len(stringfileinfo_string)+1),
3103 |                         version_struct.OffsetToData)
3104 |                     
3105 |                     varfileinfo_struct.Var = list()
3106 |                     
3107 |                     # Process all entries
3108 |                     #
3109 |                     
3110 |                     while True:
3111 |                         var_struct = self.__unpack_data__(
3112 |                             self.__Var_format__,
3113 |                             raw_data[var_offset:],
3114 |                             file_offset = start_offset+var_offset )
3115 |                         
3116 |                         if not var_struct:
3117 |                             break
3118 |                         
3119 |                         ustr_offset = ( version_struct.OffsetToData + var_offset +
3120 |                             var_struct.sizeof() )
3121 |                         try:
3122 |                             var_string = self.get_string_u_at_rva( ustr_offset )
3123 |                         except PEFormatError, excp:
3124 |                             self.__warnings.append(
3125 |                                 'Error parsing the version information, ' +
3126 |                                 'attempting to read VarFileInfo Var string. ' +
3127 |                                 'Can\'t read unicode string at offset 0x%x' % (ustr_offset))
3128 |                             break
3129 |                         
3130 |                         
3131 |                         varfileinfo_struct.Var.append(var_struct)
3132 |                         
3133 |                         varword_offset = self.dword_align(
3134 |                             2*(len(var_string)+1) + var_offset + var_struct.sizeof(),
3135 |                             version_struct.OffsetToData)
3136 |                         orig_varword_offset = varword_offset
3137 |                         
3138 |                         while varword_offset < orig_varword_offset + var_struct.ValueLength:
3139 |                             word1 = self.get_word_from_data(
3140 |                                 raw_data[varword_offset:varword_offset+2], 0)
3141 |                             word2 = self.get_word_from_data(
3142 |                                 raw_data[varword_offset+2:varword_offset+4], 0)
3143 |                             varword_offset += 4
3144 |                             
3145 |                             if isinstance(word1, (int, long)) and isinstance(word2, (int, long)):
3146 |                                 var_struct.entry = {var_string: '0x%04x 0x%04x' % (word1, word2)}
3147 |                         
3148 |                         var_offset = self.dword_align(
3149 |                             var_offset+var_struct.Length, version_struct.OffsetToData)
3150 |                         
3151 |                         if var_offset <= var_offset+var_struct.Length:
3152 |                             break
3153 |                 
3154 |             
3155 |             # Increment and align the offset
3156 |             #
3157 |             stringfileinfo_offset = self.dword_align(
3158 |                 stringfileinfo_struct.Length+stringfileinfo_offset,
3159 |                 version_struct.OffsetToData)
3160 |             
3161 |             # Check if all the StringFileInfo and VarFileInfo items have been processed
3162 |             #
3163 |             if stringfileinfo_struct.Length == 0 or stringfileinfo_offset >= versioninfo_struct.Length:
3164 |                 break
3165 |             
3166 |         
3167 |     
3168 |     def parse_export_directory(self, rva, size):
3169 |         """Parse the export directory.
3170 |         
3171 |         Given the RVA of the export directory, it will process all
3172 |         its entries.
3173 |         
3174 |         The exports will be made available through a list "exports"
3175 |         containing a tuple with the following elements:
3176 |             
3177 |             (ordinal, symbol_address, symbol_name)
3178 |         
3179 |         And also through a dictionary "exports_by_ordinal" whose keys
3180 |         will be the ordinals and the values tuples of the from:
3181 |             
3182 |             (symbol_address, symbol_name)
3183 |         
3184 |         The symbol addresses are relative, not absolute.
3185 |         """
3186 |         
3187 |         try:
3188 |             export_dir =  self.__unpack_data__(
3189 |                 self.__IMAGE_EXPORT_DIRECTORY_format__,
3190 |                 self.get_data( rva, Structure(self.__IMAGE_EXPORT_DIRECTORY_format__).sizeof() ),
3191 |                 file_offset = self.get_offset_from_rva(rva) )
3192 |         except PEFormatError:
3193 |             self.__warnings.append(
3194 |                 'Error parsing export directory at RVA: 0x%x' % ( rva ) )
3195 |             return
3196 |         
3197 |         if not export_dir:
3198 |             return
3199 |         
3200 |         # We keep track of the bytes left in the file and use it to set a upper
3201 |         # bound in the number of items that can be read from the different
3202 |         # arrays
3203 |         #
3204 |         def length_until_eof(rva):
3205 |             return len(self.__data__) - self.get_offset_from_rva(rva)
3206 |         
3207 |         try:
3208 |             address_of_names = self.get_data(
3209 |                 export_dir.AddressOfNames, min( length_until_eof(export_dir.AddressOfNames), export_dir.NumberOfNames*4))
3210 |             address_of_name_ordinals = self.get_data(
3211 |                 export_dir.AddressOfNameOrdinals, min( length_until_eof(export_dir.AddressOfNameOrdinals), export_dir.NumberOfNames*4) )
3212 |             address_of_functions = self.get_data(
3213 |                 export_dir.AddressOfFunctions, min( length_until_eof(export_dir.AddressOfFunctions), export_dir.NumberOfFunctions*4) )
3214 |         except PEFormatError:
3215 |             self.__warnings.append(
3216 |                 'Error parsing export directory at RVA: 0x%x' % ( rva ) )
3217 |             return
3218 |         
3219 |         exports = []
3220 |         
3221 |         max_failed_entries_before_giving_up = 10
3222 |         
3223 |         for i in xrange( min( export_dir.NumberOfNames, length_until_eof(export_dir.AddressOfNames)/4) ):
3224 |             
3225 |             symbol_name_address = self.get_dword_from_data(address_of_names, i)
3226 | 
3227 |             if symbol_name_address is None:
3228 |                 max_failed_entries_before_giving_up -= 1
3229 |                 if max_failed_entries_before_giving_up <= 0:
3230 |                     break
3231 |                 
3232 |             symbol_name = self.get_string_at_rva( symbol_name_address )
3233 |             try:
3234 |                 symbol_name_offset = self.get_offset_from_rva( symbol_name_address )
3235 |             except PEFormatError:
3236 |                 max_failed_entries_before_giving_up -= 1
3237 |                 if max_failed_entries_before_giving_up <= 0:
3238 |                     break
3239 |                 continue
3240 |             
3241 |             symbol_ordinal = self.get_word_from_data(
3242 |                 address_of_name_ordinals, i)
3243 | 
3244 |             
3245 |             if symbol_ordinal is not None and symbol_ordinal*4 < len(address_of_functions):
3246 |                 symbol_address = self.get_dword_from_data(
3247 |                     address_of_functions, symbol_ordinal)
3248 |             else:
3249 |                 # Corrupt? a bad pointer... we assume it's all
3250 |                 # useless, no exports
3251 |                 return None
3252 | 
3253 |             if symbol_address is None or symbol_address == 0:
3254 |                 continue
3255 |             
3256 |             # If the funcion's RVA points within the export directory
3257 |             # it will point to a string with the forwarded symbol's string
3258 |             # instead of pointing the the function start address.
3259 |             
3260 |             if symbol_address >= rva and symbol_address < rva+size:
3261 |                 forwarder_str = self.get_string_at_rva(symbol_address)
3262 |                 try:
3263 |                     forwarder_offset = self.get_offset_from_rva( symbol_address )
3264 |                 except PEFormatError:
3265 |                     continue
3266 |             else:
3267 |                 forwarder_str = None
3268 |                 forwarder_offset = None
3269 |             
3270 |             exports.append(
3271 |                 ExportData(
3272 |                     pe = self,
3273 |                     ordinal = export_dir.Base+symbol_ordinal,
3274 |                     ordinal_offset = self.get_offset_from_rva( export_dir.AddressOfNameOrdinals + 2*i ),
3275 |                     address = symbol_address,
3276 |                     address_offset = self.get_offset_from_rva( export_dir.AddressOfFunctions + 4*symbol_ordinal ),
3277 |                     name = symbol_name,
3278 |                     name_offset = symbol_name_offset,
3279 |                     forwarder = forwarder_str,
3280 |                     forwarder_offset = forwarder_offset ))
3281 |         
3282 |         ordinals = [exp.ordinal for exp in exports]
3283 |         
3284 |         max_failed_entries_before_giving_up = 10
3285 |         
3286 |         for idx in xrange( min(export_dir.NumberOfFunctions, length_until_eof(export_dir.AddressOfFunctions)/4) ):
3287 |             
3288 |             if not idx+export_dir.Base in ordinals:
3289 |                 try:
3290 |                     symbol_address = self.get_dword_from_data(
3291 |                         address_of_functions, idx)
3292 |                 except PEFormatError:
3293 |                     symbol_address = None
3294 |                 
3295 |                 if symbol_address is None:
3296 |                     max_failed_entries_before_giving_up -= 1
3297 |                     if max_failed_entries_before_giving_up <= 0:
3298 |                         break
3299 |                 
3300 |                 if symbol_address == 0:
3301 |                     continue
3302 |                 #
3303 |                 # Checking for forwarder again.
3304 |                 #
3305 |                 if symbol_address >= rva and symbol_address < rva+size:
3306 |                     forwarder_str = self.get_string_at_rva(symbol_address)
3307 |                 else:
3308 |                     forwarder_str = None
3309 | 
3310 |                 exports.append(
3311 |                     ExportData(
3312 |                         ordinal = export_dir.Base+idx,
3313 |                         address = symbol_address,
3314 |                         name = None,
3315 |                         forwarder = forwarder_str))
3316 |         
3317 |         return ExportDirData(
3318 |                 struct = export_dir,
3319 |                 symbols = exports)
3320 |         
3321 |     
3322 |     def dword_align(self, offset, base):
3323 |         return ((offset+base+3) & 0xfffffffcL) - (base & 0xfffffffcL)
3324 |         
3325 |     
3326 |     def parse_delay_import_directory(self, rva, size):
3327 |         """Walk and parse the delay import directory."""
3328 |         
3329 |         import_descs =  []
3330 |         while True:
3331 |             try:
3332 |                 # If the RVA is invalid all would blow up. Some PEs seem to be
3333 |                 # specially nasty and have an invalid RVA.
3334 |                 data = self.get_data( rva, Structure(self.__IMAGE_DELAY_IMPORT_DESCRIPTOR_format__).sizeof() )
3335 |             except PEFormatError, e:
3336 |                 self.__warnings.append(
3337 |                     'Error parsing the Delay import directory at RVA: 0x%x' % ( rva ) )
3338 |                 break
3339 |             
3340 |             import_desc =  self.__unpack_data__(
3341 |                 self.__IMAGE_DELAY_IMPORT_DESCRIPTOR_format__,
3342 |                 data, file_offset = self.get_offset_from_rva(rva) )
3343 |             
3344 |             
3345 |             # If the structure is all zeroes, we reached the end of the list
3346 |             if not import_desc or import_desc.all_zeroes():
3347 |                 break
3348 |             
3349 |             
3350 |             rva += import_desc.sizeof()
3351 |             
3352 |             try:
3353 |                 import_data =  self.parse_imports(
3354 |                     import_desc.pINT,
3355 |                     import_desc.pIAT,
3356 |                     None)
3357 |             except PEFormatError, e:
3358 |                 self.__warnings.append(
3359 |                     'Error parsing the Delay import directory. ' +
3360 |                     'Invalid import data at RVA: 0x%x' % ( rva ) )
3361 |                 break
3362 |             
3363 |             if not import_data:
3364 |                 continue
3365 |             
3366 |             
3367 |             dll = self.get_string_at_rva(import_desc.szName)
3368 |             if not is_valid_dos_filename(dll):
3369 |                 dll = '*invalid*'
3370 | 
3371 |             if dll:
3372 |                 import_descs.append(
3373 |                     ImportDescData(
3374 |                         struct = import_desc,
3375 |                         imports = import_data,
3376 |                         dll = dll))
3377 |         
3378 |         return import_descs
3379 |                     
3380 | 
3381 |     
3382 |     def parse_import_directory(self, rva, size):
3383 |         """Walk and parse the import directory."""
3384 |         
3385 |         import_descs =  []
3386 |         while True:
3387 |             try:
3388 |                 # If the RVA is invalid all would blow up. Some EXEs seem to be
3389 |                 # specially nasty and have an invalid RVA.
3390 |                 data = self.get_data(rva, Structure(self.__IMAGE_IMPORT_DESCRIPTOR_format__).sizeof() )
3391 |             except PEFormatError, e:
3392 |                 self.__warnings.append(
3393 |                     'Error parsing the import directory at RVA: 0x%x' % ( rva ) )
3394 |                 break
3395 |             
3396 |             import_desc =  self.__unpack_data__(
3397 |                 self.__IMAGE_IMPORT_DESCRIPTOR_format__,
3398 |                 data, file_offset = self.get_offset_from_rva(rva) )
3399 |             
3400 |             # If the structure is all zeroes, we reached the end of the list
3401 |             if not import_desc or import_desc.all_zeroes():
3402 |                 break
3403 |             
3404 |             rva += import_desc.sizeof()
3405 |             
3406 |             try:
3407 |                 import_data =  self.parse_imports(
3408 |                     import_desc.OriginalFirstThunk,
3409 |                     import_desc.FirstThunk,
3410 |                     import_desc.ForwarderChain)
3411 |             except PEFormatError, excp:
3412 |                 self.__warnings.append(
3413 |                     'Error parsing the import directory. ' +
3414 |                     'Invalid Import data at RVA: 0x%x (%s)' % ( rva, str(excp) ) )
3415 |                 break
3416 |                 #raise excp
3417 |             
3418 |             if not import_data:
3419 |                 continue
3420 |             
3421 |             dll = self.get_string_at_rva(import_desc.Name)
3422 |             if not is_valid_dos_filename(dll):
3423 |                 dll = '*invalid*'
3424 |                 
3425 |             if dll:
3426 |                 import_descs.append(
3427 |                     ImportDescData(
3428 |                         struct = import_desc,
3429 |                         imports = import_data,
3430 |                         dll = dll))
3431 |         
3432 |         suspicious_imports = set([ 'LoadLibrary', 'GetProcAddress' ])
3433 |         suspicious_imports_count = 0
3434 |         total_symbols = 0
3435 |         for imp_dll in import_descs:
3436 |             for symbol in imp_dll.imports:
3437 |                 for suspicious_symbol in suspicious_imports:
3438 |                     if symbol and symbol.name and symbol.name.startswith( suspicious_symbol ):
3439 |                         suspicious_imports_count += 1
3440 |                         break
3441 |                 total_symbols += 1
3442 |         if suspicious_imports_count == len(suspicious_imports) and total_symbols < 20:
3443 |             self.__warnings.append(
3444 |                 'Imported symbols contain entries typical of packed executables.' )
3445 |             
3446 |             
3447 |         
3448 |         return import_descs
3449 | 
3450 |     
3451 |     
3452 |     def parse_imports(self, original_first_thunk, first_thunk, forwarder_chain):
3453 |         """Parse the imported symbols.
3454 |         
3455 |         It will fill a list, which will be available as the dictionary
3456 |         attribute "imports". Its keys will be the DLL names and the values
3457 |         all the symbols imported from that object.
3458 |         """
3459 |         
3460 |         imported_symbols = []
3461 |         
3462 |         # The following has been commented as a PE does not
3463 |         # need to have the import data necessarily witin
3464 |         # a section, it can keep it in gaps between sections
3465 |         # or overlapping other data.
3466 |         #
3467 |         #imports_section = self.get_section_by_rva(first_thunk)
3468 |         #if not imports_section:
3469 |         #    raise PEFormatError, 'Invalid/corrupt imports.'
3470 |         
3471 |         # Import Lookup Table. Contains ordinals or pointers to strings.
3472 |         ilt = self.get_import_table(original_first_thunk)
3473 |         # Import Address Table. May have identical content to ILT if
3474 |         # PE file is not bounded, Will contain the address of the
3475 |         # imported symbols once the binary is loaded or if it is already
3476 |         # bound.
3477 |         iat = self.get_import_table(first_thunk)
3478 | 
3479 |         # OC Patch:
3480 |         # Would crash if IAT or ILT had None type
3481 |         if (not iat or len(iat)==0) and (not ilt or len(ilt)==0):
3482 |             raise PEFormatError(
3483 |                 'Invalid Import Table information. ' +
3484 |                 'Both ILT and IAT appear to be broken.')
3485 |         
3486 |         table = None
3487 |         if ilt:
3488 |             table = ilt
3489 |         elif iat:
3490 |             table = iat
3491 |         else:
3492 |             return None
3493 |         
3494 |         imp_offset = 4
3495 |         address_mask = 0x7fffffff
3496 |         if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE:
3497 |             ordinal_flag = IMAGE_ORDINAL_FLAG
3498 |         elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS:
3499 |             ordinal_flag = IMAGE_ORDINAL_FLAG64
3500 |             imp_offset = 8
3501 |             address_mask = 0x7fffffffffffffffL
3502 |         
3503 |         for idx in xrange(len(table)):
3504 |             
3505 |             imp_ord = None
3506 |             imp_hint = None
3507 |             imp_name = None
3508 |             name_offset = None
3509 |             hint_name_table_rva = None
3510 |             
3511 |             if table[idx].AddressOfData:
3512 |                 
3513 |                 # If imported by ordinal, we will append the ordinal number
3514 |                 #
3515 |                 if table[idx].AddressOfData & ordinal_flag:
3516 |                     import_by_ordinal = True
3517 |                     imp_ord = table[idx].AddressOfData & 0xffff
3518 |                     imp_name = None
3519 |                     name_offset = None
3520 |                 else:
3521 |                     import_by_ordinal = False
3522 |                     try:
3523 |                         hint_name_table_rva = table[idx].AddressOfData & address_mask
3524 |                         data = self.get_data(hint_name_table_rva, 2)
3525 |                         # Get the Hint
3526 |                         imp_hint = self.get_word_from_data(data, 0)
3527 |                         imp_name = self.get_string_at_rva(table[idx].AddressOfData+2)
3528 |                         if not is_valid_function_name(imp_name):
3529 |                             imp_name = '*invalid*'
3530 |                         
3531 |                         name_offset = self.get_offset_from_rva(table[idx].AddressOfData+2)
3532 |                     except PEFormatError, e:
3533 |                         pass
3534 | 
3535 |                 # by nriva: we want the ThunkRVA and ThunkOffset
3536 |                 thunk_offset = table[idx].get_file_offset()
3537 |                 thunk_rva = self.get_rva_from_offset(thunk_offset)
3538 |                 
3539 |             imp_address = first_thunk + self.OPTIONAL_HEADER.ImageBase + idx * imp_offset
3540 |             
3541 |             struct_iat = None
3542 |             try:
3543 |                 
3544 |                 if iat and ilt and ilt[idx].AddressOfData != iat[idx].AddressOfData:
3545 |                     imp_bound = iat[idx].AddressOfData
3546 |                     struct_iat = iat[idx]
3547 |                 else:
3548 |                     imp_bound = None
3549 |             except IndexError:
3550 |                 imp_bound = None
3551 |             
3552 |             # The file with hashes:
3553 |             #
3554 |             # MD5: bfe97192e8107d52dd7b4010d12b2924
3555 |             # SHA256: 3d22f8b001423cb460811ab4f4789f277b35838d45c62ec0454c877e7c82c7f5
3556 |             #
3557 |             # has an invalid table built in a way that it's parseable but contains invalid
3558 |             # entries that lead pefile to take extremely long amounts of time to
3559 |             # parse. It also leads to extreme memory consumption.
3560 |             # To prevent similar cases, if invalid entries are found in the middle of a
3561 |             # table the parsing will be aborted
3562 |             #
3563 |             if imp_ord == None and imp_name == None:
3564 |                 raise PEFormatError( 'Invalid entries in the Import Table. Aborting parsing.' )
3565 |             
3566 |             if imp_name != '' and (imp_ord or imp_name):
3567 |                 imported_symbols.append(
3568 |                     ImportData(
3569 |                     pe = self,
3570 |                         struct_table = table[idx],
3571 |                         struct_iat = struct_iat, # for bound imports if any
3572 |                         import_by_ordinal = import_by_ordinal,
3573 |                         ordinal = imp_ord,
3574 |                         ordinal_offset = table[idx].get_file_offset(),
3575 |                         hint = imp_hint,
3576 |                         name = imp_name,
3577 |                         name_offset = name_offset,
3578 |                         bound = imp_bound,
3579 |                         address = imp_address,
3580 |                         hint_name_table_rva = hint_name_table_rva,
3581 |                         thunk_offset = thunk_offset,
3582 |                         thunk_rva = thunk_rva ))
3583 |         
3584 |         return imported_symbols
3585 | 
3586 | 
3587 |     
3588 |     def get_import_table(self, rva):
3589 |         
3590 |         table = []
3591 | 
3592 |         # We need the ordinal flag for a simple heuristic
3593 |         # we're implementing within the loop
3594 |         #
3595 |         if self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE:
3596 |             ordinal_flag = IMAGE_ORDINAL_FLAG
3597 |             format = self.__IMAGE_THUNK_DATA_format__
3598 |         elif self.PE_TYPE == OPTIONAL_HEADER_MAGIC_PE_PLUS:
3599 |             ordinal_flag = IMAGE_ORDINAL_FLAG64
3600 |             format = self.__IMAGE_THUNK_DATA64_format__
3601 | 
3602 |         MAX_ADDRESS_SPREAD = 128*2**20 # 64 MB
3603 |         MAX_REPEATED_ADDRESSES = 15
3604 |         repeated_address = 0
3605 |         addresses_of_data_set_64 = set()
3606 |         addresses_of_data_set_32 = set()
3607 |         while True and rva:
3608 |             
3609 |             # if we see too many times the same entry we assume it could be
3610 |             # a table containing bogus data (with malicious intenet or otherwise)
3611 |             if repeated_address >= MAX_REPEATED_ADDRESSES:
3612 |                 return []
3613 |                 
3614 |             # if the addresses point somwhere but the difference between the highest
3615 |             # and lowest address is larger than MAX_ADDRESS_SPREAD we assume a bogus
3616 |             # table as the addresses should be contained within a module
3617 |             if (addresses_of_data_set_32 and 
3618 |                 max(addresses_of_data_set_32) - min(addresses_of_data_set_32) > MAX_ADDRESS_SPREAD ):
3619 |                 return []
3620 |             if (addresses_of_data_set_64 and 
3621 |                 max(addresses_of_data_set_64) - min(addresses_of_data_set_64) > MAX_ADDRESS_SPREAD ):
3622 |                 return []
3623 |                 
3624 |             try:
3625 |                 data = self.get_data( rva, Structure(format).sizeof() )
3626 |             except PEFormatError, e:
3627 |                 self.__warnings.append(
3628 |                     'Error parsing the import table. ' +
3629 |                     'Invalid data at RVA: 0x%x' % ( rva ) )
3630 |                 return None
3631 |             
3632 |             thunk_data = self.__unpack_data__(
3633 |                 format, data, file_offset=self.get_offset_from_rva(rva) )
3634 | 
3635 |             if thunk_data and thunk_data.AddressOfData:
3636 |                 # If the entry looks like could be an ordinal...
3637 |                 if thunk_data.AddressOfData & ordinal_flag:
3638 |                     # but its value is beyond 2^16, we will assume it's a
3639 |                     # corrupted and ignore it altogether
3640 |                     if thunk_data.AddressOfData & 0x7fffffff > 0xffff:
3641 |                         return []
3642 |                 # and if it looks like it should be an RVA
3643 |                 else:
3644 |                     # keep track of the RVAs seen and store them to study their
3645 |                     # properties. When certain non-standard features are detected
3646 |                     # the parsing will be aborted
3647 |                     if (thunk_data.AddressOfData in addresses_of_data_set_32 or 
3648 |                         thunk_data.AddressOfData in addresses_of_data_set_64):
3649 |                         repeated_address += 1
3650 |                     if thunk_data.AddressOfData >= 2**32:
3651 |                         addresses_of_data_set_64.add(thunk_data.AddressOfData)
3652 |                     else:
3653 |                         addresses_of_data_set_32.add(thunk_data.AddressOfData)
3654 | 
3655 |             if not thunk_data or thunk_data.all_zeroes():
3656 |                 break
3657 |             
3658 |             rva += thunk_data.sizeof()
3659 |             
3660 |             table.append(thunk_data)
3661 |         
3662 |         return table
3663 |     
3664 |     
3665 |     def get_memory_mapped_image(self, max_virtual_address=0x10000000, ImageBase=None):
3666 |         """Returns the data corresponding to the memory layout of the PE file.
3667 |         
3668 |         The data includes the PE header and the sections loaded at offsets
3669 |         corresponding to their relative virtual addresses. (the VirtualAddress
3670 |         section header member).
3671 |         Any offset in this data corresponds to the absolute memory address
3672 |         ImageBase+offset.
3673 |         
3674 |         The optional argument 'max_virtual_address' provides with means of limiting
3675 |         which section are processed.
3676 |         Any section with their VirtualAddress beyond this value will be skipped.
3677 |         Normally, sections with values beyond this range are just there to confuse
3678 |         tools. It's a common trick to see in packed executables.
3679 |         
3680 |         If the 'ImageBase' optional argument is supplied, the file's relocations
3681 |         will be applied to the image by calling the 'relocate_image()' method. Beware
3682 |         that the relocation information is applied permanently.
3683 |         """
3684 |         
3685 |         # Rebase if requested
3686 |         #
3687 |         if ImageBase is not None:
3688 |             
3689 |             # Keep a copy of the image's data before modifying it by rebasing it
3690 |             #
3691 |             original_data = self.__data__
3692 |             
3693 |             self.relocate_image(ImageBase)
3694 |             
3695 |         # Collect all sections in one code block
3696 |         #mapped_data = self.header
3697 |         mapped_data = ''+ self.__data__[:]
3698 |         for section in self.sections:
3699 |             
3700 |             # Miscellaneous integrity tests.
3701 |             # Some packer will set these to bogus values to
3702 |             # make tools go nuts.
3703 |             #
3704 |             if section.Misc_VirtualSize == 0 or section.SizeOfRawData == 0:
3705 |                 continue
3706 |             
3707 |             if section.SizeOfRawData > len(self.__data__):
3708 |                 continue
3709 |             
3710 |             if self.adjust_FileAlignment( section.PointerToRawData,
3711 |                 self.OPTIONAL_HEADER.FileAlignment ) > len(self.__data__):
3712 |                 
3713 |                 continue
3714 |             
3715 |             VirtualAddress_adj = self.adjust_SectionAlignment( section.VirtualAddress, 
3716 |                 self.OPTIONAL_HEADER.SectionAlignment, self.OPTIONAL_HEADER.FileAlignment )
3717 |             
3718 |             if VirtualAddress_adj >= max_virtual_address:
3719 |                 continue
3720 |             
3721 |             padding_length = VirtualAddress_adj - len(mapped_data)
3722 |             
3723 |             if padding_length>0:
3724 |                 mapped_data += '\0'*padding_length
3725 |             elif padding_length<0:
3726 |                 mapped_data = mapped_data[:padding_length]
3727 |             
3728 |             mapped_data += section.get_data()
3729 |             
3730 |         # If the image was rebased, restore it to its original form
3731 |         #
3732 |         if ImageBase is not None:
3733 |             self.__data__ = original_data
3734 |             
3735 |         return mapped_data
3736 | 
3737 | 
3738 |     def get_resources_strings(self):
3739 |         """Returns a list of all the strings found withing the resources (if any).
3740 |         
3741 |         This method will scan all entries in the resources directory of the PE, if
3742 |         there is one, and will return a list() with the strings.
3743 |         
3744 |         An empty list will be returned otherwise.
3745 |         """
3746 |         
3747 |         resources_strings = list()
3748 |         
3749 |         if hasattr(self, 'DIRECTORY_ENTRY_RESOURCE'):
3750 |             
3751 |             for resource_type in self.DIRECTORY_ENTRY_RESOURCE.entries:
3752 |                 if hasattr(resource_type, 'directory'):
3753 |                     for resource_id in resource_type.directory.entries:
3754 |                         if hasattr(resource_id, 'directory'):
3755 |                             if hasattr(resource_id.directory, 'strings') and resource_id.directory.strings:
3756 |                                 for res_string in resource_id.directory.strings.values():
3757 |                                     resources_strings.append( res_string )
3758 |                                     
3759 |         return resources_strings
3760 | 
3761 | 
3762 |     def get_data(self, rva=0, length=None):
3763 |         """Get data regardless of the section where it lies on.
3764 |         
3765 |         Given a RVA and the size of the chunk to retrieve, this method
3766 |         will find the section where the data lies and return the data.
3767 |         """
3768 |         
3769 |         s = self.get_section_by_rva(rva)
3770 |         
3771 |         if length:
3772 |             end = rva + length
3773 |         else:
3774 |             end = None
3775 | 
3776 |         if not s:
3777 |             if rva < len(self.header):
3778 |                 return self.header[rva:end]
3779 |             
3780 |             # Before we give up we check whether the file might
3781 |             # contain the data anyway. There are cases of PE files
3782 |             # without sections that rely on windows loading the first
3783 |             # 8291 bytes into memory and assume the data will be
3784 |             # there
3785 |             # A functional file with these characteristics is:
3786 |             # MD5: 0008892cdfbc3bda5ce047c565e52295
3787 |             # SHA-1: c7116b9ff950f86af256defb95b5d4859d4752a9
3788 |             #
3789 |             if rva < len(self.__data__):
3790 |                 return self.__data__[rva:end]
3791 |             
3792 |             raise PEFormatError, 'data at RVA can\'t be fetched. Corrupt header?'
3793 |         
3794 |         return s.get_data(rva, length)
3795 | 
3796 |     
3797 |     def get_rva_from_offset(self, offset):
3798 |         """Get the RVA corresponding to this file offset. """
3799 |         
3800 |         s = self.get_section_by_offset(offset)
3801 |         if not s:
3802 |             if self.sections:
3803 |                 lowest_rva = min( [ self.adjust_SectionAlignment( s.VirtualAddress, 
3804 |                     self.OPTIONAL_HEADER.SectionAlignment, self.OPTIONAL_HEADER.FileAlignment ) for s in self.sections] )
3805 |                 if offset < lowest_rva:
3806 |                     # We will assume that the offset lies within the headers, or
3807 |                     # at least points before where the earliest section starts
3808 |                     # and we will simply return the offset as the RVA
3809 |                     #
3810 |                     # The case illustrating this behavior can be found at:
3811 |                     # http://corkami.blogspot.com/2010/01/hey-hey-hey-whats-in-your-head.html
3812 |                     # where the import table is not contained by any section
3813 |                     # hence the RVA needs to be resolved to a raw offset
3814 |                     return offset
3815 |             else:
3816 |                 return offset
3817 |             #raise PEFormatError("specified offset (0x%x) doesn't belong to any section." % offset)
3818 |         return s.get_rva_from_offset(offset)
3819 |     
3820 |     def get_offset_from_rva(self, rva):
3821 |         """Get the file offset corresponding to this RVA.
3822 |         
3823 |         Given a RVA , this method will find the section where the
3824 |         data lies and return the offset within the file.
3825 |         """
3826 |         
3827 |         s = self.get_section_by_rva(rva)
3828 |         if not s:
3829 | 
3830 |             # If not found within a section assume it might
3831 |             # point to overlay data or otherwise data present
3832 |             # but not contained in any section. In those
3833 |             # cases the RVA should equal the offset
3834 |             if rva<len(self.__data__):
3835 |                 return rva
3836 |                 
3837 |             raise PEFormatError, 'data at RVA can\'t be fetched. Corrupt header?'
3838 |         
3839 |         return s.get_offset_from_rva(rva)
3840 |             
3841 |     
3842 |     def get_string_at_rva(self, rva):
3843 |         """Get an ASCII string located at the given address."""
3844 | 
3845 |         if rva is None:
3846 |             return None
3847 | 
3848 |         s = self.get_section_by_rva(rva)
3849 |         if not s:
3850 |             return self.get_string_from_data(0, self.__data__[rva:rva+MAX_STRING_LENGTH])
3851 |         
3852 |         return self.get_string_from_data( 0, s.get_data(rva, length=MAX_STRING_LENGTH) )
3853 |         
3854 |     
3855 |     def get_string_from_data(self, offset, data):
3856 |         """Get an ASCII string from within the data."""
3857 |         
3858 |         # OC Patch
3859 |         b = None
3860 |         
3861 |         try:
3862 |             b = data[offset]
3863 |         except IndexError:
3864 |             return ''
3865 |         
3866 |         s = ''
3867 |         while ord(b):
3868 |             s += b
3869 |             offset += 1
3870 |             try:
3871 |                 b = data[offset]
3872 |             except IndexError:
3873 |                 break
3874 |         
3875 |         return s
3876 |                 
3877 |     
3878 |     def get_string_u_at_rva(self, rva, max_length = 2**16):
3879 |         """Get an Unicode string located at the given address."""
3880 |         
3881 |         try:
3882 |             # If the RVA is invalid all would blow up. Some EXEs seem to be
3883 |             # specially nasty and have an invalid RVA.
3884 |             data = self.get_data(rva, 2)
3885 |         except PEFormatError, e:
3886 |             return None
3887 |         
3888 |         #length = struct.unpack('<H', data)[0]
3889 |         
3890 |         s = u''
3891 |         for idx in xrange(max_length):
3892 |             try:
3893 |                 uchr = struct.unpack('<H', self.get_data(rva+2*idx, 2))[0]
3894 |             except struct.error:
3895 |                 break
3896 |             
3897 |             if unichr(uchr) == u'\0':
3898 |                 break
3899 |             s += unichr(uchr)
3900 |         
3901 |         return s
3902 | 
3903 |     
3904 |     def get_section_by_offset(self, offset):
3905 |         """Get the section containing the given file offset."""
3906 |         
3907 |         sections = [s for s in self.sections if s.contains_offset(offset)]
3908 |         
3909 |         if sections:
3910 |             return sections[0]
3911 |         
3912 |         return None
3913 | 
3914 |     
3915 |     def get_section_by_rva(self, rva):
3916 |         """Get the section containing the given address."""
3917 |         
3918 |         sections = [s for s in self.sections if s.contains_rva(rva)]
3919 |         
3920 |         if sections:
3921 |             return sections[0]
3922 |         
3923 |         return None
3924 |     
3925 |     def __str__(self):
3926 |         return self.dump_info()
3927 |         
3928 |     
3929 |     def print_info(self):
3930 |         """Print all the PE header information in a human readable from."""
3931 |         print self.dump_info()
3932 |         
3933 |     
3934 |     def dump_info(self, dump=None):
3935 |         """Dump all the PE header information into human readable string."""
3936 |         
3937 |         
3938 |         if dump is None:
3939 |             dump = Dump()
3940 |         
3941 |         warnings = self.get_warnings()
3942 |         if warnings:
3943 |             dump.add_header('Parsing Warnings')
3944 |             for warning in warnings:
3945 |                 dump.add_line(warning)
3946 |                 dump.add_newline()
3947 |         
3948 |         
3949 |         dump.add_header('DOS_HEADER')
3950 |         dump.add_lines(self.DOS_HEADER.dump())
3951 |         dump.add_newline()
3952 |         
3953 |         dump.add_header('NT_HEADERS')
3954 |         dump.add_lines(self.NT_HEADERS.dump())
3955 |         dump.add_newline()
3956 |         
3957 |         dump.add_header('FILE_HEADER')
3958 |         dump.add_lines(self.FILE_HEADER.dump())
3959 |         
3960 |         image_flags = retrieve_flags(IMAGE_CHARACTERISTICS, 'IMAGE_FILE_')
3961 |         
3962 |         dump.add('Flags: ')
3963 |         flags = []
3964 |         for flag in image_flags:
3965 |             if getattr(self.FILE_HEADER, flag[0]):
3966 |                 flags.append(flag[0])
3967 |         dump.add_line(', '.join(flags))
3968 |         dump.add_newline()
3969 |         
3970 |         if hasattr(self, 'OPTIONAL_HEADER') and self.OPTIONAL_HEADER is not None:
3971 |             dump.add_header('OPTIONAL_HEADER')
3972 |             dump.add_lines(self.OPTIONAL_HEADER.dump())
3973 |         
3974 |         dll_characteristics_flags = retrieve_flags(DLL_CHARACTERISTICS, 'IMAGE_DLL_CHARACTERISTICS_')
3975 |         
3976 |         dump.add('DllCharacteristics: ')
3977 |         flags = []
3978 |         for flag in dll_characteristics_flags:
3979 |             if getattr(self.OPTIONAL_HEADER, flag[0]):
3980 |                 flags.append(flag[0])
3981 |         dump.add_line(', '.join(flags))
3982 |         dump.add_newline()
3983 |         
3984 |         
3985 |         dump.add_header('PE Sections')
3986 |         
3987 |         section_flags = retrieve_flags(SECTION_CHARACTERISTICS, 'IMAGE_SCN_')
3988 |         
3989 |         for section in self.sections:
3990 |             dump.add_lines(section.dump())
3991 |             dump.add('Flags: ')
3992 |             flags = []
3993 |             for flag in section_flags:
3994 |                 if getattr(section, flag[0]):
3995 |                     flags.append(flag[0])
3996 |             dump.add_line(', '.join(flags))
3997 |             dump.add_line('Entropy: %f (Min=0.0, Max=8.0)' % section.get_entropy() )
3998 |             if md5 is not None:
3999 |                 dump.add_line('MD5     hash: %s' % section.get_hash_md5() )
4000 |             if sha1 is not None:
4001 |                 dump.add_line('SHA-1   hash: %s' % section.get_hash_sha1() )
4002 |             if sha256 is not None:
4003 |                 dump.add_line('SHA-256 hash: %s' % section.get_hash_sha256() )
4004 |             if sha512 is not None:
4005 |                 dump.add_line('SHA-512 hash: %s' % section.get_hash_sha512() )
4006 |             dump.add_newline()
4007 |             
4008 |         
4009 |         
4010 |         if (hasattr(self, 'OPTIONAL_HEADER') and
4011 |             hasattr(self.OPTIONAL_HEADER, 'DATA_DIRECTORY') ):
4012 |             
4013 |             dump.add_header('Directories')
4014 |             for idx in xrange(len(self.OPTIONAL_HEADER.DATA_DIRECTORY)):
4015 |                 directory = self.OPTIONAL_HEADER.DATA_DIRECTORY[idx]
4016 |                 dump.add_lines(directory.dump())
4017 |             dump.add_newline()
4018 |             
4019 |         
4020 |         def convert_char(char):
4021 |             if char in string.ascii_letters or char in string.digits or char in string.punctuation or char in string.whitespace:
4022 |                 return char
4023 |             else:
4024 |                 return r'\x%02x' % ord(char)
4025 |                 
4026 |         def convert_to_printable(s):
4027 |             return ''.join([convert_char(c) for c in s])
4028 |         
4029 |         if hasattr(self, 'VS_VERSIONINFO'):
4030 |             dump.add_header('Version Information')
4031 |             dump.add_lines(self.VS_VERSIONINFO.dump())
4032 |             dump.add_newline()
4033 |             
4034 |             if hasattr(self, 'VS_FIXEDFILEINFO'):
4035 |                 dump.add_lines(self.VS_FIXEDFILEINFO.dump())
4036 |                 dump.add_newline()
4037 |             
4038 |             if hasattr(self, 'FileInfo'):
4039 |                 for entry in self.FileInfo:
4040 |                     dump.add_lines(entry.dump())
4041 |                     dump.add_newline()
4042 |                     
4043 |                     if hasattr(entry, 'StringTable'):
4044 |                         for st_entry in entry.StringTable:
4045 |                             [dump.add_line('  '+line) for line in st_entry.dump()]
4046 |                             dump.add_line('  LangID: '+st_entry.LangID)
4047 |                             dump.add_newline()
4048 |                             for str_entry in st_entry.entries.items():
4049 |                                 dump.add_line( '    ' +
4050 |                                     convert_to_printable(str_entry[0]) + ': ' +
4051 |                                     convert_to_printable(str_entry[1]) )
4052 |                         dump.add_newline()
4053 |                     
4054 |                     elif hasattr(entry, 'Var'):
4055 |                         for var_entry in entry.Var:
4056 |                             if hasattr(var_entry, 'entry'):
4057 |                                 [dump.add_line('  '+line) for line in var_entry.dump()]
4058 |                                 dump.add_line(
4059 |                                     '    ' + 
4060 |                                     convert_to_printable(var_entry.entry.keys()[0]) +
4061 |                                     ': ' + var_entry.entry.values()[0])
4062 |                         
4063 |                         dump.add_newline()
4064 | 
4065 |             
4066 |         
4067 |         if hasattr(self, 'DIRECTORY_ENTRY_EXPORT'):
4068 |             dump.add_header('Exported symbols')
4069 |             dump.add_lines(self.DIRECTORY_ENTRY_EXPORT.struct.dump())
4070 |             dump.add_newline()
4071 |             dump.add_line('%-10s   %-10s  %s' % ('Ordinal', 'RVA', 'Name'))
4072 |             for export in self.DIRECTORY_ENTRY_EXPORT.symbols:
4073 |                 if export.address is not None:
4074 |                     dump.add('%-10d 0x%08Xh    %s' % (
4075 |                         export.ordinal, export.address, export.name))
4076 |                     if export.forwarder:
4077 |                         dump.add_line(' forwarder: %s' % export.forwarder)
4078 |                     else:
4079 |                         dump.add_newline()
4080 |             
4081 |             dump.add_newline()
4082 |         
4083 |         if hasattr(self, 'DIRECTORY_ENTRY_IMPORT'):
4084 |             dump.add_header('Imported symbols')
4085 |             for module in self.DIRECTORY_ENTRY_IMPORT:
4086 |                 dump.add_lines(module.struct.dump())
4087 |                 dump.add_newline()
4088 |                 for symbol in module.imports:
4089 |                     
4090 |                     if symbol.import_by_ordinal is True:
4091 |                         dump.add('%s Ordinal[%s] (Imported by Ordinal)' % (
4092 |                             module.dll, str(symbol.ordinal)))
4093 |                     else:
4094 |                         dump.add('%s.%s Hint[%s]' % (
4095 |                             module.dll, symbol.name, str(symbol.hint)))
4096 |                     
4097 |                     if symbol.bound:
4098 |                         dump.add_line(' Bound: 0x%08X' % (symbol.bound))
4099 |                     else:
4100 |                         dump.add_newline()
4101 |                 dump.add_newline()
4102 |         
4103 |         
4104 |         if hasattr(self, 'DIRECTORY_ENTRY_BOUND_IMPORT'):
4105 |             dump.add_header('Bound imports')
4106 |             for bound_imp_desc in self.DIRECTORY_ENTRY_BOUND_IMPORT:
4107 |                 
4108 |                 dump.add_lines(bound_imp_desc.struct.dump())
4109 |                 dump.add_line('DLL: %s' % bound_imp_desc.name)
4110 |                 dump.add_newline()
4111 |                 
4112 |                 for bound_imp_ref in bound_imp_desc.entries:
4113 |                     dump.add_lines(bound_imp_ref.struct.dump(), 4)
4114 |                     dump.add_line('DLL: %s' % bound_imp_ref.name, 4)
4115 |                     dump.add_newline()
4116 | 
4117 |         
4118 |         if hasattr(self, 'DIRECTORY_ENTRY_DELAY_IMPORT'):
4119 |             dump.add_header('Delay Imported symbols')
4120 |             for module in self.DIRECTORY_ENTRY_DELAY_IMPORT:
4121 |                 
4122 |                 dump.add_lines(module.struct.dump())
4123 |                 dump.add_newline()
4124 |                 
4125 |                 for symbol in module.imports:
4126 |                     if symbol.import_by_ordinal is True:
4127 |                         dump.add('%s Ordinal[%s] (Imported by Ordinal)' % (
4128 |                             module.dll, str(symbol.ordinal)))
4129 |                     else:
4130 |                         dump.add('%s.%s Hint[%s]' % (
4131 |                             module.dll, symbol.name, str(symbol.hint)))
4132 |                     
4133 |                     if symbol.bound:
4134 |                         dump.add_line(' Bound: 0x%08X' % (symbol.bound))
4135 |                     else:
4136 |                         dump.add_newline()
4137 |                 dump.add_newline()
4138 |         
4139 |         
4140 |         if hasattr(self, 'DIRECTORY_ENTRY_RESOURCE'):
4141 |             dump.add_header('Resource directory')
4142 |             
4143 |             dump.add_lines(self.DIRECTORY_ENTRY_RESOURCE.struct.dump())
4144 |             
4145 |             for resource_type in self.DIRECTORY_ENTRY_RESOURCE.entries:
4146 |                 
4147 |                 if resource_type.name is not None:
4148 |                     dump.add_line('Name: [%s]' % resource_type.name, 2)
4149 |                 else:
4150 |                     dump.add_line('Id: [0x%X] (%s)' % (
4151 |                         resource_type.struct.Id, RESOURCE_TYPE.get(
4152 |                             resource_type.struct.Id, '-')),
4153 |                         2)
4154 |                 
4155 |                 dump.add_lines(resource_type.struct.dump(), 2)
4156 |                 
4157 |                 if hasattr(resource_type, 'directory'):
4158 |                     
4159 |                     dump.add_lines(resource_type.directory.struct.dump(), 4)
4160 |                     
4161 |                     for resource_id in resource_type.directory.entries:
4162 |                         
4163 |                         if resource_id.name is not None:
4164 |                             dump.add_line('Name: [%s]' % resource_id.name, 6)
4165 |                         else:
4166 |                             dump.add_line('Id: [0x%X]' % resource_id.struct.Id, 6)
4167 |                         
4168 |                         dump.add_lines(resource_id.struct.dump(), 6)
4169 |                         
4170 |                         if hasattr(resource_id, 'directory'):
4171 |                             dump.add_lines(resource_id.directory.struct.dump(), 8)
4172 |                             
4173 |                             for resource_lang in resource_id.directory.entries:
4174 |                                 if hasattr(resource_lang, 'data'):
4175 |                                     dump.add_line('\\--- LANG [%d,%d][%s,%s]' % (
4176 |                                         resource_lang.data.lang,
4177 |                                         resource_lang.data.sublang,
4178 |                                         LANG.get(resource_lang.data.lang, '*unknown*'), 
4179 |                                         get_sublang_name_for_lang( resource_lang.data.lang, resource_lang.data.sublang ) ), 8)
4180 |                                     dump.add_lines(resource_lang.struct.dump(), 10)
4181 |                                     dump.add_lines(resource_lang.data.struct.dump(), 12)
4182 |                             if hasattr(resource_id.directory, 'strings') and resource_id.directory.strings:
4183 |                                 dump.add_line( '[STRINGS]' , 10 )
4184 |                                 for idx, res_string in resource_id.directory.strings.items():
4185 |                                     dump.add_line( '%6d: %s' % (idx, convert_to_printable(res_string) ), 12 )
4186 |                                         
4187 |                 dump.add_newline()
4188 |             
4189 |             dump.add_newline()
4190 |         
4191 |         
4192 |         if ( hasattr(self, 'DIRECTORY_ENTRY_TLS') and
4193 |              self.DIRECTORY_ENTRY_TLS and
4194 |              self.DIRECTORY_ENTRY_TLS.struct ):
4195 |             
4196 |             dump.add_header('TLS')
4197 |             dump.add_lines(self.DIRECTORY_ENTRY_TLS.struct.dump())
4198 |             dump.add_newline()
4199 |         
4200 |         
4201 |         if ( hasattr(self, 'DIRECTORY_ENTRY_LOAD_CONFIG') and
4202 |              self.DIRECTORY_ENTRY_LOAD_CONFIG and
4203 |              self.DIRECTORY_ENTRY_LOAD_CONFIG.struct ):
4204 |              
4205 |             dump.add_header('LOAD_CONFIG')
4206 |             dump.add_lines(self.DIRECTORY_ENTRY_LOAD_CONFIG.struct.dump())
4207 |             dump.add_newline()
4208 |         
4209 |         
4210 |         if hasattr(self, 'DIRECTORY_ENTRY_DEBUG'):
4211 |             dump.add_header('Debug information')
4212 |             for dbg in self.DIRECTORY_ENTRY_DEBUG:
4213 |                 dump.add_lines(dbg.struct.dump())
4214 |                 try:
4215 |                     dump.add_line('Type: '+DEBUG_TYPE[dbg.struct.Type])
4216 |                 except KeyError:
4217 |                     dump.add_line('Type: 0x%x(Unknown)' % dbg.struct.Type)
4218 |                 dump.add_newline()
4219 |         
4220 |         
4221 |         if hasattr(self, 'DIRECTORY_ENTRY_BASERELOC'):
4222 |             dump.add_header('Base relocations')
4223 |             for base_reloc in self.DIRECTORY_ENTRY_BASERELOC:
4224 |                 dump.add_lines(base_reloc.struct.dump())
4225 |                 for reloc in base_reloc.entries:
4226 |                     try:
4227 |                         dump.add_line('%08Xh %s' % (
4228 |                             reloc.rva, RELOCATION_TYPE[reloc.type][16:]), 4)
4229 |                     except KeyError:
4230 |                         dump.add_line('0x%08X 0x%x(Unknown)' % (
4231 |                             reloc.rva, reloc.type), 4)
4232 |                 dump.add_newline()
4233 | 
4234 |         
4235 |         return dump.get_text()
4236 |     
4237 |     # OC Patch
4238 |     def get_physical_by_rva(self, rva):
4239 |         """Gets the physical address in the PE file from an RVA value."""
4240 |         try:
4241 |             return self.get_offset_from_rva(rva)
4242 |         except Exception:
4243 |             return None
4244 | 
4245 |     
4246 |     ##
4247 |     # Double-Word get/set
4248 |     ##
4249 |     
4250 |     def get_data_from_dword(self, dword):
4251 |         """Return a four byte string representing the double word value. (little endian)."""
4252 |         return struct.pack('<L', dword & 0xffffffff)
4253 | 
4254 |     
4255 |     def get_dword_from_data(self, data, offset):
4256 |         """Convert four bytes of data to a double word (little endian)
4257 |         
4258 |         'offset' is assumed to index into a dword array. So setting it to
4259 |         N will return a dword out of the data starting at offset N*4.
4260 |         
4261 |         Returns None if the data can't be turned into a double word.
4262 |         """
4263 |         
4264 |         if (offset+1)*4 > len(data):
4265 |             return None
4266 |         
4267 |         return struct.unpack('<I', data[offset*4:(offset+1)*4])[0]
4268 |         
4269 |     
4270 |     def get_dword_at_rva(self, rva):
4271 |         """Return the double word value at the given RVA.
4272 |         
4273 |         Returns None if the value can't be read, i.e. the RVA can't be mapped
4274 |         to a file offset.
4275 |         """
4276 |         
4277 |         try:
4278 |             return self.get_dword_from_data(self.get_data(rva)[:4], 0)
4279 |         except PEFormatError:
4280 |             return None
4281 | 
4282 |     
4283 |     def get_dword_from_offset(self, offset):
4284 |         """Return the double word value at the given file offset. (little endian)"""
4285 |         
4286 |         if offset+4 > len(self.__data__):
4287 |             return None
4288 |         
4289 |         return self.get_dword_from_data(self.__data__[offset:offset+4], 0)
4290 | 
4291 |     
4292 |     def set_dword_at_rva(self, rva, dword):
4293 |         """Set the double word value at the file offset corresponding to the given RVA."""
4294 |         return self.set_bytes_at_rva(rva, self.get_data_from_dword(dword))
4295 | 
4296 |     
4297 |     def set_dword_at_offset(self, offset, dword):
4298 |         """Set the double word value at the given file offset."""
4299 |         return self.set_bytes_at_offset(offset, self.get_data_from_dword(dword))
4300 | 
4301 | 
4302 |     
4303 |     ##
4304 |     # Word get/set
4305 |     ##
4306 |     
4307 |     def get_data_from_word(self, word):
4308 |         """Return a two byte string representing the word value. (little endian)."""
4309 |         return struct.pack('<H', word)
4310 | 
4311 |     
4312 |     def get_word_from_data(self, data, offset):
4313 |         """Convert two bytes of data to a word (little endian)
4314 |         
4315 |         'offset' is assumed to index into a word array. So setting it to
4316 |         N will return a dword out of the data starting at offset N*2.
4317 |         
4318 |         Returns None if the data can't be turned into a word.
4319 |         """
4320 |         
4321 |         if (offset+1)*2 > len(data):
4322 |             return None
4323 |         
4324 |         return struct.unpack('<H', data[offset*2:(offset+1)*2])[0]
4325 | 
4326 |     
4327 |     def get_word_at_rva(self, rva):
4328 |         """Return the word value at the given RVA.
4329 |         
4330 |         Returns None if the value can't be read, i.e. the RVA can't be mapped
4331 |         to a file offset.
4332 |         """
4333 |         
4334 |         try:
4335 |             return self.get_word_from_data(self.get_data(rva)[:2], 0)
4336 |         except PEFormatError:
4337 |             return None
4338 | 
4339 |     
4340 |     def get_word_from_offset(self, offset):
4341 |         """Return the word value at the given file offset. (little endian)"""
4342 |         
4343 |         if offset+2 > len(self.__data__):
4344 |             return None
4345 |         
4346 |         return self.get_word_from_data(self.__data__[offset:offset+2], 0)
4347 | 
4348 |     
4349 |     def set_word_at_rva(self, rva, word):
4350 |         """Set the word value at the file offset corresponding to the given RVA."""
4351 |         return self.set_bytes_at_rva(rva, self.get_data_from_word(word))
4352 | 
4353 |     
4354 |     def set_word_at_offset(self, offset, word):
4355 |         """Set the word value at the given file offset."""
4356 |         return self.set_bytes_at_offset(offset, self.get_data_from_word(word))
4357 | 
4358 |     
4359 |     ##
4360 |     # Quad-Word get/set
4361 |     ##
4362 |     
4363 |     def get_data_from_qword(self, word):
4364 |         """Return a eight byte string representing the quad-word value. (little endian)."""
4365 |         return struct.pack('<Q', word)
4366 | 
4367 |     
4368 |     def get_qword_from_data(self, data, offset):
4369 |         """Convert eight bytes of data to a word (little endian)
4370 |         
4371 |         'offset' is assumed to index into a word array. So setting it to
4372 |         N will return a dword out of the data starting at offset N*8.
4373 |         
4374 |         Returns None if the data can't be turned into a quad word.
4375 |         """
4376 |         
4377 |         if (offset+1)*8 > len(data):
4378 |             return None
4379 |         
4380 |         return struct.unpack('<Q', data[offset*8:(offset+1)*8])[0]
4381 | 
4382 |     
4383 |     def get_qword_at_rva(self, rva):
4384 |         """Return the quad-word value at the given RVA.
4385 |         
4386 |         Returns None if the value can't be read, i.e. the RVA can't be mapped
4387 |         to a file offset.
4388 |         """
4389 |         
4390 |         try:
4391 |             return self.get_qword_from_data(self.get_data(rva)[:8], 0)
4392 |         except PEFormatError:
4393 |             return None
4394 | 
4395 |     
4396 |     def get_qword_from_offset(self, offset):
4397 |         """Return the quad-word value at the given file offset. (little endian)"""
4398 |         
4399 |         if offset+8 > len(self.__data__):
4400 |             return None
4401 |         
4402 |         return self.get_qword_from_data(self.__data__[offset:offset+8], 0)
4403 | 
4404 |     
4405 |     def set_qword_at_rva(self, rva, qword):
4406 |         """Set the quad-word value at the file offset corresponding to the given RVA."""
4407 |         return self.set_bytes_at_rva(rva, self.get_data_from_qword(qword))
4408 | 
4409 |     
4410 |     def set_qword_at_offset(self, offset, qword):
4411 |         """Set the quad-word value at the given file offset."""
4412 |         return self.set_bytes_at_offset(offset, self.get_data_from_qword(qword))
4413 | 
4414 | 
4415 |     
4416 |     ##
4417 |     # Set bytes
4418 |     ##
4419 |     
4420 |     
4421 |     def set_bytes_at_rva(self, rva, data):
4422 |         """Overwrite, with the given string, the bytes at the file offset corresponding to the given RVA.
4423 |         
4424 |         Return True if successful, False otherwise. It can fail if the
4425 |         offset is outside the file's boundaries.
4426 |         """
4427 |         
4428 |         if not isinstance(data, str):
4429 |             raise TypeError('data should be of type: str')
4430 | 
4431 |         offset = self.get_physical_by_rva(rva)
4432 |         if not offset:
4433 |             return False
4434 |         
4435 |         return self.set_bytes_at_offset(offset, data)
4436 |         
4437 |     
4438 |     def set_bytes_at_offset(self, offset, data):
4439 |         """Overwrite the bytes at the given file offset with the given string.
4440 |         
4441 |         Return True if successful, False otherwise. It can fail if the
4442 |         offset is outside the file's boundaries.
4443 |         """
4444 |         
4445 |         if not isinstance(data, str):
4446 |             raise TypeError('data should be of type: str')
4447 |         
4448 |         if offset >= 0 and offset < len(self.__data__):
4449 |             self.__data__ = ( self.__data__[:offset] + data + self.__data__[offset+len(data):] )
4450 |         else:
4451 |             return False
4452 |         
4453 |         return True
4454 |     
4455 | 
4456 |     def merge_modified_section_data(self):
4457 |         """Update the PE image content with any individual section data that has been modified."""
4458 |         
4459 |         for section in self.sections:
4460 |             section_data_start = self.adjust_FileAlignment( section.PointerToRawData,
4461 |                 self.OPTIONAL_HEADER.FileAlignment )
4462 |             section_data_end = section_data_start+section.SizeOfRawData
4463 |             if section_data_start < len(self.__data__) and section_data_end < len(self.__data__):
4464 |                 self.__data__ = self.__data__[:section_data_start] + section.get_data() + self.__data__[section_data_end:]
4465 |         
4466 |     
4467 |     def relocate_image(self, new_ImageBase):
4468 |         """Apply the relocation information to the image using the provided new image base.
4469 |         
4470 |         This method will apply the relocation information to the image. Given the new base,
4471 |         all the relocations will be processed and both the raw data and the section's data
4472 |         will be fixed accordingly.
4473 |         The resulting image can be retrieved as well through the method:
4474 |             
4475 |             get_memory_mapped_image()
4476 |         
4477 |         In order to get something that would more closely match what could be found in memory
4478 |         once the Windows loader finished its work.
4479 |         """
4480 |         
4481 |         relocation_difference = new_ImageBase - self.OPTIONAL_HEADER.ImageBase
4482 |         
4483 |         
4484 |         for reloc in self.DIRECTORY_ENTRY_BASERELOC:
4485 |             
4486 |             virtual_address = reloc.struct.VirtualAddress
4487 |             size_of_block = reloc.struct.SizeOfBlock
4488 |             
4489 |             # We iterate with an index because if the relocation is of type
4490 |             # IMAGE_REL_BASED_HIGHADJ we need to also process the next entry
4491 |             # at once and skip it for the next iteration
4492 |             #
4493 |             entry_idx = 0
4494 |             while entry_idx<len(reloc.entries):
4495 |                 
4496 |                 entry = reloc.entries[entry_idx]
4497 |                 entry_idx += 1
4498 |                 
4499 |                 if entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_ABSOLUTE']:
4500 |                     # Nothing to do for this type of relocation
4501 |                     pass
4502 |                 
4503 |                 elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_HIGH']:
4504 |                     # Fix the high 16bits of a relocation
4505 |                     #
4506 |                     # Add high 16bits of relocation_difference to the
4507 |                     # 16bit value at RVA=entry.rva
4508 |                     
4509 |                     self.set_word_at_rva(
4510 |                         entry.rva,
4511 |                         ( self.get_word_at_rva(entry.rva) + relocation_difference>>16)&0xffff )
4512 |                 
4513 |                 elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_LOW']:
4514 |                     # Fix the low 16bits of a relocation
4515 |                     #
4516 |                     # Add low 16 bits of relocation_difference to the 16bit value
4517 |                     # at RVA=entry.rva
4518 |                     
4519 |                     self.set_word_at_rva(
4520 |                         entry.rva,
4521 |                         ( self.get_word_at_rva(entry.rva) + relocation_difference)&0xffff)
4522 |                 
4523 |                 elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_HIGHLOW']:
4524 |                     # Handle all high and low parts of a 32bit relocation
4525 |                     #
4526 |                     # Add relocation_difference to the value at RVA=entry.rva
4527 |                     
4528 |                     self.set_dword_at_rva(
4529 |                         entry.rva,
4530 |                         self.get_dword_at_rva(entry.rva)+relocation_difference)
4531 |                 
4532 |                 elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_HIGHADJ']:
4533 |                     # Fix the high 16bits of a relocation and adjust
4534 |                     #
4535 |                     # Add high 16bits of relocation_difference to the 32bit value
4536 |                     # composed from the (16bit value at RVA=entry.rva)<<16 plus
4537 |                     # the 16bit value at the next relocation entry.
4538 |                     #
4539 |                     
4540 |                     # If the next entry is beyond the array's limits,
4541 |                     # abort... the table is corrupt
4542 |                     #
4543 |                     if entry_idx == len(reloc.entries):
4544 |                         break
4545 |                     
4546 |                     next_entry = reloc.entries[entry_idx]
4547 |                     entry_idx += 1
4548 |                     self.set_word_at_rva( entry.rva,
4549 |                         ((self.get_word_at_rva(entry.rva)<<16) + next_entry.rva +
4550 |                         relocation_difference & 0xffff0000) >> 16 )
4551 |                 
4552 |                 elif entry.type == RELOCATION_TYPE['IMAGE_REL_BASED_DIR64']:
4553 |                     # Apply the difference to the 64bit value at the offset
4554 |                     # RVA=entry.rva
4555 |                     
4556 |                     self.set_qword_at_rva(
4557 |                         entry.rva,
4558 |                         self.get_qword_at_rva(entry.rva) + relocation_difference)
4559 |                 
4560 |     
4561 |     def verify_checksum(self):
4562 |         
4563 |         return self.OPTIONAL_HEADER.CheckSum == self.generate_checksum()
4564 |         
4565 |     
4566 |     def generate_checksum(self):
4567 |         
4568 |         # This will make sure that the data representing the PE image
4569 |         # is updated with any changes that might have been made by
4570 |         # assigning values to header fields as those are not automatically
4571 |         # updated upon assignment.
4572 |         #
4573 |         self.__data__ = self.write()
4574 |         
4575 |         # Get the offset to the CheckSum field in the OptionalHeader
4576 |         #
4577 |         checksum_offset = self.OPTIONAL_HEADER.__file_offset__ + 0x40 # 64
4578 |         
4579 |         checksum = 0
4580 |         
4581 |         # Verify the data is dword-aligned. Add padding if needed
4582 |         #
4583 |         remainder = len(self.__data__) % 4
4584 |         data = self.__data__ + ( '\0' * ((4-remainder) * ( remainder != 0 )) )
4585 |         
4586 |         for i in range( len( data ) / 4 ):
4587 |             
4588 |             # Skip the checksum field
4589 |             #
4590 |             if i == checksum_offset / 4:
4591 |                 continue
4592 |             
4593 |             dword = struct.unpack('I', data[ i*4 : i*4+4 ])[0]
4594 |             checksum = (checksum & 0xffffffff) + dword + (checksum>>32)
4595 |             if checksum > 2**32:
4596 |                 checksum = (checksum & 0xffffffff) + (checksum >> 32)
4597 |         
4598 |         checksum = (checksum & 0xffff) + (checksum >> 16)
4599 |         checksum = (checksum) + (checksum >> 16)
4600 |         checksum = checksum & 0xffff
4601 |         
4602 |         # The length is the one of the original data, not the padded one
4603 |         #
4604 |         return checksum + len(self.__data__)
4605 |     
4606 |     
4607 |     def is_exe(self):
4608 |         """Check whether the file is a standard executable.
4609 |         
4610 |         This will return true only if the file has the IMAGE_FILE_EXECUTABLE_IMAGE flag set
4611 |         and the IMAGE_FILE_DLL not set and the file does not appear to be a driver either.
4612 |         """
4613 |         
4614 |         EXE_flag = IMAGE_CHARACTERISTICS['IMAGE_FILE_EXECUTABLE_IMAGE']
4615 |         
4616 |         if (not self.is_dll()) and (not self.is_driver()) and ( 
4617 |                 EXE_flag & self.FILE_HEADER.Characteristics) == EXE_flag:
4618 |             return True
4619 |         
4620 |         return False
4621 |     
4622 |     
4623 |     def is_dll(self):
4624 |         """Check whether the file is a standard DLL.
4625 |         
4626 |         This will return true only if the image has the IMAGE_FILE_DLL flag set.
4627 |         """
4628 |         
4629 |         DLL_flag = IMAGE_CHARACTERISTICS['IMAGE_FILE_DLL']
4630 |         
4631 |         if ( DLL_flag & self.FILE_HEADER.Characteristics) == DLL_flag:
4632 |             return True
4633 |             
4634 |         return False
4635 |     
4636 |     
4637 |     def is_driver(self):
4638 |         """Check whether the file is a Windows driver.
4639 |         
4640 |         This will return true only if there are reliable indicators of the image
4641 |         being a driver.
4642 |         """
4643 | 
4644 |         # Checking that the ImageBase field of the OptionalHeader is above or
4645 |         # equal to 0x80000000 (that is, whether it lies in the upper 2GB of
4646 |         # the address space, normally belonging to the kernel) is not a
4647 |         # reliable enough indicator.  For instance, PEs that play the invalid
4648 |         # ImageBase trick to get relocated could be incorrectly assumed to be
4649 |         # drivers.
4650 |          
4651 |         # This is not reliable either...
4652 |         #   
4653 |         # if any( (section.Characteristics & SECTION_CHARACTERISTICS['IMAGE_SCN_MEM_NOT_PAGED']) for section in self.sections ):
4654 |         #    return True
4655 |             
4656 |         if hasattr(self, 'DIRECTORY_ENTRY_IMPORT'):
4657 |             
4658 |             # If it imports from "ntoskrnl.exe" or other kernel components it should be a driver
4659 |             #
4660 |             if set( ('ntoskrnl.exe', 'hal.dll', 'ndis.sys', 'bootvid.dll', 'kdcom.dll' ) ).intersection( [ imp.dll.lower() for imp in self.DIRECTORY_ENTRY_IMPORT ] ):
4661 |                 return True
4662 |                 
4663 |         return False
4664 |     
4665 | 
4666 |     def get_overlay_data_start_offset(self):
4667 |         """Get the offset of data appended to the file and not contained within the area described in the headers."""
4668 | 
4669 |         highest_PointerToRawData = 0
4670 |         highest_SizeOfRawData = 0
4671 |         for section in self.sections:
4672 |             
4673 |             # If a section seems to fall outside the boundaries of the file we assume it's either
4674 |             # because of intentionally misleading values or because the file is truncated
4675 |             # In either case we skip it
4676 |             if section.PointerToRawData + section.SizeOfRawData > len(self.__data__):
4677 |                 continue
4678 |                 
4679 |             if section.PointerToRawData + section.SizeOfRawData > highest_PointerToRawData + highest_SizeOfRawData:
4680 |                 highest_PointerToRawData = section.PointerToRawData
4681 |                 highest_SizeOfRawData = section.SizeOfRawData
4682 |                 
4683 |         if len(self.__data__) > highest_PointerToRawData + highest_SizeOfRawData:
4684 |             return highest_PointerToRawData + highest_SizeOfRawData
4685 |             
4686 |         return None
4687 |     
4688 |     
4689 |     def get_overlay(self):
4690 |         """Get the data appended to the file and not contained within the area described in the headers."""
4691 | 
4692 |         overlay_data_offset = self.get_overlay_data_start_offset()
4693 |                 
4694 |         if overlay_data_offset is not None:
4695 |             return self.__data__[ overlay_data_offset : ]
4696 |             
4697 |         return None
4698 |         
4699 |     
4700 |     def trim(self):
4701 |         """Return the just data defined by the PE headers, removing any overlayed data."""
4702 |     
4703 |         overlay_data_offset = self.get_overlay_data_start_offset()
4704 |                 
4705 |         if overlay_data_offset is not None:
4706 |             return self.__data__[ : overlay_data_offset ]
4707 |             
4708 |         return self.__data__[:]
4709 |         
4710 |         
4711 |     # According to http://corkami.blogspot.com/2010/01/parce-que-la-planche-aura-brule.html
4712 |     # if PointerToRawData is less that 0x200 it's rounded to zero. Loading the test file
4713 |     # in a debugger it's easy to verify that the PointerToRawData value of 1 is rounded
4714 |     # to zero. Hence we reproduce the behabior
4715 |     #
4716 |     # According to the document: 
4717 |     # [ Microsoft Portable Executable and Common Object File Format Specification ]
4718 |     # "The alignment factor (in bytes) that is used to align the raw data of sections in
4719 |     #  the image file. The value should be a power of 2 between 512 and 64 K, inclusive.
4720 |     #  The default is 512. If the SectionAlignment is less than the architecture’s page
4721 |     #  size, then FileAlignment must match SectionAlignment."
4722 |     #
4723 |     # The following is a hardcoded constant if the Windows loader
4724 |     def adjust_FileAlignment( self, val, file_alignment ):
4725 |         global FileAlignment_Warning
4726 |         if file_alignment > FILE_ALIGNEMNT_HARDCODED_VALUE:
4727 |             # If it's not a power of two, report it:
4728 |             if not power_of_two(file_alignment) and FileAlignment_Warning is False:
4729 |                 self.__warnings.append(
4730 |                     'If FileAlignment > 0x200 it should be a power of 2. Value: %x' % (
4731 |                         file_alignment)  )
4732 |                 FileAlignment_Warning = True
4733 | 
4734 |         if file_alignment < FILE_ALIGNEMNT_HARDCODED_VALUE:
4735 |             return val
4736 |         return (val / 0x200) * 0x200
4737 | 
4738 | 
4739 |     # According to the document: 
4740 |     # [ Microsoft Portable Executable and Common Object File Format Specification ]
4741 |     # "The alignment (in bytes) of sections when they are loaded into memory. It must be
4742 |     #  greater than or equal to FileAlignment. The default is the page size for the
4743 |     #  architecture."
4744 |     #
4745 |     def adjust_SectionAlignment( self, val, section_alignment, file_alignment ):
4746 |         global SectionAlignment_Warning
4747 |         if file_alignment < FILE_ALIGNEMNT_HARDCODED_VALUE:
4748 |             if file_alignment != section_alignment and SectionAlignment_Warning is False:
4749 |                 self.__warnings.append(
4750 |                     'If FileAlignment(%x) < 0x200 it should equal SectionAlignment(%x)' % (
4751 |                         file_alignment, section_alignment)  )
4752 |                 SectionAlignment_Warning = True
4753 |             
4754 |         if section_alignment < 0x1000: # page size
4755 |             section_alignment = file_alignment
4756 |             
4757 |         # 0x200 is the minimum valid FileAlignment according to the documentation
4758 |         # although ntoskrnl.exe has an alignment of 0x80 in some Windows versions
4759 |         #
4760 |         #elif section_alignment < 0x80: 
4761 |         #    section_alignment = 0x80
4762 |             
4763 |         if section_alignment and val % section_alignment:
4764 |             return section_alignment * ( val / section_alignment )
4765 |         return val
4766 | 
4767 | 


--------------------------------------------------------------------------------
/modules/unknown_blacklist.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/unknown_blacklist.txt


--------------------------------------------------------------------------------
/modules/unknown_regexblacklist.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Xen0ph0n/YaraGenerator/48f529f0d85e7fff62405d9367901487e29aa28f/modules/unknown_regexblacklist.txt


--------------------------------------------------------------------------------
/yaraGenerator.py:
--------------------------------------------------------------------------------
  1 | #! /usr/bin/python
  2 | # YaraGenerator Will Automatically Build Yara Rules For Malware Families
  3 | # As of Yet this Only Works Well With Executables
  4 | # Copyright 2013 Chris Clark chris@xenosec.org
  5 | # Released under GPL3 Licence
  6 | 
  7 | import re, sys, os, argparse, hashlib, random, email
  8 | from datetime import datetime
  9 | 
 10 | #Ensure Import path is in syspath
 11 | pathname = os.path.abspath(os.path.dirname(sys.argv[0]))
 12 | sys.path.append(pathname + '/modules')
 13 | 
 14 | 
 15 | #Make sure requred imports are present
 16 | try:
 17 |   import pefile
 18 | except:
 19 |   print "[!] PEfile not installed or present in ./modules directory"
 20 |   sys.exit(1) 
 21 | 
 22 | def getFiles(workingdir):
 23 |   global hashList
 24 |   fileDict = {}
 25 |   hashList = [] 
 26 |   #get hashes
 27 |   for f in os.listdir(workingdir):
 28 |     if os.path.isfile(workingdir + f) and not f.startswith("."):
 29 |      fhash = md5sum(workingdir + f)
 30 |      fileDict[fhash] = workingdir + f
 31 |      hashList.append(fhash)
 32 |   if len(fileDict) == 0:
 33 |     print "[!] No Files Present in \"" + workingdir +"\"" 
 34 |     sys.exit(1) 
 35 |   else: 
 36 |     return fileDict
 37 | 
 38 | 
 39 | #Use PEfile for executables and remove import/api calls from sigs
 40 | def exeImportsFuncs(filename, allstrings):
 41 |     try:
 42 |         pe = pefile.PE(filename)
 43 |         importlist = []
 44 |         for entry in pe.DIRECTORY_ENTRY_IMPORT: 
 45 |           importlist.append(entry.dll)
 46 |           for imp in entry.imports:
 47 |             importlist.append(imp.name)
 48 |         for imp in importlist:
 49 |           if imp in allstrings: allstrings.remove(imp)
 50 |         if len(allstrings) > 0:
 51 |           return list(set(allstrings))
 52 |         else:
 53 |           print '[!] No Extractable Attributes Present in Hash: '+str(md5sum(filename)) + ' Please Remove it from the Sample Set and Try Again!'
 54 |           sys.exit(1) 
 55 |     except:  
 56 |         return allstrings
 57 | 
 58 | 
 59 | #EML File parsing, and comparision based on dictionary entries .... plus regexes looking for domains/links in text/html
 60 | def emailParse(filename):
 61 |     try:
 62 |       def emailStrings(text):
 63 |         #same as normal string extract except for " " so each word will be isolated, and nuking <>,. to excude HTML tags and punctuation
 64 |         chars = r"A-Za-z0-9/\-:_$%@'()\\\{\};\]\["
 65 |         regexp = '[%s]{%d,100}' % (chars, 6)
 66 |         pattern = re.compile(regexp)
 67 |         strlist = pattern.findall(text)
 68 |         return strlist
 69 | 
 70 |       uselesskeys = ['DKIM-Signature', 'X-SENDER-REPUTATION', 'References', 'To', 'Delivered-To', 'Received','Message-ID', 'MIME-Version','In-Reply-To', 'Date', 'Content-Type', 'X-Original-To']
 71 |       emailfile = open(filename, 'r')
 72 |       msg = email.message_from_file(emailfile)
 73 |       emaildict = dict(msg.items())
 74 |       if len(emaildict) == 0:
 75 |         print '[!] This File is not an EML File: '+str(md5sum(filename)) + ' Please Remove it from the Sample Set or Select Proper FileType!'
 76 |         sys.exit(1) 
 77 |       for uselesskey in uselesskeys:
 78 |         if uselesskey in emaildict:
 79 |           del emaildict[uselesskey]
 80 |       emaillist = []
 81 |       for part in msg.walk():
 82 |         part_ct = str(part.get_content_type())
 83 |         if "plain" in part_ct:
 84 |           bodyplain = part.get_payload(decode=True)
 85 |     #      emaildict['Body-Plaintxt'] = list(set(emailStrings(bodyplain)))
 86 |           textlinks = linkSearch(bodyplain)  
 87 |           if textlinks:
 88 |             emaildict['Body-Links'] = textlinks
 89 |         if "html" in part_ct:
 90 |           bodyhtml = part.get_payload(decode=True)
 91 |    #       emaildict['Body-HTML'] = list(set(emailStrings(bodyhtml))) 
 92 |           htmllinks = linkSearch(bodyhtml) 
 93 |           if htmllinks:                 
 94 |            emaildict['Body-Links'] = htmllinks
 95 |         if "application" in part_ct:
 96 |           if part.get_filename():       
 97 |             emaildict['attachmentName'] = part.get_filename()
 98 |       for key, value in emaildict.iteritems():
 99 |           if isinstance(value, list):
100 |             for subval in value:
101 |               emaillist.append(subval)
102 |           else:
103 |             emaillist.append(value)
104 |       return emaillist 
105 |     except Exception:
106 |         print '[!] This File is not an EML File: '+str(md5sum(filename)) + ' Please Remove it from the Sample Set or Select Proper FileType!'
107 |         sys.exit(1) 
108 | 
109 | def linkSearch(attachment):
110 |                 urls = list(set(re.compile('(?:ftp|hxxp)[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', re.I).findall(attachment)))
111 |                 return urls
112 | 
113 | #Simple String / ASCII Wide and URL string extraction 
114 | def getStrings(filename):
115 |   try:
116 |     data = open(filename,'rb').read()
117 |     chars = r"A-Za-z0-9/\-:.,_$%@'()\\\{\};\]\[<> "
118 |     regexp = '[%s]{%d,100}' % (chars, 6)
119 |     pattern = re.compile(regexp)
120 |     strlist = pattern.findall(data)
121 |     #Get Wide Strings
122 |     unicode_str = re.compile( ur'(?:[\x20-\x7E][\x00]){6,100}',re.UNICODE ) 
123 |     unicodelist = unicode_str.findall(data) 
124 |     allstrings = unicodelist + strlist
125 |     #Extract URLs if present
126 |     exeurls = linkSearch(data)
127 |     if exeurls:
128 |       for url in exeurls:
129 |         allstrings.append(url)
130 |     # use pefile to extract names of imports and function calls and remove them from string list
131 |     if len(allstrings) > 0:
132 |         return list(set(allstrings))
133 |     else:
134 |       print '[!] No Extractable Attributes Present in Hash: '+str(md5sum(filename)) + ' Please Remove it from the Sample Set and Try Again!'
135 |       sys.exit(1) 
136 |   except Exception:
137 |     print '[!] No Extractable Attributes Present in Hash: '+str(md5sum(filename)) + ' Please Remove it from the Sample Set and Try Again!'
138 |     sys.exit(1)
139 | 
140 | def md5sum(filename):
141 |   fh = open(filename, 'rb')
142 |   m = hashlib.md5()
143 |   while True:
144 |       data = fh.read(8192)
145 |       if not data:
146 |           break
147 |       m.update(data)
148 |   return m.hexdigest() 
149 | 
150 | 
151 | #find common strings and check against filetype specific blacklists
152 | def findCommonStrings(fileDict, filetype):
153 |   baseStringList = random.choice(fileDict.values())
154 |   finalStringList = []
155 |   matchNumber = len(fileDict)
156 |   for s in baseStringList:
157 |   	sNum = 0
158 |   	for key, value in fileDict.iteritems():
159 |   		if s in value:
160 |   			sNum +=1
161 |   	if sNum == matchNumber:
162 |   		finalStringList.append(s)
163 | 
164 |   #import and use filetype specific blacklist/regexlist to exclude unwanted sig material
165 |   #Various utility functions to extract strings/data/info and isolate signature material
166 |   with open(pathname +'/modules/'+filetype+'_blacklist.txt') as f:
167 |     blacklist = f.read().splitlines()
168 |   with open(pathname +'/modules/'+filetype+'_regexblacklist.txt') as f:
169 |     regblacklist = f.read().splitlines()
170 |   #Match Against Blacklist
171 |   for black in blacklist:
172 |     if black in finalStringList: finalStringList.remove(black)
173 |   #Match Against Regex Blacklist
174 |   regmatchlist = []
175 |   for regblack in regblacklist:
176 |     for string in finalStringList:
177 |       regex = re.compile(regblack) 
178 |       if regex.search(string): regmatchlist.append(string)
179 |   if len(regmatchlist) > 0:
180 |     for match in list(set(regmatchlist)):
181 |       finalStringList.remove(match)
182 | 
183 |   return finalStringList
184 | 
185 | #Build the actual rule
186 | def buildYara(options, strings, hashes):
187 |   date = datetime.now().strftime("%Y-%m-%d")
188 |   randStrings = []
189 |   #Ensure we have shared attributes and select twenty
190 |   try:
191 |     for i in range(1,20):
192 |   	 randStrings.append(random.choice(strings))
193 |   except IndexError:
194 |     print '[!] No Common Attributes Found For All Samples, Please Be More Selective'
195 |     sys.exit(1)
196 | 
197 |   #Prioritize based on specific filetype
198 |   if options.FileType == 'email':
199 |     for string in strings:
200 |       if "@" in string:
201 |         randStrings.append(string)
202 |       if "." in string:
203 |         randStrings.append(string)
204 | 
205 |   #Remove Duplicates
206 |   randStrings = list(set(randStrings))
207 | 
208 |   ruleOutFile = open(options.RuleName + ".yar", "w")
209 |   ruleOutFile.write("rule "+options.RuleName)
210 |   if options.Tags:
211 |     ruleOutFile.write(" : " + options.Tags)
212 |   ruleOutFile.write("\n")
213 |   ruleOutFile.write("{\n")
214 |   ruleOutFile.write("meta:\n")
215 |   ruleOutFile.write("\tauthor = \""+ options.Author + "\"\n")
216 |   ruleOutFile.write("\tdate = \""+ date +"\"\n")
217 |   ruleOutFile.write("\tdescription = \""+ options.Description + "\"\n")
218 |   for h in hashes:
219 |   	ruleOutFile.write("\thash"+str(hashes.index(h))+" = \""+ h + "\"\n")
220 |   ruleOutFile.write("\tsample_filetype = \""+ options.FileType + "\"\n")
221 |   ruleOutFile.write("\tyaragenerator = \"https://github.com/Xen0ph0n/YaraGenerator\"\n")
222 |   ruleOutFile.write("strings:\n")
223 |   for s in randStrings:
224 |     if "\x00" in s:
225 |       ruleOutFile.write("\t$string"+str(randStrings.index(s))+" = \""+ s.replace("\\","\\\\").replace('"','\\"').replace("\x00","") +"\" wide\n")
226 |     else:  
227 |       ruleOutFile.write("\t$string"+str(randStrings.index(s))+" = \""+ s.replace("\\","\\\\") +"\"\n")
228 |   ruleOutFile.write("condition:\n")
229 |   if options.FileType == 'email':
230 |     ruleOutFile.write("\t any of them\n")
231 |   else:
232 |     ruleOutFile.write("\t"+str(len(randStrings) - 1)+" of them\n")
233 |   ruleOutFile.write("}\n")
234 |   ruleOutFile.close()
235 |   return
236 | 
237 | #Per filetype execution paths
238 | def unknownFile(fileDict):
239 |   #Unknown is the default and will mirror executable excepting the blacklist
240 |   for fhash, path in fileDict.iteritems():
241 |     fileDict[fhash] = getStrings(path)
242 |   finalStringList = findCommonStrings(fileDict, 'unknown')
243 |   return finalStringList
244 | 
245 | def exeFile(fileDict):
246 |   for fhash, path in fileDict.iteritems():
247 |     fileDict[fhash] = exeImportsFuncs(path, getStrings(path))
248 |   finalStringList = findCommonStrings(fileDict, 'exe')
249 |   return finalStringList
250 | 
251 | def pdfFile(fileDict):
252 |   for fhash, path in fileDict.iteritems():
253 |     fileDict[fhash] = getStrings(path)
254 |   finalStringList = findCommonStrings(fileDict, 'pdf')
255 |   return finalStringList
256 | 
257 | def emailFile(fileDict):
258 |   for fhash, path in fileDict.iteritems():
259 |     fileDict[fhash] = emailParse(path)
260 |   finalStringList = findCommonStrings(fileDict, 'email')
261 |   return finalStringList
262 | 
263 | def officeFile(fileDict):
264 |   for fhash, path in fileDict.iteritems():
265 |     fileDict[fhash] = getStrings(path)
266 |   finalStringList = findCommonStrings(fileDict, 'office')
267 |   return finalStringList
268 | 
269 | def jshtmlFile(fileDict):
270 |   for fhash, path in fileDict.iteritems():
271 |     fileDict[fhash] = getStrings(path)
272 |   finalStringList = findCommonStrings(fileDict, 'jshtml')
273 |   return finalStringList
274 | 
275 | #Main
276 | def main():
277 |   filetypeoptions = ['unknown','exe','pdf','email','office','js-html']
278 |   opt = argparse.ArgumentParser(description="YaraGenerator")
279 |   opt.add_argument("InputDirectory", help="Path To Files To Create Yara Rule From")
280 |   opt.add_argument("-r", "--RuleName", required=True , help="Enter A Rule/Alert Name (No Spaces + Must Start with Letter)")
281 |   opt.add_argument("-a", "--Author", default="Anonymous", help="Enter Author Name")
282 |   opt.add_argument("-d", "--Description",default="No Description Provided",help="Provide a useful description of the Yara Rule")
283 |   opt.add_argument("-t", "--Tags",default="",help="Apply Tags to Yara Rule For Easy Reference (AlphaNumeric)")
284 |   opt.add_argument("-v", "--Verbose",default=False,action="store_true", help= "Print Finished Rule To Standard Out")
285 |   opt.add_argument("-f", "--FileType", required=True, default='unknown',choices=filetypeoptions, help= "Select Sample Set FileType choices are: "+', '.join(filetypeoptions), metavar="")
286 |   if len(sys.argv)<=3:
287 |     opt.print_help()
288 |     sys.exit(1)
289 |   options = opt.parse_args()
290 |   if " " in options.RuleName or not options.RuleName[0].isalpha():
291 |   	print "[!] Rule Name Can Not Contain Spaces or Begin With A Non Alpha Character"
292 | 
293 | 
294 |   #Get Filenames and hashes
295 |   fileDict = getFiles(options.InputDirectory)
296 |   print "\n[+] Generating Yara Rule " + options.RuleName + " from files located in: " + options.InputDirectory
297 |   
298 |   #Begin per-filetype processing paths
299 |   if options.FileType == 'exe':
300 |     finalStringList = exeFile(fileDict)
301 |   elif options.FileType == 'pdf':
302 |     finalStringList = pdfFile(fileDict)
303 |   elif options.FileType == 'email':
304 |     finalStringList = emailFile(fileDict)
305 |   elif options.FileType == 'office':
306 |     finalStringList = officeFile(fileDict)
307 |   elif options.FileType == 'js-html':
308 |     finalStringList = jshtmlFile(fileDict)
309 |   else:
310 |     finalStringList = unknownFile(fileDict)
311 | 
312 |   #Build and Write Yara Rule
313 |   global hashList
314 |   buildYara(options, finalStringList, hashList)
315 |   print "\n[+] Yara Rule Generated: "+options.RuleName+".yar\n"
316 |   print "  [+] Files Examined: " + str(hashList)
317 |   print "  [+] Author Credited: " + options.Author
318 |   print "  [+] Rule Description: " + options.Description 
319 |   if options.Tags:
320 |     print "  [+] Rule Tags: " + options.Tags +"\n"
321 |   if options.Verbose:
322 |     print "[+] Rule Below:\n"
323 |     with open(options.RuleName + ".yar", 'r') as donerule:
324 |       print donerule.read()
325 | 
326 |   print "[+] YaraGenerator (C) 2013 Chris@xenosec.org https://github.com/Xen0ph0n/YaraGenerator"
327 | 
328 | 
329 | if __name__ == "__main__":	
330 | 	main()


--------------------------------------------------------------------------------