├── README.md ├── apihashes.db └── psx.py /README.md: -------------------------------------------------------------------------------- 1 | # PyPowerShellXray 2 | Python script to decode common encoded PowerShell scripts. 3 | 4 | Hope you find it helpful! 5 | 6 | Even more hacked together by @JohnLaTwC, Nov 2016 7 | v 0.6 8 | With apologies to @Lee_Holmes for using Python instead of PowerShell. In decoding so much PowerShell, I didn't want to risk a self-infection :) 9 | 10 | This script attempts to decode encoded PowerShell commands. 11 | REQUIREMENTS: This script uses vivisect for PE parsing and disassembly: https://github.com/vivisect/vivisect. Set the PYTHONPATH as appropriate. 12 | e.g. set pythonpath=C:\vivisect-master\vivisect-master 13 | 14 | Things this script tries to do. Emphasis on tries. 15 | * It attempts to decode recursively if instructed (via the -r switch) 16 | * It attempts to find Base64 data, compressed content (Gzip, Deflate), or [char[]](77,105,95) style encoding 17 | * It attempts to 'find/replace' the encoded text in the PowerShell command. This is handy 18 | if the script has numerous chunks of encoded content 19 | * If it finds shellcode, it attempts to display it. LIMITATION: x86 shellcode only 20 | If you ever come across this sequence in PowerShell, you know you have shellcode 21 | 22 | ``` 23 | [Byte[]]$z = 0xb8,0x46,0x0f,0x64...REST OF SHELLCODE; 24 | ... 25 | $Nb7=$w::VirtualAlloc(0,0x1000,$g,0x40); 26 | ... 27 | $w::CreateThread(0,0,$Nb7,0,0,0); 28 | ``` 29 | With the shellcode it tries to: 30 | - Resolve APIs. The APIs used by shellcode give defenders a clue as to what to look for on the host. 31 | e.g. if you see calls to winsock/wininet/winhttp APIs, you know they connected to a URL or IP 32 | e.g.
if you see a call to WinExec / CreateProcess, you know something was downloaded and spawned 33 | 34 | ``` 35 | push 0x0726774c << 0x0726774c is the hash of the API text "kernel32.dll!LoadLibraryA" 36 | call ebp --> kernel32.dll!LoadLibraryA 37 | ``` 38 | 39 | Pretty sure @stephenfewer came up with blockhash in https://github.com/rapid7/metasploit-framework/blob/master/external/source/shellcode/windows/x86/src/block/block_api.asm 40 | Rather than rely on a hardcoded list of API hashes, the script builds a dictionary based on your local binaries. 41 | This step therefore requires Windows as the underlying OS. 42 | - Display ASCII text for DWORD constants to assist decoding. 43 | e.g. the below shows the encoding of ws2_32 [.dll] before a call to LoadLibrary 44 | 45 | ``` 46 | push 0x00003233--> '23' 47 | push 0x5f327377--> '_2sw' 48 | push esp 49 | push 0x0726774c--> '&wL' << garbage. this is just the API hash for 'kernel32.dll!LoadLibraryA' 50 | call ebp --> kernel32.dll!LoadLibraryA 51 | ``` 52 | - Display IP:port for calls to socket/Internet APIs 53 | 54 | ``` 55 | push 0x68bff1c0 56 | push 0xbb010002--> IP 192.241.191.104:443 57 | ``` 58 | - Display a hex dump to look for strings 59 | - Decode some encoded shellcode. Shellcode is often encoded. A common encoder is shikata_ga_nai.
60 | You can disable this behavior with the -nx switch 61 | Here is an example of the shikata encoder in action: 62 | 63 | ``` 64 | 0x00000000 b8460f64cf mov eax,0xcf640f46 << 4-byte XOR key 65 | 0x00000005 dbcf fcmovne st0,st7 << execute any floating point operation to set up GetPC 66 | 0x00000007 d97424f4 fnstenv [esp - 12] << stores floating point state 67 | 0x0000000b 5d pop ebp << GetPC: pop addr of last FP instr into ebp 68 | 0x0000000c 29c9 sub ecx,ecx 69 | 0x0000000e b147 mov cl,71 << 71 DWORDs to decode 70 | 0x00000010 314513 xor dword [ebp + 19],eax << start of XOR decode loop 71 | 0x00000013 83edfc sub ebp,0xfffffffc << subtracting -4 advances the ebp pointer by 4 72 | 0x00000016 034549 add eax,dword [ebp + 73] << partial garbage instruction 73 | 0x00000019 ed in eax,dx << garbage b/c it's encoded 74 | 0x0000001a 91 ... garbage bytes continue 75 | ``` 76 | 77 | Post decode you get something like: 78 | 79 | ``` 80 | 0x00000010 314513 xor dword [ebp + 19],eax 81 | 0x00000013 83edfc sub ebp,0xfffffffc 82 | 0x00000016 03450f add eax,dword [ebp + 15] << pre decode this was: add eax,dword [ebp + 73] 83 | 0x00000019 e2f5 loop 0x00000010 << the expected loop operation. 71 times 84 | 0x0000001b fc cld ... decoded content. it's now valid shellcode 85 | 0x0000001c e882000000 call 0x000000a3 86 | 0x00000021 60 pushad 87 | 0x00000022 89e5 mov ebp,esp 88 | 0x00000024 31c0 xor eax,eax 89 | 0x00000026 648b5030 fs: mov edx,dword [eax + 48] 90 | 0x0000002a 8b520c mov edx,dword [edx + 12] 91 | 0x0000002d 8b5214 mov edx,dword [edx + 20] 92 | 0x00000030 8b7228 mov esi,dword [edx + 40] 93 | 0x00000033 0fb74a26 movzx ecx,word [edx + 38] 94 | ... 95 | ``` 96 | A real programmer would use an emulator (libemu). Not this script. 97 | 98 | "I'm running this on Linux or Mac and don't have the Windows DLLs around to get the hashes for API resolution. Help!" I added a database (SQLite) of common APIs & hashes.
Give it a whirl like so: 99 | 100 | ``` 101 | python.exe psx.py -f test1.txt -db apihashes.db 102 | Hex dump: 60 9c 54 5e fc e8 82 00 00 00 60 89 e5 31 c0 64 8b 50 30 8b 52 0c 8b 52 14 8b 72 28 0f b7 4a 26 31 ff ac 3c 61 7c 02 2c 20 c1 cf 0d 01 c7 e2 f2 52 57 8b 52 10 8b 4a 3c 8b 4c 11 78 e3 48 01 d1 51 8b 59 20 01 d3 8b 49 18 e3 3a 49 8b 34 8b 01 d6 31 ff ac c1 cf 0d 01 c7 38 e0 75 f6 03 7d f8 3b 7d 24 75 e4 58 8b 58 24 01 d3 66 8b 0c 4b 8b 58 1c 01 d3 8b 04 8b 01 d0 89 44 24 24 5b 5b 61 59 5a 51 ff e0 5f 5f 5a 8b 12 eb 8d 5d 68 49 47 c6 62 ff d5 50 6a 00 68 ff 0f 1f 00 68 ee 95 b6 50 ff d5 89 c3 6a 00 68 70 69 33 32 68 61 64 76 61 54 68 4c 77 26 07 ff d5 68 44 3a 50 00 83 ec 04 6a 00 8d 44 24 04 50 6a 01 8d 44 24 10 50 68 9a 63 6f da ff d5 6a 04 53 68 db f8 3a d6 ff d5 89 f4 61 9d c3 103 | 0x00000000 60 pushad 104 | 0x00000001 9c pushfd 105 | 0x00000002 54 push esp 106 | 0x00000003 5e pop esi 107 | 0x00000004 fc cld 108 | 0x00000005 e882000000 call 0x0000008c 109 | 0x0000000a 60 pushad 110 | 0x0000000b 89e5 mov ebp,esp 111 | 0x0000000d 31c0 xor eax,eax 112 | 0x0000000f 648b5030 fs: mov edx,dword [eax + 48] 113 | 0x00000013 8b520c mov edx,dword [edx + 12] 114 | 0x00000016 8b5214 mov edx,dword [edx + 20] 115 | 0x00000019 8b7228 mov esi,dword [edx + 40] 116 | 0x0000001c 0fb74a26 movzx ecx,word [edx + 38] 117 | 0x00000020 31ff xor edi,edi 118 | 0x00000022 ac lodsb 119 | 0x00000023 3c61 cmp al,97 120 | 0x00000025 7c02 jl 0x00000029 121 | 0x00000027 2c20 sub al,32 122 | 0x00000029 c1cf0d ror edi,13 123 | 0x0000002c 01c7 add edi,eax 124 | 0x0000002e e2f2 loop 0x00000022 125 | 0x00000030 52 push edx 126 | 0x00000031 57 push edi 127 | 0x00000032 8b5210 mov edx,dword [edx + 16] 128 | 0x00000035 8b4a3c mov ecx,dword [edx + 60] 129 | 0x00000038 8b4c1178 mov ecx,dword [ecx + edx + 120] 130 | 0x0000003c e348 jecxz 0x00000086 131 | 0x0000003e 01d1 add ecx,edx 132 | 0x00000040 51 push ecx 133 | 0x00000041 8b5920 mov ebx,dword [ecx + 32] 134 | 0x00000044 01d3 add ebx,edx 135 | 
0x00000046 8b4918 mov ecx,dword [ecx + 24] 136 | 0x00000049 e33a jecxz 0x00000085 137 | 0x0000004b 49 dec ecx 138 | 0x0000004c 8b348b mov esi,dword [ebx + ecx * 4] 139 | 0x0000004f 01d6 add esi,edx 140 | 0x00000051 31ff xor edi,edi 141 | 0x00000053 ac lodsb 142 | 0x00000054 c1cf0d ror edi,13 143 | 0x00000057 01c7 add edi,eax 144 | 0x00000059 38e0 cmp al,ah 145 | 0x0000005b 75f6 jnz 0x00000053 146 | 0x0000005d 037df8 add edi,dword [ebp - 8] 147 | 0x00000060 3b7d24 cmp edi,dword [ebp + 36] 148 | 0x00000063 75e4 jnz 0x00000049 149 | 0x00000065 58 pop eax 150 | 0x00000066 8b5824 mov ebx,dword [eax + 36] 151 | 0x00000069 01d3 add ebx,edx 152 | 0x0000006b 668b0c4b mov cx,word [ebx + ecx * 2] 153 | 0x0000006f 8b581c mov ebx,dword [eax + 28] 154 | 0x00000072 01d3 add ebx,edx 155 | 0x00000074 8b048b mov eax,dword [ebx + ecx * 4] 156 | 0x00000077 01d0 add eax,edx 157 | 0x00000079 89442424 mov dword [esp + 36],eax 158 | 0x0000007d 5b pop ebx 159 | 0x0000007e 5b pop ebx 160 | 0x0000007f 61 popad 161 | 0x00000080 59 pop ecx 162 | 0x00000081 5a pop edx 163 | 0x00000082 51 push ecx 164 | 0x00000083 ffe0 jmp eax 165 | 0x00000085 5f pop edi 166 | 0x00000086 5f pop edi 167 | 0x00000087 5a pop edx 168 | 0x00000088 8b12 mov edx,dword [edx] 169 | 0x0000008a eb8d jmp 0x00000019 170 | 0x0000008c 5d pop ebp 171 | 0x0000008d 684947c662 push 0x62c64749--> 'bGI' 172 | 0x00000092 ffd5 call ebp --> kernel32.dll!GetCurrentProcessId 173 | 0x00000094 50 push eax 174 | 0x00000095 6a00 push 0 175 | 0x00000097 68ff0f1f00 push 0x001f0fff 176 | 0x0000009c 68ee95b650 push 0x50b695ee 177 | 0x000000a1 ffd5 call ebp --> kernel32.dll!OpenProcess 178 | 0x000000a3 89c3 mov ebx,eax 179 | 0x000000a5 6a00 push 0 180 | 0x000000a7 6870693332 push 0x32336970--> '23ip' 181 | 0x000000ac 6861647661 push 0x61766461--> 'avda' 182 | 0x000000b1 54 push esp 183 | 0x000000b2 684c772607 push 0x0726774c--> '&wL' 184 | 0x000000b7 ffd5 call ebp --> kernel32.dll!LoadLibraryA 185 | 0x000000b9 68443a5000 push 0x00503a44--> 'P:D' 
186 | 0x000000be 83ec04 sub esp,4 187 | 0x000000c1 6a00 push 0 188 | 0x000000c3 8d442404 lea eax,dword [esp + 4] 189 | 0x000000c7 50 push eax 190 | 0x000000c8 6a01 push 1 191 | 0x000000ca 8d442410 lea eax,dword [esp + 16] 192 | 0x000000ce 50 push eax 193 | 0x000000cf 689a636fda push 0xda6f639a--> 'oc' 194 | 0x000000d4 ffd5 call ebp --> advapi32.dll!ConvertStringSecurityDescriptorToSecurityDescriptorA 195 | 0x000000d6 6a04 push 4 196 | 0x000000d8 53 push ebx 197 | 0x000000d9 68dbf83ad6 push 0xd63af8db 198 | 0x000000de ffd5 call ebp --> advapi32.dll!SetKernelObjectSecurity 199 | 0x000000e0 89f4 mov esp,esi 200 | 0x000000e2 61 popad 201 | 0x000000e3 9d popfd 202 | 0x000000e4 c3 ret 203 | 204 | Byte Dump: 205 | `.T^......`..1.d.P0.R.R..r(..J&1.. kernel32.dll!LoadLibraryA 34 | ## pretty sure @stephenfewer came up with blockhash in https://github.com/rapid7/metasploit-framework/blob/master/external/source/shellcode/windows/x86/src/block/block_api.asm 35 | ## Rather than have a hardcoded list of API hashes, it build a dictionary based on your local binaries. 36 | ## This means the script requires Windows as the underlying OS to do this. 37 | ## 38 | ## - Display ascii text for DWORD constants to assist decoding. 39 | ## e.g. the below shows the encoding of ws2_32 [.dll] before a call to LoadLibrary 40 | ## push 0x00003233--> '23' 41 | ## push 0x5f327377--> '_2sw' 42 | ## push esp 43 | ## push 0x0726774c--> '&wL' << garbage. this is just the API hash for 'kernel32.dll!LoadLibraryA' 44 | ## call ebp --> kernel32.dll!LoadLibraryA 45 | ## - Display IP:port for calls to socket/Internet APIs 46 | ## push 0x68bff1c0 47 | ## push 0xbb010002--> IP 192.241.191.104:443 48 | ## - Display a hex dump to look for strings 49 | ## - Decode some encoded shellcode. Shellcode is often encoded. A common one is shikata_ga_nai. 
50 | ## You can disable this behavior with the -nx switch 51 | ## Here is an example of the shikata encoder in action: 52 | ## 0x00000000 b8460f64cf mov eax,0xcf640f46 << 4-byte XOR key 53 | ## 0x00000005 dbcf fcmovne st0,st7 << execute any floating point operation to set up GetPC 54 | ## 0x00000007 d97424f4 fnstenv [esp - 12] << stores floating point state 55 | ## 0x0000000b 5d pop ebp << GetPC: pop addr of last FP instr into ebp 56 | ## 0x0000000c 29c9 sub ecx,ecx 57 | ## 0x0000000e b147 mov cl,71 << 71 DWORDs to decode 58 | ## 0x00000010 314513 xor dword [ebp + 19],eax << start of XOR decode loop 59 | ## 0x00000013 83edfc sub ebp,0xfffffffc << subtracting -4 advances the ebp pointer by 4 60 | ## 0x00000016 034549 add eax,dword [ebp + 73] << partial garbage instruction 61 | ## 0x00000019 ed in eax,dx << garbage b/c it's encoded 62 | ## 0x0000001a 91 ... garbage bytes continue 63 | ## 64 | ## Post decode you get something like: 65 | ## 0x00000010 314513 xor dword [ebp + 19],eax 66 | ## 0x00000013 83edfc sub ebp,0xfffffffc 67 | ## 0x00000016 03450f add eax,dword [ebp + 15] << pre decode this was: add eax,dword [ebp + 73] 68 | ## 0x00000019 e2f5 loop 0x00000010 << the expected loop operation. 71 times 69 | ## 0x0000001b fc cld ... decoded content. it's now valid shellcode 70 | ## 0x0000001c e882000000 call 0x000000a3 71 | ## 0x00000021 60 pushad 72 | ## 0x00000022 89e5 mov ebp,esp 73 | ## 0x00000024 31c0 xor eax,eax 74 | ## 0x00000026 648b5030 fs: mov edx,dword [eax + 48] 75 | ## 0x0000002a 8b520c mov edx,dword [edx + 12] 76 | ## 0x0000002d 8b5214 mov edx,dword [edx + 20] 77 | ## 0x00000030 8b7228 mov esi,dword [edx + 40] 78 | ## 0x00000033 0fb74a26 movzx ecx,word [edx + 38] 79 | ## ... 80 | ## A real programmer would use an emulator (libemu).
Not this script 81 | 82 | 83 | import sys 84 | import zlib 85 | import re 86 | import argparse 87 | import string 88 | from envi.archs.i386 import i386Disasm 89 | 90 | MAX_DISTANCE_FROM_KEYWORD = 100 91 | 92 | szDbPath = None 93 | fDbLoaded = False 94 | fVerbose = False 95 | APIDict = {} 96 | fDecodeShellcode = True 97 | dis = i386Disasm() 98 | 99 | ror = lambda val, r_bits, max_bits: \ 100 | ((val & (2**max_bits-1)) >> r_bits%max_bits) | \ 101 | (val << (max_bits-(r_bits%max_bits)) & (2**max_bits-1)) 102 | 103 | def hashapi(sz): 104 | val = 0 105 | for a in sz: 106 | val = ror(val, 0xd, 32) 107 | val = val + ord(a) 108 | return val 109 | 110 | def blockhash(szDll, szAPI): 111 | from array import array 112 | sz = unicode(szDll.upper() + '\0') 113 | szEncDll = sz.encode("utf-16") # 2-byte BOM + UTF-16 text; the BOM is skipped below 114 | szEncAPI = szAPI.encode("ascii") + '\0' 115 | 116 | iDll = hashapi(szEncDll[2:]) 117 | iAPI = hashapi(szEncAPI) 118 | return 0xFFFFFFFF & (iDll+iAPI) 119 | 120 | 121 | ## This function makes the script Windows specific. It expects Windows binaries and uses them 122 | ## to build up a dictionary of API hashes. One could fix this by doing this step on a 123 | ## Windows PC and then storing the API hashes in a file 124 | ## Sept 2017: support the ability to load from a DB 125 | def PopulateExports(APIDict, szDll): 126 | global fVerbose 127 | from PE import PE 128 | import os 129 | fd = open(os.environ['SYSTEMROOT']+ '\\System32\\' + szDll, 'rb') 130 | pe = PE(fd) 131 | for exp in pe.getExports(): 132 | szAPI = exp[2] 133 | szHash = "0x%08x"%(blockhash(szDll, szAPI)) 134 | APIDict[szHash] = szDll + "!"
+ szAPI 135 | if (fVerbose): 136 | print("INSERT INTO APIs (module, api,hashvalue) VALUES('%s','%s','%s')" % (szDll, szAPI, szHash)) 137 | 138 | ## example: 139 | ## 0x00000000 b9c7060000 mov ecx,1735 140 | ## 0x00000005 e8ffffffff call 0x00000009 141 | ## 0x0000000a c15e304c rcr dword [esi + 48],76 142 | ## 0x0000000e 0e push cs 143 | ## 0x0000000f 07 pop es 144 | ## 0x00000010 e2fa loop 0x0000000c 145 | ## 0x00000012 b8b7050405 mov eax,0x050405b7 146 | def decode_call_to_self(d, all_instr_list): 147 | ## verify some bytes first 148 | import array 149 | sd = array.array('B', d) 150 | szd = None 151 | 152 | #look for mov and call to self after a min number of instructions 153 | if len(all_instr_list) < 10: 154 | return None 155 | 156 | fFoundMov = False 157 | fFoundCounter = False 158 | fFoundCallToSelf = False 159 | iLen = 0 160 | iCallOffset = 0 161 | szMsg = 'No decoder found' 162 | for i in range(0, 2): 163 | instr_lst = all_instr_list[i] 164 | szInsBytes = instr_lst[1] 165 | szIns = instr_lst[2] 166 | offset = instr_lst[3] 167 | # e8ffffffff call 0x00000009 168 | if szInsBytes == "e8ffffffff": 169 | fFoundCallToSelf = True 170 | iCallOffset = offset + 5 171 | # mov ecx,1735 172 | if szIns.startswith('mov ') and szIns.find('ecx,') > 0: 173 | fFoundCounter = True 174 | iLen = int(szIns.split(',')[1]) 175 | if (fFoundCallToSelf and fFoundCounter and iLen > 0): 176 | szMsg = "Found call_to_self shellcode len = %d, decode offset= %d" % (iLen, iCallOffset) 177 | szd = [] 178 | for i in range(0,iCallOffset): 179 | szd.append(chr(sd[i])) 180 | szd.append(chr(sd[iCallOffset - 1])) 181 | for i in range(iCallOffset,len(sd)-iCallOffset): 182 | szd.append(chr(sd[i])) 183 | return [''.join(szd), iLen, 0, iCallOffset, szMsg] 184 | 185 | return [None, 0, 0, 0, szMsg] 186 | 187 | ## Example shellcode 188 | ## 0x00000000 dbd3 fcmovnbe st0,st3 189 | ## 0x00000002 be1dd3f6b2 mov esi,0xb2f6d31d 190 | ## 0x00000007 d97424f4 fnstenv [esp - 12] 191 | ## 0x0000000b 5a pop edx 192 | ## 
0x0000000c 33c9 xor ecx,ecx 193 | ## 0x0000000e b16e mov cl,110 194 | ## 0x00000010 83c204 add edx,4 195 | ## 0x00000013 317214 xor dword [edx + 20],esi 196 | ## 0x00000016 037209 add esi,dword [edx + 9] 197 | def decode_shikata_ga_nai(d, all_instr_list): 198 | ## verify some bytes first 199 | import array 200 | sd = array.array('B', d) 201 | szd = None 202 | 203 | #look for floating point instr, fnstenv, and mov in first few instr 204 | if len(all_instr_list) < 10: 205 | return None 206 | 207 | fFoundFnstenv = False 208 | fFoundFloatingPtInstr = False 209 | fFoundMov = False 210 | fFoundCounter = False 211 | fFoundXor = False 212 | iLen = 0 213 | key = 0 214 | szMsg = 'No decoder found' 215 | iXorOffset = 0 216 | iXorAdjust = 0 217 | iFPOpOffset = 0 218 | for i in range(0, 10): 219 | instr_lst = all_instr_list[i] 220 | szIns = instr_lst[2] 221 | offset = instr_lst[3] 222 | # fnstenv [esp - 12] 223 | if szIns.startswith('fnstenv'): 224 | fFoundFnstenv = True 225 | #fxch st0,st6 226 | if not fFoundFloatingPtInstr and not szIns.startswith('fnstenv') and szIns.startswith('f'): 227 | fFoundFloatingPtInstr = True 228 | iFPOpOffset = offset 229 | #xor dword [edx + 24],eax 230 | if szIns.startswith('sub ') and szIns.endswith('0xfffffffc'): 231 | iXorAdjust = -4 232 | if szIns.startswith('xor dword ['): 233 | fFoundXor = True 234 | iXorOffset = int((szIns.split('+')[1]).split(']')[0]) ##+ iXorAdjust 235 | #find key operation. e.g. 
add esi,dword [eax + 14] 236 | for j in range(1,3): 237 | keyop_instr_lst = all_instr_list[i+j] 238 | szKeyOpIns = keyop_instr_lst[2] 239 | if szKeyOpIns.startswith('add e'): 240 | szKeyOp = szKeyOpIns.split(' ')[0] 241 | istart = keyop_instr_lst[3] 242 | break 243 | # mov eax,0x4193fabc 244 | if szIns.startswith('mov ') and szIns.find('0x') > 0 and not fFoundMov: 245 | fFoundMov = True 246 | k1 = sd[offset + 0x1] 247 | k2 = sd[offset + 0x2] 248 | k3 = sd[offset + 0x3] 249 | k4 = sd[offset + 0x4] 250 | key = k1 | (k2 << 8) | (k3 << 16)| (k4 << 24) 251 | # mov cl,110 252 | if szIns.startswith('mov ') and szIns.find('cl,') > 0: 253 | fFoundCounter = True 254 | iLen = int(szIns.split(',')[1]) 255 | if (fFoundMov and fFoundFloatingPtInstr and fFoundFnstenv and fFoundCounter and iLen > 0): 256 | 257 | next_key_operation = d[istart: istart+3] 258 | 259 | szd = [] 260 | for i in range(0,iXorOffset + iFPOpOffset): 261 | szd.append(chr(sd[i])) 262 | 263 | for i in range(iXorOffset + iFPOpOffset,len(sd)-(iXorOffset + iFPOpOffset), 4): 264 | szd.append(chr(k1 ^ sd[i])) 265 | szd.append(chr(k2 ^ sd[i+1])) 266 | szd.append(chr(k3 ^ sd[i+2])) 267 | szd.append(chr(k4 ^ sd[i+3])) 268 | data = k1^sd[i] | ((k2^sd[i+1]) << 8) | ((k3^sd[i+2]) << 16) | ((k4^sd[i+3]) << 24) 269 | 270 | #update the key based on the shikata rules 271 | if szKeyOp == "add": 272 | key = (key + data) & 0x00000000FFFFFFFF 273 | else: 274 | key = (key + data) & 0x00000000FFFFFFFF 275 | pass # error case 276 | 277 | k1 = 0x000000FF & key 278 | k2 = (0x0000FF00 & key) >> 8 279 | k3 = (0x00FF0000 & key) >> 16 280 | k4 = (0xFF000000 & key) >> 24 281 | 282 | szd = ''.join(szd) 283 | 284 | op = dis.disasm(szd, istart, istart) 285 | szIns = repr(op).lower() 286 | szKeyOp = szIns.split(' ')[0] 287 | # szOffsetDirection = szIns.split(' ')[3] 288 | # cOffset = int((szIns.split(' ')[4]).split(']')[0]) 289 | szMsg = "Found shikata_ga_nai shellcode len = %d, key = 0x%x, decode offset= %d, fpop offset = %d, keyop= %s, 
istart=0x%x, '%s'" % (iLen, key, iXorOffset, iFPOpOffset, szKeyOp, istart, szIns) 290 | else: 291 | pass 292 | return [szd, iLen, key, iXorOffset, szMsg] 293 | 294 | def process_instructions_impl(d, offset, va): 295 | global dis 296 | instr_list = [] 297 | all_instr_list = [] 298 | final_offset_msg= '' 299 | while offset < len(d): 300 | op = None 301 | try: 302 | op = dis.disasm(d, offset, va+offset) 303 | szIns = repr(op).lower() 304 | instr_lst = ['0x%.8x' % (va+offset), 305 | '%s' % str(d[offset:offset+len(op)].encode('hex')), 306 | szIns, 307 | offset ] 308 | all_instr_list.append(instr_lst) 309 | offset += len(op) 310 | except Exception as e1: 311 | final_offset_msg = 'Decode error at offset 0x%x' % offset 312 | break 313 | return [all_instr_list, final_offset_msg] 314 | 315 | def process_instructions(d): 316 | return process_instructions_impl(d,0,0) 317 | 318 | def prepareAPIs(): 319 | global APIDict 320 | global szDbPath 321 | global fDbLoaded 322 | 323 | ## if APIs are being loaded from a DB, then do that now 324 | if (szDbPath is not None and not fDbLoaded): 325 | import sqlite3 326 | db = sqlite3.connect(szDbPath) 327 | cursor = db.cursor() 328 | cursor.execute('''SELECT module, api, hashvalue FROM APIs''') 329 | all_rows = cursor.fetchall() 330 | for row in all_rows: 331 | szHash = row[2] 332 | szDll = row[0] 333 | szAPI = row[1] 334 | APIDict[szHash] = szDll + "!" 
+ szAPI 335 | db.close() 336 | fDbLoaded = True 337 | else: 338 | PopulateExports(APIDict, 'kernel32.dll') 339 | PopulateExports(APIDict, 'ws2_32.dll') 340 | PopulateExports(APIDict, 'ole32.dll') 341 | PopulateExports(APIDict, 'ntdll.dll') 342 | PopulateExports(APIDict, 'advapi32.dll') 343 | PopulateExports(APIDict, 'urlmon.dll') 344 | PopulateExports(APIDict, 'winhttp.dll') 345 | PopulateExports(APIDict, 'wininet.dll') 346 | 347 | def dumpShellcode(d): 348 | global fDecodeShellcode 349 | global APIDict 350 | szOut = '' 351 | if len(APIDict) == 0: 352 | prepareAPIs() 353 | ## for szKey in APIDict.keys(): 354 | ## print ("%s %s" % (szKey, APIDict[szKey])) 355 | 356 | # set pythonpath=\vivisect 357 | szIns = szPrev = '' 358 | instr_list = [] 359 | 360 | outputparamlst = process_instructions(d) 361 | all_instr_list = outputparamlst[0] 362 | final_offset_msg = outputparamlst[1] 363 | 364 | if fDecodeShellcode: 365 | decoder_funcs = [decode_shikata_ga_nai, decode_call_to_self] 366 | try: 367 | for decoder_func in decoder_funcs: 368 | out_params = decoder_func(d, all_instr_list) 369 | if out_params is not None and out_params[0] is not None: 370 | szd = out_params[0] 371 | iLen = out_params[1] 372 | key = out_params[2] 373 | iXorOffset = out_params[3] 374 | szMsg = out_params[4] 375 | szOut += szMsg + '\n' 376 | 377 | outputparamlst = process_instructions(szd) 378 | all_instr_list = outputparamlst[0] 379 | final_offset_msg = outputparamlst[1] 380 | d = szd 381 | 382 | except Exception as e1: 383 | print(e1) 384 | 385 | # display hex dump 386 | szdisplay = ' '.join([hex(ord(c))[2:].zfill(2) for c in d]) 387 | print('Hex dump: ' + szdisplay) 388 | 389 | for i in range(0, len(all_instr_list)): 390 | instr_lst = all_instr_list[i] 391 | szIns = instr_lst[2] 392 | szOut += '%s %s %s' % (instr_lst[0], instr_lst[1].ljust(16), szIns) 393 | if (i > 0): 394 | szPrev = all_instr_list[i-1][2] 395 | 396 | if (szIns == 'call ebp'): 397 | szDword = None 398 | if (szPrev.find("push 0x") 
>= 0 or re.search(r"mov e\wx,0x",szPrev) is not None): 399 | szDword = szPrev[-10:] 400 | if (i > 2 and all_instr_list[i-1][1] == "0000" and all_instr_list[i-2][1] == "0000"): 401 | szDword = all_instr_list[i-3][2][-10:] 402 | if szDword is not None: 403 | if szDword in APIDict: 404 | szOut += " --> " + APIDict[szDword] + '\n' 405 | else: 406 | szOut += '\n' 407 | else: 408 | szOut += '\n' 409 | elif (szIns.find('push 0x') >= 0 and szIns.find('0002')>0 and szPrev.find('push 0x') >= 0 ): 410 | #decode addr and port 411 | #0x000000ad 683418905b push 0x5b901834 IP 412 | #0x000000b2 68020001bb push 0xbb010002 port in highword 413 | szPort = szIns.split(' ')[1][2:6] 414 | szIP = szPrev.split(' ')[1] 415 | hexIP = int(szIP, 16) 416 | hexPort = int(szPort, 16) 417 | hexPort = ((hexPort & 0x0000FF00) >> 8) + ((hexPort & 0x000000FF) << 8) 418 | szOut += "--> IP %s.%s.%s.%s:%s\n" % (hexIP & 0x000000FF, (hexIP & 0x0000FF00) >> 8, (hexIP & 0x00FF0000)>>16, (hexIP & 0xFF000000) >> 24 , hexPort) 419 | elif (szIns.find('push 0x') >= 0 or (szIns.find('mov ') >= 0 and szIns.find(',0x') > 0)): 420 | szDword = szIns.split('0x')[1] # push 0x00707474 --> 00707474 421 | ## if dword is displayable characters (or NUL) then concatenate into a string 422 | szdw = ''.join([chr(int(''.join(c), 16)) for c in zip(szDword[0::2],szDword[1::2])]) 423 | szbytes = ''.join(map(lambda c: c if c in string.printable else '', szdw)) 424 | szbytes = szbytes.replace('\r',' ').replace('\n','') 425 | if len(szbytes) >= 2: 426 | szOut += "--> '" + szbytes + "'\n" 427 | else: 428 | szOut += '\n' 429 | else: 430 | szOut += '\n' 431 | szPrev = szIns 432 | 433 | szOut+= '\nByte Dump:\n' 434 | i = 0 435 | sz = '' 436 | for b in d: 437 | i+=1 438 | if (b in string.printable): 439 | sz += b.encode('utf-8').strip() 440 | else: 441 | sz +='.'
442 | 443 | sz = sz.replace(' ','') 444 | szOut += sz 445 | return szOut 446 | 447 | def xray(sz0): 448 | global fVerbose 449 | global MAX_DISTANCE_FROM_KEYWORD 450 | out = '' 451 | #first transform any char[](dec, dec) strings: 452 | ##example: [char[]](77,105,99,114,111,115,111,102,116,92,87,105,110,100,111,119,115,92,84,101,109,112,108,97,116,101,115,92,108,111,103,46,116,120,116) -join '')")){ 453 | 454 | m = re.search('\[char\[\]\]\(((\d+)+|,)+\)',sz0, re.IGNORECASE) 455 | if m is not None: 456 | sz = m.group(0) 457 | if fVerbose: 458 | print("GROUP: " + sz) 459 | b64buf = '' 460 | for c in re.split('[(,)]', sz): 461 | if c.isdigit(): 462 | b64buf += chr(int(c)) 463 | ##now get the largest string in the decoded buf 464 | out = sz0.replace(sz + " -join ''", b64buf ) 465 | if fVerbose: 466 | print("OUT: " + out) 467 | return out 468 | 469 | #find strings like this and substitute the decoded content 470 | #[Text.Encoding]::Unicode.GetString([Convert]::FromBase64String('MQA6ADEAMQAxADEAMQAxADEAMQAxADEAOgAxADEAMQAxADEAMQAxADEAMQAxADoAMQAxADEAOgAxADEAMQA6AA==' 471 | m = re.search('\[Text\.Encoding\]::Unicode\.GetString\(\[Convert\]::FromBase64String\(\'[A-Za-z0-9=/]*\'\)',sz0, re.IGNORECASE) 472 | if m is not None: 473 | g = m.group() 474 | b64 = (g.split("'")[1].decode('base64')) 475 | b64 = "'" + re.sub(r'[^\x01-\x7f]',r'', b64) + "'" 476 | out = sz0[:m.start()] + b64 + sz0[m.end():] 477 | return out 478 | 479 | sz = sz1 = max(filter(None, re.split("[\\\\ '\";\)]", sz0)), key=len).strip() 480 | 481 | ## test to see if we have candidate Base64 text 482 | h = re.compile(r'[A-Za-z0-9+/=]{10,}$') 483 | m = h.match(sz) 484 | if m is not None: 485 | out = sz = sz.decode('base64') 486 | pos0 = sz0.find(sz1) # position of encoded chunk 487 | 488 | if fVerbose: 489 | print('chunk start index = %d' % pos0) 490 | szdisplay = ' '.join([hex(ord(c))[2:].zfill(2) for c in out]) 491 | print('Hex dump: ' + szdisplay) 492 | 493 | fNotUnicode = False 494 | for i in range(0,10,2): 
495 | if sz[i] in string.printable and ord(sz[i+1]) == 0x0: 496 | continue 497 | else: 498 | fNotUnicode = True 499 | break 500 | if fNotUnicode: 501 | if ord(sz[0]) == 0x1f and ord(sz[1]) == 0x8b: 502 | if fVerbose: print('Found GZip') 503 | sz2 = str(zlib.decompressobj(32 + zlib.MAX_WBITS).decompress(sz)) 504 | p1 = sz0[0:sz0.find(sz1)] 505 | p1 = re.sub(r'[^\x01-\x7f]',r'', p1) 506 | p2 = sz2 507 | p3 = sz0[sz0.find(sz1) + len(sz1):] 508 | p3 = re.sub(r'[^\x01-\x7f]',r'', p3) 509 | out = p1 + p2 + p3 510 | elif re.search('deflate',sz0, re.IGNORECASE): 511 | m2 = re.search('deflate',sz0, re.IGNORECASE) 512 | if fVerbose: print('Found Deflate at %d' % m2.start(0)) 513 | if abs(m2.start(0) - pos0) < MAX_DISTANCE_FROM_KEYWORD: 514 | sz2 = str(zlib.decompress( sz, -15)) 515 | p1 = sz0[0:sz0.find(sz1)] 516 | p1 = re.sub(r'[^\x01-\x7f]',r'', p1) 517 | p2 = sz2 518 | p3 = sz0[sz0.find(sz1) + len(sz1):] 519 | p3 = re.sub(r'[^\x01-\x7f]',r'', p3) 520 | out = p1 + p2 + p3 521 | else: 522 | if fVerbose: print('Keyword found too far away from encoded content %d' % abs(m2.start(0) - pos0)) 523 | outputparamlst = process_instructions(sz) 524 | if outputparamlst is not None and len(outputparamlst[0]) > 15: 525 | if fVerbose: print('Found Possible Shellcode') 526 | out = dumpShellcode(sz) 527 | else: 528 | # Test to see if we can dissasemble at least a min amount of instructions 529 | # that suggest we have valid x86 530 | 531 | # if we find curly braces, that suggest the result is code not asm 532 | fUnprintableFound = False 533 | for i in range(0,10): 534 | if sz[i] not in string.printable: 535 | fUnprintableFound = True 536 | if not fUnprintableFound and (sz.count('{') >=1 and sz.count('}') >= 1): ## check how much binary code is there as well 537 | if len(sz) != len(sz0): 538 | p1 = sz0[0:sz0.find(sz1)] 539 | p2 = out 540 | p3 = sz0[sz0.find(sz1) + len(sz1):] 541 | out = p1 + p2 + p3 542 | else: 543 | outputparamlst = process_instructions(sz) 544 | if outputparamlst is not 
None and len(outputparamlst[0]) > 15: 545 | if fVerbose: print('Found Possible Shellcode') 546 | out = dumpShellcode(sz) 547 | 548 | else: 549 | try: 550 | sz2 = out = out.decode('utf16', 'ignore') 551 | except Exception as e1: 552 | print(e1) 553 | if len(sz1) != len(sz0): 554 | p1 = sz0[0:sz0.find(sz1)] 555 | p1 = re.sub(r'[^\x01-\x7f]',r'', p1) 556 | p2 = sz2 557 | p3 = sz0[sz0.find(sz1) + len(sz1):] 558 | p3 = re.sub(r'[^\x01-\x7f]',r'', p3) 559 | out = p1 + p2 + p3 560 | elif sz.find(',0x') > 0: 561 | if fVerbose: print('Found Possible Shellcode') 562 | 563 | ## 0x6a,0x0,0x53,0xff,0xd5 --> 6a 00 53 ff d5 564 | ## handle leading @( in Ps1 dropped by 06951164119c7b1704b3ab8d0474e609e852785e4e71fbf26061389f9ab12c6d. thx @r00tninja and @James_inthe_box 565 | sz = sz.replace('@','').replace('(','') 566 | szbytes = ''.join([chr(int(''.join(c), 16)) for c in sz.split(',')]) 567 | 568 | out = dumpShellcode(szbytes) 569 | 570 | return out 571 | 572 | if __name__ == '__main__': 573 | parser = argparse.ArgumentParser(description= \ 574 | """Attempt to decode PowerShell scripts by looking for some common encoded data. It defaults to reading from stdin. 575 | \n 576 | REQUIREMENTS: This script uses vivisect for PE parsing and disassembly: https://github.com/vivisect/vivisect. Set the PYTHONPATH as appropriate.
577 | """ 578 | ) 579 | parser.add_argument('--recurse','-r', help='Recursively decode until done', action='store_true',default=False) 580 | parser.add_argument('--file','-f', help='Read input from a file', action='store', type=str, default=None) 581 | parser.add_argument('--verbose','-v', help='Enable verbose mode', action='store_true', default=False) 582 | parser.add_argument('--noshellcode','-nx', help='Don\'t attempt to decode encoded shellcode', action='store_false', default=True) 583 | parser.add_argument('--dumpapis','-api', help='Dump APIs and hashes', action='store_true', default=False) 584 | parser.add_argument('--apidb','-db', help='Load APIs and hashes from a DB', action='store', type=str,default=None) 585 | args = parser.parse_args() 586 | 587 | psz = sz = None 588 | fVerbose = args.verbose 589 | fDecodeShellcode = args.noshellcode 590 | szDbPath = args.apidb 591 | 592 | if args.dumpapis: 593 | fVerbose = True 594 | prepareAPIs() 595 | sys.exit(0) 596 | 597 | if args.file is not None: 598 | file = open(args.file, 'r') 599 | sz = ' '.join(file.readlines()) 600 | else: 601 | sz = ' '.join(sys.stdin.readlines()) 602 | 603 | if args.recurse: 604 | try: 605 | fRecurse = True 606 | while fRecurse: 607 | psz = str(sz) 608 | sz2 = xray(sz) 609 | if len(sz2) == 0: 610 | fRecurse = False 611 | print(psz) 612 | sz = sz2 613 | except: 614 | print(psz) 615 | pass 616 | else: 617 | psz = xray(sz) 618 | psz = re.sub(r'[^\x01-\x7f]',r'', psz) 619 | print(psz) 620 | --------------------------------------------------------------------------------
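
The README describes two decoding steps that are easy to check by hand: the block_api hash that turns `kernel32.dll!LoadLibraryA` into `0x0726774c`, and the pair of pushed DWORDs that encode an IP address and port. Below is a minimal standalone sketch of both, written for Python 3 rather than the Python 2 that psx.py targets. The names `ror32`, `ror13_hash`, `block_api_hash`, and `decode_sockaddr` are illustrative only and do not exist in psx.py. The hash follows the same steps as `hashapi` and `blockhash` in the script (upper-cased UTF-16LE module name plus ASCII export name, both NUL-terminated), and it assumes the two DWORDs are laid out like a little-endian `sockaddr_in`.

```python
import socket
import struct


def ror32(value, bits):
    """Rotate a 32-bit value right by `bits` positions."""
    value &= 0xFFFFFFFF
    return ((value >> bits) | (value << (32 - bits))) & 0xFFFFFFFF


def ror13_hash(data):
    """ROR-13 additive hash over a bytes object (rotate, then add each byte)."""
    h = 0
    for b in data:
        h = (ror32(h, 13) + b) & 0xFFFFFFFF
    return h


def block_api_hash(dll, api):
    """Hash 'dll!api' the way the block_api stub does (illustrative helper)."""
    # Module name: upper-cased, UTF-16LE, NUL terminator included.
    dll_bytes = (dll.upper() + "\0").encode("utf-16-le")
    # Export name: ASCII, NUL terminator included.
    api_bytes = (api + "\0").encode("ascii")
    return (ror13_hash(dll_bytes) + ror13_hash(api_bytes)) & 0xFFFFFFFF


def decode_sockaddr(ip_dword, family_port_dword):
    """Decode the two pushed DWORDs into (family, dotted IP, port)."""
    # Packing little-endian restores the in-memory byte order, which for the
    # address is already network order (first octet first).
    ip = socket.inet_ntoa(struct.pack("<I", ip_dword))
    raw = struct.pack("<I", family_port_dword)   # 2 bytes family, 2 bytes port
    family = struct.unpack("<H", raw[:2])[0]     # host (little-endian) order
    port = struct.unpack(">H", raw[2:])[0]       # network (big-endian) order
    return family, ip, port


if __name__ == "__main__":
    print("0x%08x" % block_api_hash("kernel32.dll", "LoadLibraryA"))
    print(decode_sockaddr(0x68BFF1C0, 0xBB010002))
```

Run against the README's own examples, `block_api_hash("kernel32.dll", "LoadLibraryA")` should reproduce `0x0726774c`, and `decode_sockaddr(0x68BFF1C0, 0xBB010002)` should give family 2 (AF_INET) with `192.241.191.104` and port 443.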