├── README.md
├── build
│   └── lib
│       └── floss2yar
│           ├── __init__.py
│           ├── main.py
│           └── yar_utils
│               ├── __init__.py
│               ├── func_parsing.py
│               └── processing.py
├── floss2yar.egg-info
│   ├── PKG-INFO
│   ├── SOURCES.txt
│   ├── dependency_links.txt
│   ├── entry_points.txt
│   ├── requires.txt
│   └── top_level.txt
├── floss2yar
│   ├── __init__.py
│   ├── main.py
│   └── yar_utils
│       ├── __init__.py
│       ├── __pycache__
│       │   ├── __init__.cpython-37.pyc
│       │   ├── __init__.cpython-39.pyc
│       │   ├── func_parsing.cpython-37.pyc
│       │   ├── func_parsing.cpython-39.pyc
│       │   ├── processing.cpython-37.pyc
│       │   ├── processing.cpython-39.pyc
│       │   └── yara_builder.cpython-37.pyc
│       ├── func_parsing.py
│       └── processing.py
├── requirements.txt
└── setup.py

/README.md:
--------------------------------------------------------------------------------
# floss2yar
Adventures in Awful Python To Find Shared Code

Good code from Connor McLaughlin, bad code from Greg Lesnewich

Required installs include:
- `flare-floss`
- `rizin`
- `rzpipe`
- `vivisect`


## Premise
This tooling came out of an attempt to speed up a 'secret-sauce' part of our YARA workflow. One of the first things we'd do when facing a new cluster of malware, or a sample we'd been tasked with, was run FLOSS on the file. A tweet from Marc Ochsenmeier pointed us to the '-x' flag (or expert :D), which kindly highlights likely decoding functions from FLOSS's emulation, in addition to some other goodies.
Example of the old run of FLOSS with -x flag showing function scoring:

```
$ floss Testing/SampleDump//Turla/35f205367e2e5f8a121925bbae6ff07626b526a7 -x

TRUNCATED

Most likely decoding functions in: Testing/SampleDump//Turla/35f205367e2e5f8a121925bbae6ff07626b526a7
address      score
---------  -------
0x402AE2   1.13577
0x4034D8   0.80244
0x403384   0.68577
0x402F78   0.67398
0x4017ED   0.67154
0x402A68   0.50244
0x404247   0.50244
0x40394D   0.48333
0x40379F   0.40488
0x401369   0.30244

FLOSS decoded 1858 strings

Decoding function at 0x40394D (decoded 1788 strings)
Offset      Called At    String
----------  -----------  ----------------------------------------------
[HEAP]      0x401588     02d:%02d:%02d:%03d|\t[%04d|%-48s]\t
[HEAP]      0x401588     csec
[HEAP]      0x401588     02d:%02d:%02d:%03d|\t[%04d|%-48s]\t

TRUNCATED

FLOSS extracted 4 stackstrings
Function    Frame Offset    String
----------  --------------  -----------------------
0x404247    0x21B           kernel32.dll
0x404247    0x110           WriteProcessorPwrScheme
0x404247    0x110           WriteProcessMemory
0x404247    0xFD            =eme
```

Please note that while function scoring is still present in the new version of FLOSS, it does not 'pop' out of the -v flag the way it used to. If you just want function scoring, try [this script](https://gist.github.com/williballenthin/635329b7bc4dc73805f6cbfb1bef468b) from Willi instead.

Floss2Yar uses the new version of FLOSS to gather interesting functions, disassembles them in rizin, masks the bytes that would likely change from sample to sample (addresses), and generates a rule for each function. The disassembly is included alongside the relevant bytes on purpose, so more analysts can trim the rule down to interesting basic blocks.
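
To make the masking step concrete, here is a minimal sketch of turning function bytes plus a byte mask into a YARA hex string. It is a simplified illustration, not floss2yar's exact code: the helper name `mask_to_yara_hex` and the hand-written mask are hypothetical, and in the real tool the bytes and mask come from rizin's signature output.

```python
def mask_to_yara_hex(code: bytes, mask: bytes) -> str:
    """Render function bytes as a YARA hex string, wildcarding masked bytes.

    Assumption for this sketch: a mask byte of 0xFF means "keep this byte",
    anything else means "this byte is expected to vary between samples".
    """
    if len(code) != len(mask):
        raise ValueError("code and mask must be the same length")
    tokens = []
    for byte, keep in zip(code, mask):
        tokens.append(f"{byte:02X}" if keep == 0xFF else "??")
    return " ".join(tokens)


# push eax; movsd; call rel32 -> keep the E8 opcode, wildcard the call target
code = bytes.fromhex("50a5e81c020000")
mask = bytes.fromhex("ffffff00000000")  # hypothetical, hand-written mask
print(mask_to_yara_hex(code, mask))  # 50 A5 E8 ?? ?? ?? ??
```

Wildcarding the relative call target is what lets the same rule hit a recompiled sample where the callee has moved.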

## Inspiration:

- Qutluch's [Steezy](https://github.com/schrodyn/steezy)
- Kaspersky and Costin's [KTAE Tooling](https://securelist.com/big-threats-using-code-similarity-part-1/97239/)
- ArielJT's [VTCodeSimilarity-YaraGen tooling](https://github.com/arieljt/VTCodeSimilarity-YaraGen)
- c3rb3ru5d3d53c's [binlex](https://github.com/c3rb3ru5d3d53c/binlex)
- Malpedia's [yara-signator](https://github.com/fxb-cocacoding/yara-signator) thanks @fxb_b and @push_pnx
- Notareverser's consistent encouragement and slick one-off tooling ideas
- ConnorSecurity's vision to automate all workflows and replace me with a computer
- jgrosfelts's mad reversing skills
- xorhex's WILD code and jump and switch table based rules
- williballenthin's insane smarts and kindness to underpin this whole thing with a script that is much more professional than the rest of this outfit
- Stvemillertime for making YARA approachable to find evil with weak signals and overall ruling as a human
- BitsofBinary's creative YARA rules that expanded my understanding of detection possibilities

## Installation

Install [rizin](https://github.com/rizinorg/rizin/releases/tag/v0.4.0)

Update pip
- `pip install --upgrade pip`

Create Virtual Environment (Clean work space):
- `python3 -m venv floss2yar_env`
- `source floss2yar_env/bin/activate`

Jump into floss2yar directory & install components:
- `pip install ./`



## Usage:

Point the script at a file using -f and get yaras!
93 | 94 | Optional Flags: 95 | - Score: -s flag with a float (0.95, 1.2, whatever) as a minimum threshold for data coming from FLOSS main library 96 | - Name: -n flag to pass a name to the outputted yara rules; otherwise they will be given generic names based on the functions analyzed 97 | 98 | Please note - as this can produce many yara rules, users are encouraged to rapidly test the new rules for false positives (ie incidental shared functions that are not malicious) over legitimate Windows components or a set of samples that are likely unrelated. Recommend using this [blog](https://stairwell.com/news/threat-research-detection-research-labeled-malware-corpus-yara-testing/) to start! 99 | 100 | Please also note floss2yar uses some verbose logging from FLOSS - don't assume something broke if information begins to flood your terminal 101 | 102 | ``` 103 | $ python3 floss2yar/main.py -f ~/Testing/SampleDump/backburner -s 0.98 104 | 105 | parsing funcs with minimum score: 0.98 106 | [+] parsing funcs with minimum score: 0.98 107 | finding decoding function features: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████| 150/150 [00:00<00:00, 678.83 functions/s, skipped 0 library functions] 108 | [+] trying to process data 109 | 110 | rule floss2yar_fcn_00016c1f { 111 | meta: 112 | author = "floss2yar" 113 | date = "2022-08-30" 114 | version = "1.0" 115 | hash = "ea2ea2ae0d92e9b186ccb313fb8961cf9d6716a80588a87545f71f2a2b48a63d" 116 | 117 | strings: 118 | $fcn_00016c1f = {55 8D 6C 24 98 81 EC 8C ?? ?? ?? 8B 4D 7C 53 8B D8 0F B6 81 F2 ?? ?? ?? 85 C0 56 57 89 45 60 0F 84 ?? ?? ?? ?? 8B 75 78 8D 7D DC A5 A5 A5 8D 45 DC 50 A5 E8 ?? ?? ?? ?? EB ?? 83 7D 74 ?? 0F 84 ?? ?? ?? ?? 8B 45 60 8A 44 05 DC 32 03 8B 4D 70 FF 45 70 FF 45 60 43 FF 4D 74 88 01 83 7D 60 10 7C ?? 83 7D 74 ?? 0F 84 ?? ?? ?? ?? 8B 4D 78 FF 01 83 65 60 ?? E9 ?? ?? ?? ?? BE ?? ?? ?? ?? 39 75 74 7E ?? 89 75 5C EB ?? 
8B 45 74 89 45 5C 8B 45 5C 29 45 74 C1 F8 04 85 C0 8D 55 DC 89 55 64 7E ?? 8B D0 8B 7D 64 83 ?? ?? ?? 8B F1 A5 A5 A5 A5 FF 01 4A 75 ?? F6 45 5C 0F 74 ?? 8B 7D 64 8B F1 A5 A5 A5 A5 40 C1 E0 04 8B F0 C1 FE 04 A8 ?? 8D 7D DC 75 ?? EB ?? 8B 4D 7C 57 8B C7 4E E8 ?? ?? ?? ?? 83 ?? ?? 85 F6 75 ?? 83 65 64 ?? F6 ?? ?? 8B 4D 70 8D 45 DC 75 ?? F6 ?? ?? 75 ?? 8B D0 F6 ?? ?? 75 ?? 83 7D 5C 10 0F 8C ?? ?? ?? ?? 6A ?? 5E 8D 55 E4 2B F2 8B 13 33 10 6A ?? 89 11 8B 50 04 33 53 04 89 51 04 8B 50 08 33 53 08 89 51 08 8B 50 0C 33 53 0C 89 51 0C 5A 01 55 64 03 C2 03 DA 03 CA 8D 54 06 08 3B 55 5C 7E ?? E9 ?? ?? ?? ?? 83 7D 5C 10 0F 8C ?? ?? ?? ?? 6A ?? 5E 8D 55 DE 2B F2 8A 13 32 10 6A ?? 88 11 8A 50 01 32 53 01 88 51 01 8A 50 02 32 53 02 88 51 02 8A 50 03 32 53 03 88 51 03 8A 50 04 32 53 04 88 51 04 8A 50 05 32 53 05 88 51 05 8A 50 06 32 53 06 88 51 06 8A 50 07 32 53 07 88 51 07 8A 50 08 32 53 08 88 51 08 8A 50 09 32 53 09 88 51 09 8A 50 0A 32 53 0A 88 51 0A 8A 50 0B 32 53 0B 88 51 0B 8A 50 0C 32 53 0C 88 51 0C 8A 50 0D 32 53 0D 88 51 0D 8A 50 0E 32 53 0E 88 51 0E 8A 50 0F 32 53 0F 88 51 0F 5A} 119 | /* 120 | ; CALL XREF from fcn.000134c9 @ 0x1359e 121 | ; CALL XREF from fcn.000135e1 @ 0x1376a 122 | ┌ fcn.00016c1f (); 123 | │ 0x00016c1f 55 push ebp 124 | │ 0x00016c20 8d6c2498 lea ebp, [esp - 0x68] 125 | │ 0x00016c24 81ec8c000000 sub esp, 0x8c 126 | │ 0x00016c2a 8b4d7c mov ecx, dword [ebp + 0x7c] 127 | │ 0x00016c2d 53 push ebx 128 | │ 0x00016c2e 8bd8 mov ebx, eax 129 | │ 0x00016c30 0fb681f20000. 
movzx eax, byte [ecx + 0xf2] 130 | │ 0x00016c37 85c0 test eax, eax 131 | │ 0x00016c39 56 push esi 132 | │ 0x00016c3a 57 push edi 133 | │ 0x00016c3b 894560 mov dword [ebp + 0x60], eax 134 | │ ┌─< 0x00016c3e 0f840a020000 je 0x16e4e 135 | │ │ 0x00016c44 8b7578 mov esi, dword [ebp + 0x78] 136 | │ │ 0x00016c47 8d7ddc lea edi, [ebp - 0x24] 137 | │ │ 0x00016c4a a5 movsd dword es:[edi], dword ptr [esi] 138 | │ │ 0x00016c4b a5 movsd dword es:[edi], dword ptr [esi] 139 | │ │ 0x00016c4c a5 movsd dword es:[edi], dword ptr [esi] 140 | │ │ 0x00016c4d 8d45dc lea eax, [ebp - 0x24] 141 | │ │ 0x00016c50 50 push eax ; int32_t arg_8h 142 | │ │ 0x00016c51 a5 movsd dword es:[edi], dword ptr [esi] 143 | │ │ 0x00016c52 e81c020000 call fcn.00016e73 144 | │ ┌──< 0x00016c57 eb22 jmp 0x16c7b 145 | │ ┌───> 0x00016c59 837d7400 cmp dword [ebp + 0x74], 0 146 | │ ┌────< 0x00016c5d 0f84f8010000 je 0x16e5b 147 | │ │╎││ 0x00016c63 8b4560 mov eax, dword [ebp + 0x60] 148 | │ │╎││ 0x00016c66 8a4405dc mov al, byte [ebp + eax - 0x24] 149 | │ │╎││ 0x00016c6a 3203 xor al, byte [ebx] 150 | │ │╎││ 0x00016c6c 8b4d70 mov ecx, dword [ebp + 0x70] 151 | │ │╎││ 0x00016c6f ff4570 inc dword [ebp + 0x70] 152 | │ │╎││ 0x00016c72 ff4560 inc dword [ebp + 0x60] 153 | │ │╎││ 0x00016c75 43 inc ebx 154 | │ │╎││ 0x00016c76 ff4d74 dec dword [ebp + 0x74] 155 | │ │╎││ 0x00016c79 8801 mov byte [ecx], al 156 | │ │╎││ ; CODE XREF from fcn.00016c1f @ 0x16c57 157 | │ │╎└──> 0x00016c7b 837d6010 cmp dword [ebp + 0x60], 0x10 158 | │ │└───< 0x00016c7f 7cd8 jl 0x16c59 159 | │ │ │ 0x00016c81 837d7400 cmp dword [ebp + 0x74], 0 160 | │ │ ┌──< 0x00016c85 0f84d0010000 je 0x16e5b 161 | │ │ ││ 0x00016c8b 8b4d78 mov ecx, dword [ebp + 0x78] 162 | │ │ ││ 0x00016c8e ff01 inc dword [ecx] 163 | │ │ ││ 0x00016c90 83656000 and dword [ebp + 0x60], 0 164 | │ │┌───< 0x00016c94 e9b8010000 jmp 0x16e51 165 | │ ┌─────> 0x00016c99 be80000000 mov esi, 0x80 ; 128 166 | │ ╎││││ 0x00016c9e 397574 cmp dword [ebp + 0x74], esi 167 | │ ┌──────< 0x00016ca1 7e05 jle 
0x16ca8 168 | │ │╎││││ 0x00016ca3 89755c mov dword [ebp + 0x5c], esi 169 | │ ┌───────< 0x00016ca6 eb06 jmp 0x16cae 170 | │ │└──────> 0x00016ca8 8b4574 mov eax, dword [ebp + 0x74] 171 | │ │ ╎││││ 0x00016cab 89455c mov dword [ebp + 0x5c], eax 172 | │ │ ╎││││ ; CODE XREF from fcn.00016c1f @ 0x16ca6 173 | │ └───────> 0x00016cae 8b455c mov eax, dword [ebp + 0x5c] 174 | │ ╎││││ 0x00016cb1 294574 sub dword [ebp + 0x74], eax 175 | │ ╎││││ 0x00016cb4 c1f804 sar eax, 4 176 | │ ╎││││ 0x00016cb7 85c0 test eax, eax 177 | │ ╎││││ 0x00016cb9 8d55dc lea edx, [ebp - 0x24] 178 | │ ╎││││ 0x00016cbc 895564 mov dword [ebp + 0x64], edx 179 | │ ┌──────< 0x00016cbf 7e14 jle 0x16cd5 180 | │ │╎││││ 0x00016cc1 8bd0 mov edx, eax 181 | │ ┌───────> 0x00016cc3 8b7d64 mov edi, dword [ebp + 0x64] 182 | │ ╎│╎││││ 0x00016cc6 83456410 add dword [ebp + 0x64], 0x10 ; [0x10:4]=-1 ; 16 183 | │ ╎│╎││││ 0x00016cca 8bf1 mov esi, ecx 184 | │ ╎│╎││││ 0x00016ccc a5 movsd dword es:[edi], dword ptr [esi] 185 | │ ╎│╎││││ 0x00016ccd a5 movsd dword es:[edi], dword ptr [esi] 186 | │ ╎│╎││││ 0x00016cce a5 movsd dword es:[edi], dword ptr [esi] 187 | │ ╎│╎││││ 0x00016ccf a5 movsd dword es:[edi], dword ptr [esi] 188 | │ ╎│╎││││ 0x00016cd0 ff01 inc dword [ecx] 189 | │ ╎│╎││││ 0x00016cd2 4a dec edx 190 | │ └───────< 0x00016cd3 75ee jne 0x16cc3 191 | │ └──────> 0x00016cd5 f6455c0f test byte [ebp + 0x5c], 0xf 192 | │ ┌──────< 0x00016cd9 740a je 0x16ce5 193 | │ │╎││││ 0x00016cdb 8b7d64 mov edi, dword [ebp + 0x64] 194 | │ │╎││││ 0x00016cde 8bf1 mov esi, ecx 195 | │ │╎││││ 0x00016ce0 a5 movsd dword es:[edi], dword ptr [esi] 196 | │ │╎││││ 0x00016ce1 a5 movsd dword es:[edi], dword ptr [esi] 197 | │ │╎││││ 0x00016ce2 a5 movsd dword es:[edi], dword ptr [esi] 198 | │ │╎││││ 0x00016ce3 a5 movsd dword es:[edi], dword ptr [esi] 199 | │ │╎││││ 0x00016ce4 40 inc eax 200 | │ └──────> 0x00016ce5 c1e004 shl eax, 4 201 | │ ╎││││ 0x00016ce8 8bf0 mov esi, eax 202 | │ ╎││││ 0x00016cea c1fe04 sar esi, 4 203 | │ ╎││││ 0x00016ced a80f test al, 
0xf ; 15 204 | │ ╎││││ 0x00016cef 8d7ddc lea edi, [ebp - 0x24] 205 | │ ┌──────< 0x00016cf2 7515 jne 0x16d09 206 | │ ┌───────< 0x00016cf4 eb0f jmp 0x16d05 207 | │ ────────> 0x00016cf6 8b4d7c mov ecx, dword [ebp + 0x7c] 208 | │ ││╎││││ 0x00016cf9 57 push edi ; int32_t arg_8h 209 | │ ││╎││││ 0x00016cfa 8bc7 mov eax, edi 210 | │ ││╎││││ 0x00016cfc 4e dec esi 211 | │ ││╎││││ 0x00016cfd e871010000 call fcn.00016e73 212 | │ ││╎││││ 0x00016d02 83c710 add edi, 0x10 ; 16 213 | │ ││╎││││ ; CODE XREF from fcn.00016c1f @ 0x16cf4 214 | │ └───────> 0x00016d05 85f6 test esi, esi 215 | │ ────────< 0x00016d07 75ed jne 0x16cf6 216 | │ └──────> 0x00016d09 83656400 and dword [ebp + 0x64], 0 217 | │ ╎││││ 0x00016d0d f6c303 test bl, 3 ; 3 218 | │ ╎││││ 0x00016d10 8b4d70 mov ecx, dword [ebp + 0x70] 219 | │ ╎││││ 0x00016d13 8d45dc lea eax, [ebp - 0x24] 220 | │ ┌──────< 0x00016d16 7559 jne 0x16d71 221 | │ │╎││││ 0x00016d18 f6c103 test cl, 3 ; 3 222 | │ ┌───────< 0x00016d1b 7554 jne 0x16d71 223 | │ ││╎││││ 0x00016d1d 8bd0 mov edx, eax 224 | │ ││╎││││ 0x00016d1f f6c203 test dl, 3 ; 3 225 | │ ────────< 0x00016d22 754d jne 0x16d71 226 | │ ││╎││││ 0x00016d24 837d5c10 cmp dword [ebp + 0x5c], 0x10 227 | │ ────────< 0x00016d28 0f8cfe000000 jl 0x16e2c 228 | │ ││╎││││ 0x00016d2e 6a10 push 0x10 ; 16 229 | │ ││╎││││ 0x00016d30 5e pop esi 230 | │ ││╎││││ 0x00016d31 8d55e4 lea edx, [ebp - 0x1c] 231 | │ ││╎││││ 0x00016d34 2bf2 sub esi, edx 232 | │ ────────> 0x00016d36 8b13 mov edx, dword [ebx] 233 | │ ││╎││││ 0x00016d38 3310 xor edx, dword [eax] 234 | │ ││╎││││ 0x00016d3a 6a10 push 0x10 ; 16 235 | │ ││╎││││ 0x00016d3c 8911 mov dword [ecx], edx 236 | │ ││╎││││ 0x00016d3e 8b5004 mov edx, dword [eax + 4] 237 | │ ││╎││││ 0x00016d41 335304 xor edx, dword [ebx + 4] 238 | │ ││╎││││ 0x00016d44 895104 mov dword [ecx + 4], edx 239 | │ ││╎││││ 0x00016d47 8b5008 mov edx, dword [eax + 8] 240 | │ ││╎││││ 0x00016d4a 335308 xor edx, dword [ebx + 8] 241 | │ ││╎││││ 0x00016d4d 895108 mov dword [ecx + 8], edx 242 | │ 
││╎││││ 0x00016d50 8b500c mov edx, dword [eax + 0xc] 243 | │ ││╎││││ 0x00016d53 33530c xor edx, dword [ebx + 0xc] 244 | │ ││╎││││ 0x00016d56 89510c mov dword [ecx + 0xc], edx 245 | │ ││╎││││ 0x00016d59 5a pop edx 246 | │ ││╎││││ 0x00016d5a 015564 add dword [ebp + 0x64], edx 247 | │ ││╎││││ 0x00016d5d 03c2 add eax, edx 248 | │ ││╎││││ 0x00016d5f 03da add ebx, edx 249 | │ ││╎││││ 0x00016d61 03ca add ecx, edx 250 | │ ││╎││││ 0x00016d63 8d540608 lea edx, [esi + eax + 8] 251 | │ ││╎││││ 0x00016d67 3b555c cmp edx, dword [ebp + 0x5c] 252 | │ ────────< 0x00016d6a 7eca jle 0x16d36 253 | │ ────────< 0x00016d6c e9b8000000 jmp 0x16e29 254 | │ └└──────> 0x00016d71 837d5c10 cmp dword [ebp + 0x5c], 0x10 255 | │ ┌──────< 0x00016d75 0f8cb1000000 jl 0x16e2c 256 | │ │╎││││ 0x00016d7b 6a10 push 0x10 ; 16 257 | │ │╎││││ 0x00016d7d 5e pop esi 258 | │ │╎││││ 0x00016d7e 8d55de lea edx, [ebp - 0x22] 259 | │ │╎││││ 0x00016d81 2bf2 sub esi, edx 260 | │ ┌───────> 0x00016d83 8a13 mov dl, byte [ebx] 261 | │ ╎│╎││││ 0x00016d85 3210 xor dl, byte [eax] 262 | │ ╎│╎││││ 0x00016d87 6a10 push 0x10 ; 16 263 | │ ╎│╎││││ 0x00016d89 8811 mov byte [ecx], dl 264 | │ ╎│╎││││ 0x00016d8b 8a5001 mov dl, byte [eax + 1] 265 | │ ╎│╎││││ 0x00016d8e 325301 xor dl, byte [ebx + 1] 266 | │ ╎│╎││││ 0x00016d91 885101 mov byte [ecx + 1], dl 267 | │ ╎│╎││││ 0x00016d94 8a5002 mov dl, byte [eax + 2] 268 | │ ╎│╎││││ 0x00016d97 325302 xor dl, byte [ebx + 2] 269 | │ ╎│╎││││ 0x00016d9a 885102 mov byte [ecx + 2], dl 270 | │ ╎│╎││││ 0x00016d9d 8a5003 mov dl, byte [eax + 3] 271 | │ ╎│╎││││ 0x00016da0 325303 xor dl, byte [ebx + 3] 272 | │ ╎│╎││││ 0x00016da3 885103 mov byte [ecx + 3], dl 273 | │ ╎│╎││││ 0x00016da6 8a5004 mov dl, byte [eax + 4] 274 | │ ╎│╎││││ 0x00016da9 325304 xor dl, byte [ebx + 4] 275 | │ ╎│╎││││ 0x00016dac 885104 mov byte [ecx + 4], dl 276 | │ ╎│╎││││ 0x00016daf 8a5005 mov dl, byte [eax + 5] 277 | │ ╎│╎││││ 0x00016db2 325305 xor dl, byte [ebx + 5] 278 | │ ╎│╎││││ 0x00016db5 885105 mov byte [ecx + 5], dl 279 | │ 
╎│╎││││ 0x00016db8 8a5006 mov dl, byte [eax + 6] 280 | │ ╎│╎││││ 0x00016dbb 325306 xor dl, byte [ebx + 6] 281 | │ ╎│╎││││ 0x00016dbe 885106 mov byte [ecx + 6], dl 282 | │ ╎│╎││││ 0x00016dc1 8a5007 mov dl, byte [eax + 7] 283 | │ ╎│╎││││ 0x00016dc4 325307 xor dl, byte [ebx + 7] 284 | │ ╎│╎││││ 0x00016dc7 885107 mov byte [ecx + 7], dl 285 | │ ╎│╎││││ 0x00016dca 8a5008 mov dl, byte [eax + 8] 286 | │ ╎│╎││││ 0x00016dcd 325308 xor dl, byte [ebx + 8] 287 | │ ╎│╎││││ 0x00016dd0 885108 mov byte [ecx + 8], dl 288 | │ ╎│╎││││ 0x00016dd3 8a5009 mov dl, byte [eax + 9] 289 | │ ╎│╎││││ 0x00016dd6 325309 xor dl, byte [ebx + 9] 290 | │ ╎│╎││││ 0x00016dd9 885109 mov byte [ecx + 9], dl 291 | │ ╎│╎││││ 0x00016ddc 8a500a mov dl, byte [eax + 0xa] 292 | │ ╎│╎││││ 0x00016ddf 32530a xor dl, byte [ebx + 0xa] 293 | │ ╎│╎││││ 0x00016de2 88510a mov byte [ecx + 0xa], dl 294 | │ ╎│╎││││ 0x00016de5 8a500b mov dl, byte [eax + 0xb] 295 | │ ╎│╎││││ 0x00016de8 32530b xor dl, byte [ebx + 0xb] 296 | │ ╎│╎││││ 0x00016deb 88510b mov byte [ecx + 0xb], dl 297 | │ ╎│╎││││ 0x00016dee 8a500c mov dl, byte [eax + 0xc] 298 | │ ╎│╎││││ 0x00016df1 32530c xor dl, byte [ebx + 0xc] 299 | │ ╎│╎││││ 0x00016df4 88510c mov byte [ecx + 0xc], dl 300 | │ ╎│╎││││ 0x00016df7 8a500d mov dl, byte [eax + 0xd] 301 | │ ╎│╎││││ 0x00016dfa 32530d xor dl, byte [ebx + 0xd] 302 | │ ╎│╎││││ 0x00016dfd 88510d mov byte [ecx + 0xd], dl 303 | │ ╎│╎││││ 0x00016e00 8a500e mov dl, byte [eax + 0xe] 304 | │ ╎│╎││││ 0x00016e03 32530e xor dl, byte [ebx + 0xe] 305 | │ ╎│╎││││ 0x00016e06 88510e mov byte [ecx + 0xe], dl 306 | │ ╎│╎││││ 0x00016e09 8a500f mov dl, byte [eax + 0xf] 307 | │ ╎│╎││││ 0x00016e0c 32530f xor dl, byte [ebx + 0xf] 308 | │ ╎│╎││││ 0x00016e0f 88510f mov byte [ecx + 0xf], dl 309 | │ ╎│╎││││ 0x00016e12 5a pop edx 310 | │ ╎│╎││││ 0x00016e13 015564 add dword [ebp + 0x64], edx 311 | │ ╎│╎││││ 0x00016e16 03c2 add eax, edx 312 | │ ╎│╎││││ 0x00016e18 03da add ebx, edx 313 | │ ╎│╎││││ 0x00016e1a 03ca add ecx, edx 314 | │ ╎│╎││││ 0x00016e1c 
8d540602 lea edx, [esi + eax + 2] 315 | │ ╎│╎││││ 0x00016e20 3b555c cmp edx, dword [ebp + 0x5c] 316 | │ └───────< 0x00016e23 0f8e5affffff jle 0x16d83 317 | │ │╎││││ ; CODE XREF from fcn.00016c1f @ 0x16d6c 318 | │ ────────> 0x00016e29 894d70 mov dword [ebp + 0x70], ecx 319 | │ ─└──────> 0x00016e2c 8b555c mov edx, dword [ebp + 0x5c] 320 | │ ╎││││ 0x00016e2f 395564 cmp dword [ebp + 0x64], edx 321 | │ ┌──────< 0x00016e32 7d1a jge 0x16e4e 322 | │ │╎││││ 0x00016e34 8bf2 mov esi, edx 323 | │ │╎││││ 0x00016e36 2b7564 sub esi, dword [ebp + 0x64] 324 | │ ┌───────> 0x00016e39 8b5560 mov edx, dword [ebp + 0x60] 325 | │ ╎│╎││││ 0x00016e3c 8a1410 mov dl, byte [eax + edx] 326 | │ ╎│╎││││ 0x00016e3f 3213 xor dl, byte [ebx] 327 | │ ╎│╎││││ 0x00016e41 8811 mov byte [ecx], dl 328 | │ ╎│╎││││ 0x00016e43 41 inc ecx 329 | │ ╎│╎││││ 0x00016e44 ff4560 inc dword [ebp + 0x60] 330 | │ ╎│╎││││ 0x00016e47 43 inc ebx 331 | │ ╎│╎││││ 0x00016e48 4e dec esi 332 | │ └───────< 0x00016e49 75ee jne 0x16e39 333 | │ │╎││││ 0x00016e4b 894d70 mov dword [ebp + 0x70], ecx 334 | │ └────└─> 0x00016e4e 8b4d78 mov ecx, dword [ebp + 0x78] 335 | │ ╎│││ ; CODE XREF from fcn.00016c1f @ 0x16c94 336 | │ ╎│└───> 0x00016e51 837d7400 cmp dword [ebp + 0x74], 0 337 | │ └─────< 0x00016e55 0f853efeffff jne 0x16c99 338 | │ └─└──> 0x00016e5b 8a4560 mov al, byte [ebp + 0x60] 339 | │ 0x00016e5e 8b4d7c mov ecx, dword [ebp + 0x7c] 340 | │ 0x00016e61 5f pop edi 341 | │ 0x00016e62 5e pop esi 342 | │ 0x00016e63 8881f2000000 mov byte [ecx + 0xf2], al 343 | │ 0x00016e69 33c0 xor eax, eax 344 | │ 0x00016e6b 5b pop ebx 345 | │ 0x00016e6c 83c568 add ebp, 0x68 ; 104 346 | │ 0x00016e6f c9 leave 347 | └ 0x00016e70 c21000 ret 0x10 348 | 349 | */ 350 | condition: 351 | 1 of them 352 | } 353 | 354 | 355 | ``` 356 | -------------------------------------------------------------------------------- /build/lib/floss2yar/__init__.py: -------------------------------------------------------------------------------- 1 | from .main import * 
--------------------------------------------------------------------------------
/build/lib/floss2yar/main.py:
--------------------------------------------------------------------------------
'''
Metadata-Version: 1.2
Name: floss2yar
Version: 0.1
Summary: Generate YARA Rules Based on FLOSS Finding Decoding Functions
Home-page: UNKNOWN
Author: Greg Lesnewich & Connor McLaughlin
Author-email: glesnewich@gmail.com
License: UNKNOWN
Description: UNKNOWN
Platform: UNKNOWN
Requires-Python: >=3.6
'''

import argparse
import os

from yar_utils import func_parsing
from yar_utils import processing


def run_floss(filepath, score, name):
    extracted = func_parsing.floss_func_parsing(filepath, score)
    processing.data_processing(extracted, name)

def main():
    parser = argparse.ArgumentParser(description="Create a masked YARA rule for a file based on FLOSS finding likely decoding functions")
    parser.add_argument("-f", "--file", help="Specify file to parse", metavar="", required=True)
    parser.add_argument("-s", "--score", help="Minimum FLOSS Func Scoring Threshold to Create YARA Rules from (default: 0.90)", metavar="", required=False)
    parser.add_argument("-n", "--name", help="Name for output rules - example MAL_EVILDOOR without quotes", metavar="", required=False)
    args = parser.parse_args()

    if args.file:
        try:
            run_floss(args.file, args.score, args.name)
        except Exception as exc:
            # Report the underlying error instead of hiding it
            print(f"finding decode funcs failed: {exc}")


if __name__ == "__main__":
    main()

--------------------------------------------------------------------------------
/build/lib/floss2yar/yar_utils/__init__.py:
--------------------------------------------------------------------------------
from . import func_parsing
from . import processing
--------------------------------------------------------------------------------
/build/lib/floss2yar/yar_utils/func_parsing.py:
--------------------------------------------------------------------------------
from collections import defaultdict
import logging
import subprocess
from typing import Optional
import rzpipe
import json

from . import processing

def floss_func_parsing(file, score):
    print('parsing funcs with minimum score: ', score)
    analysis = FileAnalysis(file, [])
    funclist = processing.get_floss_funcs(file, score)
    rz = analysis.rz
    rz.cmd('aaaa')
    json_blob = rz.cmd('aflj')
    data = json.loads(json_blob)
    for func in data:
        for name in funclist:
            # FLOSS candidates are hex offsets; match them against rizin function names
            if name in func['name']:
                # Only keep functions larger than 50 and smaller than 600 bytes
                if 50 < func['size'] < 600:
                    fun = FunctionFeature(rz, name)
                    analysis.functions.append(fun)

    rz.cmd('q')
    return analysis


class FileAnalysis(object):
    """Holds the strategies and file information as well as the rz pipe"""

    def __init__(self, filepath:str, strategies:list):
        """Open the file in rizin, run analysis, and record the file hash"""
        self.rz = rzpipe.open(filepath)
        self.rz.cmd("aaaa")
        self.file_hash = json.loads(self.rz.cmd("itj"))['sha256']
        self.file_path = filepath

        self.strategies = []
        for each in strategies:
            if not issubclass(each, FunctionFinder):
                raise ValueError(f"{type(each)} is not a FunctionFinder")
            self.strategies.append(each)

        self.interesting_function_addrs = defaultdict(list)
        self.functions = []


    def get_functions(self):
        """Find functions in the file with the set strategies"""
        for strat in self.strategies:
            for addr, comments in strat.get_functions(self.file_path, self.rz).items():
                self.interesting_function_addrs[addr].extend(comments)

    def analyze_functions(self):
        """Create Function Features based on the unique addrs identified"""
        logging.debug(f"Analyzing {len(self.interesting_function_addrs)} functions")
        for addr in sorted(list(self.interesting_function_addrs.keys())):
            try:
                func = FunctionFeature(self.rz, addr)
            except Exception:
                logging.exception(f"Failed to analyze function {addr}")
                continue
            self.functions.append(func)

    def quit_rizin(self):
        self.rz.cmd('q')

    def run(self) -> list:
        """Get and analyze functions. Running this closes the rizin pipe, since file analysis should be done at this point"""
        self.get_functions()
        self.analyze_functions()
        self.quit_rizin()
        return self.functions


class FunctionFinder(object):
    """Base class that is able to find interesting functions to consider for yara rules

    During operation of the yar2d2 we can have one or more of these be used at a time"""
    strategy_name = "UNDEFINED"

    @classmethod
    def get_functions(cls, filepath, rz):
        """Return a dictionary of function addresses with comments as to why they were added"""
        raise NotImplementedError


class FunctionFeature(object):
    """Class capturing the features of a single function

    Notes:
        This will rely a lot on the signature functionality of Rizin
        Details can be found here: https://book.rizin.re/signatures/zignatures.html"""

    def __init__(self, rz: rzpipe.open_sync.open, function_symbol):
        """Create a function feature with a pipe and a specific symbol or hexadecimal address"""

        # Check to see if the function symbol is an address or not
        symbol = None
        try:
            addr = int(function_symbol, 16)
            symbol = self.resolve_symbol_for_addr(rz, addr)
        except ValueError:
            symbol = function_symbol


        # Get File SHA256
        self.file = json.loads(rz.cmd("itj"))["sha256"]

        # Get high-level function information
        rz.cmd(f"s {symbol}")
        raw_function_data = rz.cmd("afij")
        if len(raw_function_data) == 0:
            raise Exception(f"Couldn't parse information on {symbol}")
        function_data = json.loads(raw_function_data)
        if not isinstance(function_data, list) or len(function_data) != 1:
            raise Exception("Broken assumption on how the data should exist")
        function_data = function_data[0]

        self.name = function_data['name']
        self.size = function_data['size']
        self.signature = function_data['signature']

        # Get disassembly for comment string
        rz.cmd("e asm.bytes=true")
        self.disassembly = rz.cmd(f"pD {self.size}@ {self.name}")


        # Create the signature
        output = rz.cmd(f"zaf {symbol} {symbol}")

        #TODO some output checking here

        # Get the signature output
        raw_data = rz.cmd("zj")
        if len(raw_data) == 0:
            raise Exception(f"Failed to get data from signature for {symbol}")

        # Do some processing to make sure there's only one signature
        data = json.loads(raw_data)
        if len(data) == 1:
            data = data[0]
        else:
            data = list(filter(lambda x: (x['name'] == symbol),data))
            if len(data) != 1:
                raise Exception(f"Expected exactly one signature matching {symbol}, found {len(data)}")
            data = data[0]

        # Load up the raw data
        self.sig_name = data["name"]
        self.bytes = bytes.fromhex(data['bytes'])
        self.mask = bytes.fromhex(data['mask'])
        self.graph = data['graph']
        self.addr = data['addr']
        # Default the realname to function symbol for now
        self.realname = data.get('realname', symbol)
        self.xrefs_from = data['xrefs_from']
        self.xrefs_to = data['xrefs_to']
        self.vars = data['vars']
        self.types = data['types']
        self.hash = data['hash']
        self._masked_asm_str = None

        # Delete the sig
        rz.cmd(f"z- {symbol}")

    @property
    def masked_asm_str(self) -> str:
        """Return an ascii hex string with ?? masking out parts of the instruction"""
        if self._masked_asm_str is not None:
            return self._masked_asm_str

        ret_str = []
        for x,y in zip(self.bytes, self.mask):
            # A byte is wildcarded whenever byte & mask is zero,
            # so literal 0x00 bytes also come out as ??
            if x & y == 0:
                ret_str.append("??")
            else:
                ret_str.append(f"{x & y:02X}")
        self._masked_asm_str = " ".join(ret_str)
        return self._masked_asm_str

    @property
    def yara_str(self) -> str:
        """Return a yara rule ready string"""
        return "$ = {{ {} }}".format(self.masked_asm_str)

    def resolve_symbol_for_addr(self, rz:rzpipe.open_sync.open, addr:int):
        """Attempt to resolve a symbol for the address

        Notes:
            If we don't have a function defined in rizin, we'll attempt to analyze the function
        """

        # Check to see if we have a function defined
        res = rz.cmd(f"afd @ {addr:#x}")
        if res == '':
            logging.debug(f"Warning defining a new function at {addr:#X}")
            rz.cmd(f"af @ {addr:#x}")

        # Seek to address
        rz.cmd(f"s {addr}")

        # Get function information
        fdata = json.loads(rz.cmd('afij'))
        if len(fdata) != 1:
            raise Exception(f"Broken assumption for addr {addr}")

        fdata = fdata[0]

        # Validate that the address is within offset and offset + size
        if addr < fdata['offset'] or addr > fdata['offset'] + fdata['size']:
            raise Exception(f"Broken Assumption: {addr:#X} outside of bounds [{fdata['offset']:#X}, {fdata['offset'] + fdata['size']:#X}]")

        return fdata['name']



    def __str__(self) -> str:
        simple_features = dict(
            name=self.name,
            realname=self.realname,
            file=self.file,
            size=self.size,
            masked_asm_str=self.masked_asm_str,
            disassembly=self.disassembly
        )
        return json.dumps(simple_features)

--------------------------------------------------------------------------------
/build/lib/floss2yar/yar_utils/processing.py:
--------------------------------------------------------------------------------
from datetime import date
import floss
import viv_utils
from floss import identify


def yara_builder(list_of_input_functions, name_arg):
    if name_arg is not None:
        name_arg = str(name_arg)
        rule_name_str = "\nrule " + name_arg + '_floss2yar_'
    else:
        rule_name_str = '\nrule floss2yar_'
    rule_str = ""
    today = date.today().isoformat()
    for input_function in list_of_input_functions:
        disass = input_function['disass']
        yara_strang = input_function['yara_str']
        yara_str_name = input_function['func_name']
        todaydate = 'date = "' + today + '"'
        rule_setup = rule_name_str + yara_str_name + ' {\nmeta:\n\tauthor = "floss2yar"\n\t' + todaydate + '\n\tversion = "1.0"\n'
        rule_str += rule_setup
        hash_list = sorted(list(set(input_function['samples'])))
        for f in hash_list:
            e = f.strip()
            rule_str += f'\thash = "{e}"\n'
        rule = '\nstrings: \n' + '\t$' + yara_str_name + ' = {' + yara_strang + '}\n /* \n' + disass + '\n */ \ncondition: \n\t1 of them \n}'
        rule_str += rule
        rule_str += "\n\n"
    print(rule_str)
    return rule_str

def get_floss_funcs(file, min_score):
    if min_score is None:
        min_score = 0.90
    min_scoring = float(min_score)
    print('[+] parsing funcs with minimum score: ', min_scoring)
    candidates = []
    vw = viv_utils.getWorkspace(file)
    functions = vw.getFunctions()
    func_features, lib_funcs = floss.identify.find_decoding_function_features(vw, functions)
    # dict from function VA (int) to score (float)
    func_scores = {
        fva: features["score"]
        for fva, features in func_features.items()
    }
    # list of tuples (score (float), function VA (int)) sorted descending
    func_scores = sorted([
        (score, fva)
        for fva, score in func_scores.items()
    ], reverse=True)
    for score, fva in func_scores:
        if score > min_scoring:
            # Hex offset without the 0x prefix, e.g. "16c1f"
            offset = f"{fva:x}"
            func = offset.lower()
            candidates.append(func)
    return candidates


def data_processing(blob_of_data, name_arg):
    print('[+] trying to process data')
    final_yara_list = {}
    for sample in blob_of_data.functions:
        yara_input = {}
        yara_input['samples'] = []
        # YARA identifiers cannot contain dots, so fcn.00016c1f becomes fcn_00016c1f
        sample.realname = sample.realname.replace('.', '_')
        yara_input['func_name'] = sample.realname
        yara_input['yara_str'] = sample.masked_asm_str
        yara_input['disass'] = sample.disassembly
        yara_input['samples'].append(sample.file)
        final_yara_list[sample] = yara_input

    final_yara_list = list(final_yara_list.values())
    return yara_builder(final_yara_list, name_arg)

--------------------------------------------------------------------------------
/floss2yar.egg-info/PKG-INFO:
--------------------------------------------------------------------------------
Metadata-Version: 2.1
Name: floss2yar
Version: 0.1
Summary: Generate YARA Based on code similarity
Home-page: UNKNOWN
Author: Greg Lesnewich and Connor McLaughlin
Author-email: glesnewich@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Requires-Python: >=3.6

UNKNOWN

--------------------------------------------------------------------------------
/floss2yar.egg-info/SOURCES.txt:
--------------------------------------------------------------------------------
README.md
setup.py
floss2yar/__init__.py
floss2yar/main.py
floss2yar.egg-info/PKG-INFO
floss2yar.egg-info/SOURCES.txt
floss2yar.egg-info/dependency_links.txt
floss2yar.egg-info/entry_points.txt
floss2yar.egg-info/requires.txt
floss2yar.egg-info/top_level.txt
floss2yar/yar_utils/__init__.py
floss2yar/yar_utils/func_parsing.py 13 | floss2yar/yar_utils/processing.py -------------------------------------------------------------------------------- /floss2yar.egg-info/dependency_links.txt: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /floss2yar.egg-info/entry_points.txt: -------------------------------------------------------------------------------- 1 | [console_scripts] 2 | floss2yar = floss2yar.main:main 3 | 4 | -------------------------------------------------------------------------------- /floss2yar.egg-info/requires.txt: -------------------------------------------------------------------------------- 1 | flare-floss==2.0.0 2 | flirt==0.0.2 3 | pefile==2022.5.30 4 | python-flirt==0.7.0 5 | viv-utils==0.7.5 6 | vivisect==1.0.8 7 | yara-python==4.2.0 8 | rzpipe==0.4.0 9 | numpy<1.23 10 | -------------------------------------------------------------------------------- /floss2yar.egg-info/top_level.txt: -------------------------------------------------------------------------------- 1 | floss2yar 2 | -------------------------------------------------------------------------------- /floss2yar/__init__.py: -------------------------------------------------------------------------------- 1 | from .main import * -------------------------------------------------------------------------------- /floss2yar/main.py: -------------------------------------------------------------------------------- 1 | ''' 2 | Metadata-Version: 1.2 3 | Name: floss2yar 4 | Version: 0.1 5 | Summary: Generate YARA Rules Based on FLOSS Finding Decoding Functions 6 | Home-page: UNKNOWN 7 | Author: Greg Lesnewich & Connor McLaughlin 8 | Author-email: glesnewich@gmail.com 9 | License: UNKNOWN 10 | Description: UNKNOWN 11 | Platform: UNKNOWN 12 | Requires-Python: >=3.6 13 | ''' 14 | 15 | import argparse 16 | import os 17 | 18 | from yar_utils import func_parsing 19 | 
from yar_utils import processing 20 | 21 | 22 | def run_floss(filepath, score, name): 23 | extracted = func_parsing.floss_func_parsing(filepath, score) 24 | processing.data_processing(extracted, name) 25 | 26 | def main(): 27 | parser = argparse.ArgumentParser(description="Create a masked YARA rule for a file based on FLOSS finding likely decoding functions") 28 | parser.add_argument("-f", "--file", help="Specify file to parse", metavar="", required=True) 29 | parser.add_argument("-s", "--score", help="Minumum FLOSS Func Scoring Threshold to Create YARA Rules from (default: 0.90)", metavar="", required=False) 30 | parser.add_argument("-n", "--name", help="Name for output rules - example MAL_EVILDOOR without quotes", metavar="", required=False) 31 | args = parser.parse_args() 32 | 33 | if args.file: 34 | try: 35 | run_floss(args.file, args.score, args.name) 36 | except: 37 | print("finding decode funcs failed") 38 | 39 | 40 | if __name__ == "__main__": 41 | main() 42 | 43 | -------------------------------------------------------------------------------- /floss2yar/yar_utils/__init__.py: -------------------------------------------------------------------------------- 1 | from . import func_parsing 2 | from . 
import processing -------------------------------------------------------------------------------- /floss2yar/yar_utils/__pycache__/__init__.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g-les/floss2yar/b84083a39fdeed04f78cac4a0920341cec0c05de/floss2yar/yar_utils/__pycache__/__init__.cpython-37.pyc -------------------------------------------------------------------------------- /floss2yar/yar_utils/__pycache__/__init__.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g-les/floss2yar/b84083a39fdeed04f78cac4a0920341cec0c05de/floss2yar/yar_utils/__pycache__/__init__.cpython-39.pyc -------------------------------------------------------------------------------- /floss2yar/yar_utils/__pycache__/func_parsing.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g-les/floss2yar/b84083a39fdeed04f78cac4a0920341cec0c05de/floss2yar/yar_utils/__pycache__/func_parsing.cpython-37.pyc -------------------------------------------------------------------------------- /floss2yar/yar_utils/__pycache__/func_parsing.cpython-39.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g-les/floss2yar/b84083a39fdeed04f78cac4a0920341cec0c05de/floss2yar/yar_utils/__pycache__/func_parsing.cpython-39.pyc -------------------------------------------------------------------------------- /floss2yar/yar_utils/__pycache__/processing.cpython-37.pyc: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/g-les/floss2yar/b84083a39fdeed04f78cac4a0920341cec0c05de/floss2yar/yar_utils/__pycache__/processing.cpython-37.pyc -------------------------------------------------------------------------------- 
/floss2yar/yar_utils/__pycache__/processing.cpython-39.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/g-les/floss2yar/b84083a39fdeed04f78cac4a0920341cec0c05de/floss2yar/yar_utils/__pycache__/processing.cpython-39.pyc
--------------------------------------------------------------------------------
/floss2yar/yar_utils/__pycache__/yara_builder.cpython-37.pyc:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/g-les/floss2yar/b84083a39fdeed04f78cac4a0920341cec0c05de/floss2yar/yar_utils/__pycache__/yara_builder.cpython-37.pyc
--------------------------------------------------------------------------------
/floss2yar/yar_utils/func_parsing.py:
--------------------------------------------------------------------------------
from collections import defaultdict
import json
import logging

import rzpipe

from . import processing


def floss_func_parsing(file, score):
    analysis = FileAnalysis(file, [])
    funclist = processing.get_floss_funcs(file, score)
    # FileAnalysis.__init__ has already run the 'aaaa' analysis on this pipe
    rz = analysis.rz
    data = json.loads(rz.cmd('aflj'))
    for func in data:
        for name in funclist:
            # FLOSS candidates are hex offsets; match them against the rizin function names
            # and keep only functions between 50 and 600 bytes
            if name in func['name'] and 50 < func['size'] < 600:
                analysis.functions.append(FunctionFeature(rz, name))

    rz.cmd('q')
    return analysis


class FileAnalysis(object):
    """Holds the strategies and file information as well as the rz pipe"""

    def __init__(self, filepath: str, strategies: list):
        """Open the file in rizin, run analysis, and validate the given strategies"""
        self.rz = rzpipe.open(filepath)
        self.rz.cmd("aaaa")
        self.file_hash = json.loads(self.rz.cmd("itj"))['sha256']
        self.file_path = filepath

        self.strategies = []
        for each in strategies:
            if not issubclass(each, FunctionFinder):
                raise ValueError(f"{type(each)} is not a FunctionFinder")
            self.strategies.append(each)

        self.interesting_function_addrs = defaultdict(list)
        self.functions = []

    def get_functions(self):
        """Find functions in the file with the set strategies"""
        for strat in self.strategies:
            for addr, comments in strat.get_functions(self.file_path, self.rz).items():
                self.interesting_function_addrs[addr].extend(comments)

    def analyze_functions(self):
        """Create Function Features based on the unique addrs identified"""
        logging.debug(f"Analyzing {len(self.interesting_function_addrs)} functions")
        for addr in sorted(list(self.interesting_function_addrs.keys())):
            try:
                func = FunctionFeature(self.rz, addr)
            except Exception:
                logging.exception(f"Failed to analyze function {addr}")
                continue
            self.functions.append(func)

    def quit_rizin(self):
        self.rz.cmd('q')

    def run(self) -> list:
        """Get and analyze functions.

        Running this closes the rizin pipe, since at this point file analysis should be done."""
        self.get_functions()
        self.analyze_functions()
        self.quit_rizin()
        return self.functions


class FunctionFinder(object):
    """Base class that is able to find interesting functions to consider for yara rules

    During operation of the yar2d2 we can have one or more of these be used at a time"""
    strategy_name = "UNDEFINED"

    @classmethod
    def get_functions(cls, filepath, rz):
        """Return a dictionary of function addresses with comments as to why they were added"""
        raise NotImplementedError


class FunctionFeature(object):
    """Class representing a function that we'll use to capture a function

    Notes:
        This will rely a lot on the signature functionality of Rizin
        Details can be found here: https://book.rizin.re/signatures/zignatures.html"""

    def __init__(self, rz: rzpipe.open_sync.open, function_symbol):
        """Create a function feature with a pipe and a specific symbol or hexadecimal address"""

        # Check to see if the function symbol is an address or not
        symbol = None
        try:
            addr = int(function_symbol, 16)
            symbol = self.resolve_symbol_for_addr(rz, addr)
        except ValueError:
            symbol = function_symbol

        # Get File SHA256
        self.file = json.loads(rz.cmd("itj"))["sha256"]

        # Get high-level function information
        rz.cmd(f"s {symbol}")
        raw_function_data = rz.cmd("afij")
        if len(raw_function_data) == 0:
            raise Exception(f"Couldn't parse information on {symbol}")
        function_data = json.loads(raw_function_data)
        if not isinstance(function_data, list) or len(function_data) != 1:
            raise Exception("Broken assumption on how the data should exist")
        function_data = function_data[0]

        self.name = function_data['name']
        self.size = function_data['size']
        self.signature = function_data['signature']

        # Get disassembly for comment string
        rz.cmd("e asm.bytes=true")
        self.disassembly = rz.cmd(f"pD {self.size}@ {self.name}")

        # Create the signature
        output = rz.cmd(f"zaf {symbol} {symbol}")

        #TODO some output checking here

        # Get the signature output
        raw_data = rz.cmd("zj")
        if len(raw_data) == 0:
            raise Exception(f"Failed to get data from signature for {symbol}")

        # Do some processing to make sure there's only one signature
        data = json.loads(raw_data)
        if len(data) == 1:
            data = data[0]
        else:
            data = list(filter(lambda x: (x['name'] == symbol), data))
            if len(data) != 1:
                raise Exception(f"Expected exactly one signature matching {symbol}, found {len(data)}")
            data = data[0]

        # Load up the raw data
        self.sig_name = data["name"]
        self.bytes = bytes.fromhex(data['bytes'])
        self.mask = bytes.fromhex(data['mask'])
        self.graph = data['graph']
        self.addr = data['addr']
        # Default the realname to function symbol for now
        self.realname = data.get('realname', symbol)
        self.xrefs_from = data['xrefs_from']
        self.xrefs_to = data['xrefs_to']
        self.vars = data['vars']
        self.types = data['types']
        self.hash = data['hash']
        self._masked_asm_str = None

        # Delete the sig
        rz.cmd(f"z- {symbol}")

    @property
    def masked_asm_str(self) -> str:
        """Return an ascii hex string with ?? masking out parts of the instruction"""
        if self._masked_asm_str is not None:
            return self._masked_asm_str

        ret_str = []
        for x, y in zip(self.bytes, self.mask):
            # Note: a byte whose value is 0x00 is also emitted as ??, because x & y is 0 either way
            if x & y == 0:
                ret_str.append("??")
            else:
                ret_str.append(f"{x & y:02X}")
        self._masked_asm_str = " ".join(ret_str)
        return self._masked_asm_str

    @property
    def yara_str(self) -> str:
        """Return a yara rule ready string"""
        return "$ = {{ {} }}".format(self.masked_asm_str)

    def resolve_symbol_for_addr(self, rz: rzpipe.open_sync.open, addr: int):
        """Attempt to resolve a symbol for the address

        Notes:
            If we don't have a function defined in rizin, we'll attempt to analyze the function
        """

        # Check to see if we have a function defined
        res = rz.cmd(f"afd @ {addr:#x}")
        if res == '':
            logging.debug(f"Warning defining a new function at {addr:#X}")
            rz.cmd(f"af @ {addr:#x}")

        # Seek to address
        rz.cmd(f"s {addr}")

        # Get function information
        fdata = json.loads(rz.cmd('afij'))
        if len(fdata) != 1:
            raise Exception(f"Broken assumption for addr {addr}")

        fdata = fdata[0]

        # Validate that the address is within offset and offset + size
        if addr < fdata['offset'] or addr > fdata['offset'] + fdata['size']:
            raise Exception(f"Broken Assumption: {addr:#X} outside of bounds {fdata['offset']:#X} < < {fdata['offset'] + fdata['size']:#X}")

        return fdata['name']

    def __str__(self) -> str:
        simple_features = dict(
            name=self.name,
            realname=self.realname,
            file=self.file,
            size=self.size,
            masked_asm_str=self.masked_asm_str,
            disassembly=self.disassembly
        )
        return json.dumps(simple_features)

--------------------------------------------------------------------------------
/floss2yar/yar_utils/processing.py:
--------------------------------------------------------------------------------
from datetime import date

import floss
import viv_utils
from floss import identify


def yara_builder(list_of_input_functions, name_arg):
    if name_arg is not None:
        name_arg = str(name_arg)
        rule_name_str = "\nrule " + name_arg + '_floss2yar_'
    else:
        rule_name_str = '\nrule floss2yar_'
    rule_str = ""
    today = date.today().isoformat()
    for input_function in list_of_input_functions:
        disass = input_function['disass']
        yara_strang = input_function['yara_str']
        yara_str_name = input_function['func_name']
        todaydate = 'date = "' + today + '"'
        rule_setup = rule_name_str + yara_str_name + ' {\nmeta:\n\tauthor = "floss2yar"\n\t' + todaydate + '\n\tversion = "1.0"\n'
        rule_str += rule_setup
        hash_list = sorted(list(set(input_function['samples'])))
        for f in hash_list:
            e = f.strip()
            rule_str += f'\thash = "{e}"\n'
        # The disassembly is embedded as a comment so analysts can trim the rule to interesting blocks
        rule = '\nstrings: \n' + '\t$' + yara_str_name + ' = {' + yara_strang + '}\n /* \n' + disass + '\n */ \ncondition: \n\t1 of them \n}'
        rule_str += rule
        rule_str += "\n\n"
    print(rule_str)
    return rule_str


def get_floss_funcs(file, min_score):
    if min_score is None:
        min_score = 0.90
    min_scoring = float(min_score)
    print('[+] parsing funcs with minimum score: ', min_scoring)
    candidates = []
    vw = viv_utils.getWorkspace(file)
    functions = vw.getFunctions()
    func_features, lib_funcs = floss.identify.find_decoding_function_features(vw, functions)
    # dict from function VA (int) to score (float)
    func_scores = {
        fva: features["score"]
        for fva, features in func_features.items()
    }
    # list of tuples (score (float), function VA (int)) sorted descending
    func_scores = sorted([
        (score, fva)
        for fva, score in func_scores.items()
    ], reverse=True)
    for score, fva in func_scores:
        if score > min_scoring:
            # Candidates are returned as lowercase hex offsets without a 0x prefix
            offset = f"{fva:x}"
            func = offset.lower()
            candidates.append(func)
    return candidates


def data_processing(blob_of_data, name_arg):
    print('[+] trying to process data')
    final_yara_list = []
    for sample in blob_of_data.functions:
        # Dots are not valid in YARA identifiers, so rizin names like fcn.00402ae2 are rewritten
        sample.realname = sample.realname.replace('.', '_')
        final_yara_list.append({
            'samples': [sample.file],
            'func_name': sample.realname,
            'yara_str': sample.masked_asm_str,
            'disass': sample.disassembly,
        })

    return yara_builder(final_yara_list, name_arg)

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
flare-floss==2.0.0
flirt==0.0.2
pefile==2022.5.30
python-flirt==0.7.0
viv-utils==0.7.5
vivisect==1.0.8
yara-python==4.2.0
rzpipe==0.4.0
numpy<1.23
--------------------------------------------------------------------------------
/setup.py:
--------------------------------------------------------------------------------
from setuptools import setup, find_packages

setup(
    name="floss2yar",
    version="0.1",
    description="Generate YARA Based on code similarity",
    author="Greg Lesnewich and Connor McLaughlin",
    author_email="glesnewich@gmail.com",
    packages=find_packages(),
    install_requires=open("requirements.txt").read().splitlines(),
    entry_points={
        "console_scripts": ["floss2yar=floss2yar.main:main",],
    },
    python_requires=">=3.6",
)

--------------------------------------------------------------------------------
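The string concatenation in `yara_builder` (in `processing.py` above) can be hard to follow, so here is a minimal sketch of the same idea: one rule per function, with a single masked hex string and one `hash` meta entry per sample. This is an illustration under assumptions, not the project's code: `build_rule`, the `$code` identifier, and the `rule_date` parameter are hypothetical names, and the embedded disassembly comment is left out for brevity.

```python
from datetime import date
from typing import Iterable


def build_rule(rule_name: str, hex_string: str, hashes: Iterable[str], rule_date: str = "") -> str:
    """Assemble one YARA rule around a single masked hex string (hypothetical helper)."""
    rule_date = rule_date or date.today().isoformat()
    lines = [
        f"rule {rule_name} {{",
        "\tmeta:",
        '\t\tauthor = "floss2yar"',
        f'\t\tdate = "{rule_date}"',
    ]
    # Strip before de-duplicating so "abc" and "abc " collapse into one entry
    lines += [f'\t\thash = "{h}"' for h in sorted({h.strip() for h in hashes})]
    lines += [
        "\tstrings:",
        f"\t\t$code = {{ {hex_string} }}",
        "\tcondition:",
        "\t\tall of them",
        "}",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_rule("floss2yar_fcn_00402ae2", "E8 ?? ?? ?? ?? 31 C0", ["abc", "abc "]))
```

Building the rule as a list of lines and joining once keeps the layout visible in the source, which makes it easier to add or reorder sections later.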