├── IMPLEMENTATION.md ├── README.md ├── __init__.py ├── java.py ├── tests.py └── x86.py /IMPLEMENTATION.md: -------------------------------------------------------------------------------- 1 | # pyasm2 2 | © 2012, Jurriaan Bremer 3 | 4 | ## Introduction 5 | 6 | _pyasm2_ is an x86 assembler library. It allows an easy Intel-like assembly 7 | syntax, with support for sequences of instructions, as well as labels. 8 | 9 | ## Simple Usage 10 | 11 | Here are some examples to illustrate the simplicity of pyasm2. For each 12 | example the normal Intel-syntax is given, followed by the equivalent using 13 | pyasm2. 14 | 15 | * `push eax` → **`push(eax)`** 16 | * `mov eax, ebx` → **`mov(eax, ebx)`** 17 | * `lea edx, [ebp+eax*4+32]` → **`lea(edx, [ebp+eax*4+32])`** 18 | * `movzx ebx, byte [esp-64]` → **`movzx(ebx, byte [esp-64])`** 19 | * `mov eax, dword fs:[0xc0]` → **`mov(eax, dword [fs:0xc0])`** 20 | 21 | Note that pyasm2 throws an exception if the instruction doesn't support the 22 | given operands (an operand is like a parameter to an instruction.) 23 | 24 | A few simple command-line examples. 25 | ```python 26 | >>> from pyasm2 import * 27 | >>> mov(eax, dword[ebx+0x100]) 28 | mov eax, dword [ebx+0x100] 29 | >>> push(dword[esp]) 30 | push dword [esp] 31 | >>> mov(eax, eax, eax) # invalid encoding 32 | ... snip ... 33 | Exception: Unknown or Invalid Encoding 34 | ``` 35 | 36 | ## Blocks 37 | 38 | Besides normal instructions pyasm2 also supports sequences of instructions, 39 | referred to as *blocks* from now on. 40 | 41 | Blocks are especially useful when chaining multiple instructions. Besides 42 | that, blocks automatically resolve relative jumps, labels, etc. 43 | 44 | A simple example of a function that does only one thing; zero the *eax* 45 | register (the default return value of a function on x86) and returning to the 46 | caller, looks like the following. 47 | 48 | ```python 49 | Block( 50 | xor(eax, eax), 51 | retn() 52 | ) 53 | ``` 54 | 55 | Before we discuss further on blocks, we first need an introduction on pyasm2 56 | labels. 57 | 58 | ## Labels 59 | 60 | pyasm2 supports two types of Labels; anonymous labels and named labels. 61 | 62 | #### Anonymous Labels 63 | 64 | Anonymous labels get an index, and can be referred to by a relative index. 65 | 66 | For example, the following block increases the *eax* register infinite times. 67 | (The -1 in this example is a relative index, so -1 points to the last defined 68 | Label.) 69 | 70 | ```python 71 | Block( 72 | Label(), 73 | inc(eax), 74 | jmp(Label(-1)) 75 | ) 76 | ``` 77 | 78 | It is, however, not possible to reference to anonymous labels outside of the 79 | current block (i.e. an IndexError is thrown.) 80 | 81 | There are three different possible values for relative indices. 82 | 83 | * *Negative Index* → Points to an anonymous label before the current 84 | instruction. 85 | * *Zero Index* → Points to a transparant label which points to the 86 | current instruction. 87 | * *Positive Index* → Points to an anonymous label after the current 88 | instruction. 89 | 90 | (This does indeed mean that relative index *1* points to the first label 91 | after the current instruction.) 92 | 93 | Throughout the following sections we will refer to this snippet, by rewriting 94 | it a little bit every time. 95 | 96 | #### Global Named Labels 97 | 98 | A new named label can be created by creating a new Label instance with the 99 | name as first parameter. Referencing a named label is just like referencing 100 | an anonymous label, but instead of passing an index, you give a string as 101 | parameter. 102 | 103 | ```python 104 | Block( 105 | Label('loop'), 106 | inc(eax), 107 | jmp(Label('loop')) 108 | ) 109 | ``` 110 | 111 | Note that this type of named label is global, that is, other blocks can 112 | reference to this particular label as well. This is useful for example when 113 | defining a function. (Note that two or more blocks can *not* declare the same 114 | global named labels!) 115 | 116 | #### Local Named Labels 117 | 118 | Whereas one could make a global named label using e.g. `Label('name')`, it is 119 | also possible to make a *local* named label; a label that's only defined for 120 | the current block. Because local labels are more commonly used than global 121 | labels, their syntax is easier as well. Local named labels are simply created 122 | **and referenced** by using a string as name. 123 | 124 | ```python 125 | Block( 126 | 'loop', 127 | inc(eax), 128 | jmp('loop') 129 | ) 130 | ``` 131 | 132 | #### Label References 133 | 134 | -Labels are referenced by e.g. `Label('name')`. When looking up label 135 | references, pyasm2 will first try to find the label in the current block, 136 | and only if there is no such label in the current block, it will look it up 137 | in the parent. In other words, local named labels are more important than 138 | global named labels.- 139 | 140 | Local Named Labels and Global Named Labels can *not* be mixed. E.g. the 141 | following snippet throws an error. 142 | 143 | ```python 144 | Block( 145 | Label('loop'), # global named label 146 | inc(eax), 147 | jmp('loop') # local named label 148 | ``` 149 | 150 | ### Further Label Tweaks 151 | 152 | Now we've seen the types of labels supported by pyasm2, it is time to get to 153 | some awesome tweaks which will speed up development and clean up your code 154 | even further. 155 | 156 | #### Label classobj instead of instance 157 | 158 | It is possible to define an Anonymous Label by passing the Label class, 159 | instead of passing an instance. 160 | 161 | ```python 162 | Block( 163 | Label, 164 | inc(eax), 165 | jmp(Label(-1)) 166 | ) 167 | ``` 168 | 169 | #### Global Named Labels as variabele 170 | 171 | Because global named labels are able to reference to labels outside their 172 | current scope (a block), it is also possible to reference to them as a 173 | variabele (e.g. a function.) 174 | 175 | ```python 176 | return_zero = Label('return_zero') 177 | f = Block( 178 | return_zero, 179 | xor(eax, eax), 180 | retn() 181 | ) 182 | f2 = Block( 183 | call(return_zero), 184 | # ... do something ... 185 | ) 186 | ``` 187 | 188 | #### Alias Label to L 189 | 190 | For those of us that think that the classname *Label* is too long, you could 191 | simply make an alias to **L** (i.e. `L = Label`.) 192 | 193 | ```python 194 | Block( 195 | L, 196 | inc(eax), 197 | jmp(L(-1)) 198 | ) 199 | ``` 200 | 201 | #### Tweaked Anonymous Label References 202 | 203 | Because `jmp(L(-1))` looks pretty ugly (see the [Alias Label to L][] section), 204 | we've tweaked anonymous label references even further to the point where you 205 | can add or subtract a relative index directly to/from the `Label` class. 206 | 207 | [Alias Label to L]: #alias-label-to-l 208 | 209 | ```python 210 | Block( 211 | L, 212 | inc(eax), 213 | jmp(L-1) 214 | ) 215 | ``` 216 | 217 | #### Offset from a Label 218 | 219 | Sometimes it might be necessary to add or subtract a value from the address of 220 | a label, in those cases the following technique applies. 221 | 222 | ```python 223 | Block( 224 | L, 225 | nop, 226 | mov(eax, Label(-1)+1) 227 | ) 228 | ``` 229 | 230 | In this example the anonymous label will be referenced, but the value one is 231 | added to it. So `Label(-1)+1` points to the `mov` instruction, because the 232 | `nop` instruction is only one byte in length. 233 | 234 | Do note that `Label(-1)+1` could be rewritten as `L-1+1`, but *please* don't 235 | do that, we don't want to torture python. 236 | 237 | ## Blocks part two 238 | 239 | Now we've seen how pyasm2 handles labels, it's time for some more in-depth 240 | information about blocks. 241 | 242 | #### Instruction classobj instead of instance 243 | 244 | Any instruction that does *not* take any additional operands (e.g. `retn`, 245 | `stosb`, `sysenter`, etc.) can be used directly in a block without actually 246 | making an instance. For example, the following two snippets are equal to 247 | pyasm2. 248 | 249 | ```python 250 | Block( 251 | mov(eax, 0), 252 | retn() 253 | ) 254 | ``` 255 | ```python 256 | Block( 257 | mov(eax, 0), 258 | retn 259 | ) 260 | ``` 261 | 262 | #### Combining Blocks 263 | 264 | One can combine multiple blocks by *adding* one to the other. Combining blocks 265 | is actually just merging them, e.g. one block is appended to the other block. 266 | 267 | ```python 268 | a = Block( 269 | mov(eax, ebx), 270 | mov(ebx, 42) 271 | ) 272 | b = Block( 273 | mov(ecx, edx) 274 | ) 275 | print repr(a + b) 276 | # Block(mov(eax, ebx), mov(ebx, 42), mov(ecx, edx)) 277 | ``` 278 | 279 | #### Temporary Blocks as Lists 280 | 281 | Temporary blocks, those that you only use to add to other blocks, can be 282 | written as lists (or tuples, for that matter.) 283 | 284 | ```python 285 | a = Block( 286 | mov(eax, ebx), 287 | mov(ebx, 42) 288 | ) 289 | print repr(a + [xor(ecx, ecx), retn]) 290 | # Block(mov(eax, ebx), mov(ebx, 42), xor(ecx, ecx), retn) 291 | ``` 292 | 293 | This does, however, not work if you want to call *repr* or *str* on the block. 294 | In that particular case, you can do the following. 295 | 296 | ```python 297 | a = [xor(eax, eax), retn] 298 | print repr(Block(a)) 299 | # Block(xor(eax, eax), retn) 300 | ``` 301 | 302 | #### Combining Instructions Directly 303 | 304 | Instead of writing e.g. `Block(mov(eax, ebx), mov(ebx, 42))`, pyasm2 offers a 305 | shorthand. 306 | 307 | ```python 308 | a = mov(eax, ebx) + mov(ebx, 42) 309 | print repr(a) 310 | # Block(mov(eax, ebx), mov(ebx, 42)) 311 | ``` 312 | 313 | ## Raw Data Sections 314 | 315 | As any assembler, pyasm2 also supports raw data. There are a few supported 316 | data types; signed/unsigned 8/16/32/64bit integers, strings and labels (which 317 | are 32bit pointers on x86.) 318 | 319 | Some examples should suffice as explanation. 320 | 321 | ```python 322 | a = Block( 323 | String('abc'), 324 | Int8(0x64), 325 | Uint8(0x65), 326 | Uint16(0x6766), 327 | Int32(0x6b6a6968) 328 | ) 329 | print str(a) 330 | # abcdefghijlk 331 | ``` 332 | 333 | #### Raw Data Aliases 334 | 335 | Some interesting aliases include. 336 | 337 | * `S = String` 338 | * `i8 = Int8` 339 | * `u8 = Uint8` 340 | * etc. 341 | 342 | #### Multiple Items with the same Type 343 | 344 | It is perfectly possible to define multiple values of the same type in one 345 | simple statement. 346 | 347 | ```python 348 | a = Uint32(0x11223344, 0x44332211, 0x12345678, 0x87654321) 349 | ``` 350 | 351 | ## Blocks part three 352 | 353 | Now we have seen the declaration of raw data using pyasm2, it is time to link 354 | code and data sections. For example, in normal executable binaries, it is 355 | normal to have different so-called sections for code and data. This way the 356 | code is seperated from the data. 357 | 358 | This gives us a problem. When assembling, we do not have to combine the text 359 | and data blocks, so in order to get the correct addresses of code and data, 360 | we do the following. We assign an address to the data section, and from there 361 | give every label with address to the code section. This way the code section 362 | knows where to find the references to those labels. 363 | 364 | ```python 365 | a = Block( 366 | mov(eax, L('hello')), 367 | # ... snip ... 368 | ) 369 | b = Block( 370 | L('hello'), 371 | String('Hello World!\n\x00') 372 | ) 373 | b.base_address(0x402000) 374 | a.references(b) 375 | ``` 376 | 377 | ## pyasm2 Internals 378 | 379 | Although most of pyasm2 is fairly straightforward (chaining instructions is 380 | not that hard), there is one tricky part: **labels**. 381 | 382 | To start off, the x86 instruction set provides two types of relative jumps. 383 | Those with an 8bit relative offset, and those with a 32bit relative offset. 384 | 385 | Besides that, instructions can refer to other instructions or addresses within 386 | a data section, using labels. This means that pyasm2 has to keep track of 387 | these references, and magically fix them in the final step. 388 | 389 | #### Relative Offset Size 390 | 391 | So a relative jump can point to another instruction, by using a label. This 392 | raises the question; is the offset to this instruction within the size of an 393 | 8bit relative offset, or a 32bit one? 394 | 395 | (8bit relative jumps are 2 bytes in length, 32bit ones are 5 bytes for 396 | unconditional jumps, and 6 bytes for conditional ones.) 397 | 398 | There are two solutions to this problem, as far as I can tell. 399 | 400 | * Each label keeps a list of instructions pointing to it. When assembling, 401 | each of the instructions is updated with the location of the label, so the 402 | instructions can assemble the address or relative offset accordingly. 403 | From here the instruction can determine if the offset has to be 8bit or 404 | 32bit. 405 | * At first each relative jump is created using a 32bit relative offset. 406 | Then, after assembling each instruction, the instructions are enumerated 407 | and a check is done if the relative jumps would fit as jumps with an 8bit 408 | relative offset as well. If that is the case, the jump is updated, and all 409 | the other instructions are updated as well. This goes one until there are 410 | no relative jumps left to tweak, or a recursive limit has exceeded. 411 | 412 | Although the first implementation might be a little better, performance wise. 413 | pyasm2 uses the latter implementation, which is much easier to implement. 414 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | pyasm2 - x86 assembler library (C) 2012 Jurriaan Bremer 2 | 3 | Although its called pyasm2, this is not per se a successor of Pyasm or pyASM. 4 | pyasm2 aims to be as flexible as possible, it will support x86, SSE and SSE2. 5 | 6 | A key feature of pyasm2 is the ability to have blocks of instructions and 7 | being able to give the base address at a later time, that is, you don't need 8 | to know the address of instructions before-hand. For example, you can construct 9 | a series of instructions, request the size that will be needed in order to 10 | store all instructions as sequence, allocate this memory and write the 11 | instructions from there, this approach is very useful when making JIT 12 | compilers etc. 13 | 14 | The syntax of pyasm2 is supposed to be as simple as possible. 15 | 16 | For example, an instruction such as "**mov eax, dword [ebx+edx*2+32]**" can be 17 | encoded using pyasm2 as the following. 18 | ```python 19 | mov(eax, dword [ebx+edx*2+32]) 20 | ``` 21 | 22 | These memory addresses also support **segment registers**, e.g. 23 | ```python 24 | mov(eax, dword[fs:0xc0]) 25 | ``` 26 | although this is currently only supported in 64bit python versions. 27 | 28 | Furthermore, pyasm2 makes it possible to **chain multiple instructions**. Take 29 | for example the following statement. 30 | ```python 31 | block(mov(eax, ebx), push(32)) 32 | ``` 33 | 34 | However, for more implementation-specific details, please refer to the 35 | *IMPLEMENTATION* file. 36 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/jbremer/pyasm2/ff0f0e1c02146dc9230211f44c2dd04ff3bd4265/__init__.py -------------------------------------------------------------------------------- /java.py: -------------------------------------------------------------------------------- 1 | """ 2 | Java Disassembler & Assembler Engine (C) 2012 Jurriaan Bremer 3 | https://github.com/jbremer/pyasm2 4 | 5 | """ 6 | import struct 7 | 8 | # http://en.wikipedia.org/wiki/Java_bytecode_instruction_listings 9 | _table = { 10 | 0x32: 'aaload', 11 | 0x53: 'aastore', 12 | 0x01: 'aconst_null', 13 | 0x19: 'aload', 14 | 0x2a: 'aload_0', 15 | 0x2b: 'aload_1', 16 | 0x2c: 'aload_2', 17 | 0x2d: 'aload_3', 18 | 0xbd: 'anewarray', 19 | 0xb0: 'areturn', 20 | 0xbe: 'arraylength', 21 | 0x3a: 'astore', 22 | 0x4b: 'astore_0', 23 | 0x4c: 'astore_1', 24 | 0x4d: 'astore_2', 25 | 0x4e: 'astore_3', 26 | 0xbf: 'athrow', 27 | 0x33: 'baload', 28 | 0x54: 'bastore', 29 | 0x10: 'bipush', 30 | 0x34: 'caload', 31 | 0x55: 'castore', 32 | 0xc0: 'checkcast', 33 | 0x90: 'd2f', 34 | 0x8e: 'd2i', 35 | 0x8f: 'd2l', 36 | 0x63: 'dadd', 37 | 0x31: 'daload', 38 | 0x52: 'dastore', 39 | 0x98: 'dcmpg', 40 | 0x97: 'dcmpl', 41 | 0x0e: 'dconst_0', 42 | 0x0f: 'dconst_1', 43 | 0x6f: 'ddiv', 44 | 0x18: 'dload', 45 | 0x26: 'dload_0', 46 | 0x27: 'dload_1', 47 | 0x28: 'dload_2', 48 | 0x29: 'dload_3', 49 | 0x6b: 'dmul', 50 | 0x77: 'dneg', 51 | 0x73: 'drem', 52 | 0xaf: 'dreturn', 53 | 0x39: 'dstore', 54 | 0x47: 'dstore_0', 55 | 0x48: 'dstore_1', 56 | 0x49: 'dstore_2', 57 | 0x4a: 'dstore_3', 58 | 0x67: 'dsub', 59 | 0x59: 'dup', 60 | 0x5a: 'dup_x1', 61 | 0x5b: 'dup_x2', 62 | 0x5c: 'dup2', 63 | 0x5d: 'dup2_x1', 64 | 0x5e: 'dup2_x2', 65 | 0x8d: 'f2d', 66 | 0x8b: 'f2i', 67 | 0x8c: 'f2l', 68 | 0x62: 'fadd', 69 | 0x30: 'faload', 70 | 0x51: 'fastore', 71 | 0x96: 'fcmpg', 72 | 0x95: 'fcmpl', 73 | 0x0b: 'fconst_0', 74 | 0x0c: 'fconst_1', 75 | 0x0d: 'fconst_2', 76 | 0x6e: 'fdiv', 77 | 0x17: 'fload', 78 | 0x22: 'fload_0', 79 | 0x23: 'fload_1', 80 | 0x24: 'fload_2', 81 | 0x25: 'fload_3', 82 | 0x6a: 'fmul', 83 | 0x76: 'fneg', 84 | 0x72: 'frem', 85 | 0xae: 'freturn', 86 | 0x38: 'fstore', 87 | 0x43: 'fstore_0', 88 | 0x44: 'fstore_1', 89 | 0x45: 'fstore_2', 90 | 0x46: 'fstore_3', 91 | 0x66: 'fsub', 92 | 0xb4: 'getfield', 93 | 0xb2: 'getstatic', 94 | 0xa7: 'goto', 95 | 0xc8: 'goto_w', 96 | 0x91: 'i2b', 97 | 0x92: 'i2c', 98 | 0x87: 'i2d', 99 | 0x86: 'i2f', 100 | 0x85: 'i2l', 101 | 0x93: 'i2s', 102 | 0x60: 'iadd', 103 | 0x2e: 'iaload', 104 | 0x7e: 'iand', 105 | 0x4f: 'iastore', 106 | 0x02: 'iconst_m1', 107 | 0x03: 'iconst_0', 108 | 0x04: 'iconst_1', 109 | 0x05: 'iconst_2', 110 | 0x06: 'iconst_3', 111 | 0x07: 'iconst_4', 112 | 0x08: 'iconst_5', 113 | 0x6c: 'idiv', 114 | 0xa5: 'if_acmpeq', 115 | 0xa6: 'if_acmpne', 116 | 0x9f: 'if_icmpeq', 117 | 0xa0: 'if_icmpne', 118 | 0xa1: 'if_icmplt', 119 | 0xa2: 'if_icmpge', 120 | 0xa3: 'if_icmpgt', 121 | 0xa4: 'if_icmple', 122 | 0x99: 'ifeq', 123 | 0x9a: 'ifne', 124 | 0x9b: 'iflt', 125 | 0x9c: 'ifge', 126 | 0x9d: 'ifgt', 127 | 0x9e: 'ifle', 128 | 0xc7: 'ifnonnull', 129 | 0xc6: 'ifnull', 130 | 0x84: 'iinc', 131 | 0x15: 'iload', 132 | 0x1a: 'iload_0', 133 | 0x1b: 'iload_1', 134 | 0x1c: 'iload_2', 135 | 0x1d: 'iload_3', 136 | 0x68: 'imul', 137 | 0x74: 'ineg', 138 | 0xc1: 'instanceof', 139 | 0xba: 'invokedynamic', 140 | 0xb9: 'invokeinterface', 141 | 0xb7: 'invokespecial', 142 | 0xb8: 'invokestatic', 143 | 0xb6: 'invokevirtual', 144 | 0x80: 'ior', 145 | 0x70: 'irem', 146 | 0xac: 'ireturn', 147 | 0x78: 'ishl', 148 | 0x7a: 'ishr', 149 | 0x36: 'istore', 150 | 0x3b: 'istore_0', 151 | 0x3c: 'istore_1', 152 | 0x3d: 'istore_2', 153 | 0x3e: 'istore_3', 154 | 0x64: 'isub', 155 | 0x7c: 'iushr', 156 | 0x82: 'ixor', 157 | 0xa8: 'jsr', 158 | 0xc9: 'jsr_w', 159 | 0x8a: 'l2d', 160 | 0x89: 'l2f', 161 | 0x88: 'l2i', 162 | 0x61: 'ladd', 163 | 0x2f: 'laload', 164 | 0x7f: 'land', 165 | 0x50: 'lastore', 166 | 0x94: 'lcmp', 167 | 0x09: 'lconst_0', 168 | 0x0a: 'lconst_1', 169 | 0x12: 'ldc', 170 | 0x13: 'ldc_w', 171 | 0x14: 'ldc2_w', 172 | 0x6d: 'ldiv', 173 | 0x16: 'lload', 174 | 0x1e: 'lload_0', 175 | 0x1f: 'lload_1', 176 | 0x20: 'lload_2', 177 | 0x21: 'lload_3', 178 | 0x69: 'lmul', 179 | 0x75: 'lneg', 180 | 0xab: 'lookupswitch', 181 | 0x81: 'lor', 182 | 0x71: 'lrem', 183 | 0xad: 'lreturn', 184 | 0x79: 'lshl', 185 | 0x7b: 'lshr', 186 | 0x37: 'lstore', 187 | 0x3f: 'lstore_0', 188 | 0x40: 'lstore_1', 189 | 0x41: 'lstore_2', 190 | 0x42: 'lstore_3', 191 | 0x65: 'lsub', 192 | 0x7d: 'lushr', 193 | 0x83: 'lxor', 194 | 0xc2: 'monitorenter', 195 | 0xc3: 'monitorexit', 196 | 0xc5: 'multianewarray', 197 | 0xbb: 'new', 198 | 0xbc: 'newarray', 199 | 0x00: 'nop', 200 | 0x57: 'pop', 201 | 0x58: 'pop2', 202 | 0xb5: 'putfield', 203 | 0xb3: 'putstatic', 204 | 0xa9: 'ret', 205 | 0xb1: 'return', 206 | 0x35: 'saload', 207 | 0x56: 'sastore', 208 | 0x11: 'sipush', 209 | 0x5f: 'swap', 210 | 0xaa: 'tableswitch', 211 | 0xc4: 'wide', 212 | 0xca: 'breakpoint', 213 | 0xfe: 'impdep1', 214 | 0xff: 'impdep2', 215 | } 216 | 217 | def _sbint16(x): return struct.unpack('>h', x)[0] 218 | def _ubint16(x): return struct.unpack('>H', x)[0] 219 | def _sbint32(x): return struct.unpack('>i', x)[0] 220 | 221 | # name to opcode table 222 | _names = dict((v if type(v) == str else v[0], k) for k, v in _table.items()) 223 | 224 | # opcodes which are valid for the "wide" instruction with length 3 225 | _wide_opcodes = sorted(_names[x] for x in ('iload', 'fload', 'aload', 'lload', 226 | 'dload', 'istore', 'fstore', 'astore', 'lstore', 'dstore', 'ret')) 227 | 228 | # opcode which is valid for the "wide" instruction with length 5 229 | _wide_inc = _names['iinc'] 230 | 231 | # opcodes which have a two-byte index into the constant pool (and no further 232 | # arguments) 233 | _index_opcodes = sorted(_names[x] for x in ('anewarray', 'checkcast', 234 | 'getfield', 'getstatic', 'instanceof', 'invokespecial', 'invokestatic', 235 | 'invokevirtual', 'ldc_w', 'ldc2_w', 'new', 'putfield', 'putstatic')) 236 | 237 | # all opcodes that take two branch bytes; "goto", "jsr", and all opcodes 238 | # starting with "if" 239 | _branch_opcodes = sorted(_names[x] for x in _names if x[:2] == 'if' or 240 | x == 'goto' or x == 'jsr') 241 | 242 | _primitive_types = { 243 | 10: 'int', 244 | 8: 'byte', 245 | 11: 'long', 246 | 7: 'double', 247 | 6: 'float', 248 | 5: 'char', 249 | 9: 'short', 250 | } 251 | 252 | _other_opcodes = { 253 | 'bipush': lambda ch, d, o: Instruction(name=_table[ch], length=2, 254 | value=ord(d[o+1]), rep='%s %d' % (_table[ch], ord(d[o+1]))), 255 | 'sipush': lambda ch, d, o: Instruction(name=_table[ch], length=3, 256 | value=_sbint16(d[o+1:o+3]), rep='%s %d' % (_table[ch], 257 | _sbint16(d[o+1:o+3]))), 258 | 'lookupswitch': lambda ch, d, o: None, 259 | 'tableswitch': lambda ch, d, o: None, 260 | 'newarray': lambda ch, d, o: Instruction(name=_table[ch], length=2, 261 | value=ord(d[o+1]), rep='%s %s' % (_table[ch], 262 | _primitive_types[ord(d[o+1])])), 263 | 'goto_w': lambda ch, d, o: Instruction(name=_table[ch], length=5, 264 | value=_sbint32(d[o+1:o+5]), rep='%s %d' % (_table[ch], 265 | _sbint32(d[o+1:o+5]))), 266 | 'invokedynamic': lambda ch, d, o: None, 267 | 'invokeinterface': lambda ch, d, o: None, 268 | 'jsr_w': lambda ch, d, o: Instruction(name=_table[ch], length=5, 269 | value=_sbint32(d[o+1:o+5]), rep='%s %d' % (_table[ch], 270 | _sbint32(d[o+1:o+5]))), 271 | 'multianewarray': lambda ch, d, o: Instruction(name=_table[ch], length=4, 272 | cp=_ubint16(d[o+1:o+3]), value=ord(d[o+3]), rep='%s #%d %d' % ( 273 | _table[ch], _ubint16(d[o+1:o+3]), ord(d[o+3]))), 274 | 'ldc': lambda ch, d, o: Instruction(name=_table[ch], length=2, 275 | cp=ord(d[o+1]), rep='%s #%d' % (_table[ch], ord(d[o+1]))), 276 | } 277 | 278 | # convert the opcode names of _other_opcodes into opcode indices 279 | _other_opcodes = dict((_names[k], v) for k, v in _other_opcodes.items()) 280 | 281 | class Instruction: 282 | def __init__(self, name=None, cp=None, local=None, length=None, 283 | value=None, rep=None): 284 | self.name = name 285 | self.cp = cp 286 | self.local = local 287 | self.length = length 288 | self.value = value 289 | self.rep = rep 290 | 291 | def __str__(self): 292 | return self.rep or self.name 293 | 294 | def __repr__(self): 295 | ret = ['name="%s"' % self.name] 296 | if self.cp: 297 | ret += ['cp=%s' % self.cp] 298 | if self.local: 299 | ret += ['local=%d' % self.local] 300 | if self.length: 301 | ret += ['length=%d' % self.length] 302 | if self.value: 303 | ret += ['value="%s"' % self.value] 304 | if self.rep: 305 | ret += ['rep="%s"' % self.rep] 306 | return 'Instruction(%s)' % ', '.join(ret) 307 | 308 | def disassemble(data, offset=0): 309 | # opcode 310 | ch = ord(data[offset]) 311 | 312 | # the "wide" instruction 313 | if ch == 0xc4: 314 | ch2 = ord(data[offset+1]) 315 | 316 | if ch2 == _wide_inc: 317 | idx = _ubint16(data[offset+2:offset+4]) 318 | val = _sbint16(data[offset+4:offset+6]) 319 | return Instruction(name=_table[ch2], local=idx, length=6, 320 | value=val, rep='%s v%d, %d' % (_table[ch2], idx, val)) 321 | elif ch2 in _wide_opcodes: 322 | idx = _ubint16(data[offset+2:offset+4]) 323 | return Instruction(name=_table[ch2], local=idx, length=4, 324 | rep='%s v%d' % (_table[ch2], idx)) 325 | else: 326 | return None 327 | 328 | # if the opcode is in _wide_opcodes then it loads or stores a local 329 | if ch in _wide_opcodes: 330 | return Instruction(name=_table[ch], length=2, 331 | local=ord(data[offset+1]), rep='%s v%d' % (_table[ch], 332 | ord(data[offset+1]))) 333 | 334 | # instructions which only have an index into the constant pool as argument 335 | if ch in _index_opcodes: 336 | return Instruction(name=_table[ch], length=3, 337 | cp=_ubint16(data[offset+1:offset+3]), 338 | rep='%s #%d' % (_table[ch], 339 | _ubint16(data[offset+1:offset+3]))) 340 | 341 | # branch instructions that take a two-byte branch offset 342 | if ch in _branch_opcodes: 343 | return Instruction(name=_table[ch], length=3, 344 | value=_sbint16(data[offset+1:offset+3]), 345 | rep='%s %d' % (_table[ch], 346 | _sbint16(data[offset+1:offset+3]))) 347 | 348 | # other opcodes which have to be handled independently 349 | if ch in _other_opcodes: 350 | return _other_opcodes[ch](ch, data, offset) 351 | 352 | # if the entry in the table is a string, then it's an instruction without 353 | # anything special, so we can simply return it 354 | if ch in _table: 355 | return Instruction(name=_table[ch], length=1) 356 | 357 | # unknown opcode 358 | return None 359 | -------------------------------------------------------------------------------- /tests.py: -------------------------------------------------------------------------------- 1 | 2 | """ 3 | 4 | Unittests that verify the integrity of pyasm2. 5 | 6 | """ 7 | 8 | from pyasm2.x86 import * 9 | import unittest 10 | 11 | class CheckSyntax(unittest.TestCase): 12 | def test_syntax(self): 13 | eq = self.assertEqual 14 | ra = self.assertRaises 15 | 16 | eq(str(dword[eax]), 'dword [eax]') 17 | eq(str(byte[eax+eax*4]), 'byte [eax+eax*4]') 18 | eq(str(word[0xdeadf00d+8*esi+esp]), 'word [esp+esi*8+0xdeadf00d]') 19 | eq(str(eax+esi), '[eax+esi]') 20 | eq(str(eax+ecx*1), '[ecx+eax]') 21 | eq(str(dword[0x00112233]), 'dword [0x112233]') 22 | ra(AssertionError, lambda: eax+eax+eax) 23 | ra(AssertionError, lambda: esp*8) 24 | eq(0xb00b+ebp*8+ebx, ebx+ebp*8+0xb00b) 25 | ra(AssertionError, lambda: eax+0x111223344) 26 | #eq(str(dword[cs:eax+ebx]), 'dword [cs:eax+ebx]') 27 | eq(dword[cs:0x13371337], dword[cs:0x13371337]) 28 | #eq(str(dword[cs:0xdeadf00d]), 'dword [cs:0xdeadf00d]') 29 | eq(dword[eax-0x1000], dword[eax+0xfffff000]) 30 | 31 | def test_modrm(self): 32 | eq = self.assertEqual 33 | m = Instruction().modrm 34 | 35 | eq(m(eax, dword[eax]), '\x00') 36 | eq(m(ecx, dword[ebx]), m(dword[ebx], ecx)) 37 | eq(m(esi, dword[esp+ebp*8+0x11223344]), '\xb4\xec\x44\x33\x22\x11') 38 | eq(m(eax, dword[ebp]), '\x45\x00') 39 | eq(m(edi, dword[esp]), '\x3c\x24') 40 | eq(m(dword[esi+eax], ebx), '\x1c\x06') 41 | eq(m(esi, dword[edi]), '\x37') 42 | eq(m(ecx, dword[edx+ebp+0xdeadf00d]), '\x8c\x2a\x0d\xf0\xad\xde') 43 | eq(m(edi, dword[esi*8]), '\x3c\xf5\x00\x00\x00\x00') 44 | eq(m(edx, dword[ebp+eax*4]), '\x54\x85\x00') 45 | eq(m(eax, dword[eax+0x7f]), '\x40\x7f') 46 | eq(m(eax, dword[eax+0x80]), '\x80\x80\x00\x00\x00') 47 | eq(m(eax, dword[eax-0x80]), '\x40\x80') 48 | eq(m(eax, dword[eax-0x81]), '\x80\x7f\xff\xff\xff') 49 | eq(m(eax, dword[eax-2]), '\x40\xfe') 50 | eq(m(eax, dword[eax+0x40]), '\x40\x40') 51 | eq(m(eax, ebx), '\xc3') 52 | eq(m(esi, edi), '\xf7') 53 | 54 | def test_pack(self): 55 | eq = self.assertEqual 56 | 57 | eq(byte.pack(1), '\x01') 58 | eq(word.pack(1), '\x01\x00') 59 | eq(dword.pack(1), '\x01\x00\x00\x00') 60 | eq(qword.pack(1), '\x01\x00\x00\x00\x00\x00\x00\x00') 61 | 62 | def test_instructions(self): 63 | eq = lambda i, s, b: (self.assertEqual(repr(i), s, 64 | 'Invalid string representation for: ' + repr(i)), 65 | self.assertEqual(str(i), b, 'Invalid encoding for: ' + 66 | repr(i) + ' -> ' + repr(str(i)))) 67 | ra = self.assertRaises 68 | 69 | eq(retn(), 'retn', '\xc3') 70 | eq(nop(), 'nop', '\x90') 71 | eq(retn(0x80), 'retn 0x80', '\xc2\x80\x00') 72 | 73 | eq(mov(eax, 0xdeadf00d), 'mov eax, 0xdeadf00d', '\xb8\x0d\xf0\xad\xde') 74 | eq(mov(esi, 0x11223344), 'mov esi, 0x11223344', '\xbe\x44\x33\x22\x11') 75 | eq(mov(edi, dword [esp+ebx*4+0x0c]), 'mov edi, dword [esp+ebx*4+0xc]', 76 | '\x8b\x7c\x9c\x0c') 77 | eq(mov(dword[ebp+0x30], ecx), 'mov dword [ebp+0x30], ecx', 78 | '\x89\x4d\x30') 79 | 80 | eq(push(ebx), 'push ebx', '\x53') 81 | eq(xchg(ebp, eax), 'xchg ebp, eax', '\x95') 82 | eq(push(edi), 'push edi', '\x57') 83 | eq(pop(ebx), 'pop ebx', '\x5b') 84 | eq(inc(edx), 'inc edx', '\x42') 85 | eq(dec(esi), 'dec esi', '\x4e') 86 | 87 | eq(test(ecx, ecx), 'test ecx, ecx', '\x85\xc9') 88 | eq(xchg(esi, esp), 'xchg esi, esp', '\x87\xe6') 89 | 90 | eq(pshufd(xmm4, oword[edx], 0x11), 'pshufd xmm4, oword [edx], 0x11', 91 | '\x66\x0f\x70\x22\x11') 92 | eq(pshufd(xmm2, xmm0, 0x40), 'pshufd xmm2, xmm0, 0x40', 93 | '\x66\x0f\x70\xd0\x40') 94 | 95 | eq(paddd(xmm2, xmm5), 'paddd xmm2, xmm5', '\x66\x0f\xfe\xd5') 96 | 97 | ra(Exception, lambda: paddd(xmm0, eax)) 98 | ra(Exception, lambda: mov(eax, xmm1)) 99 | ra(Exception, lambda: mov(eax, byte[ebx])) 100 | 101 | eq(inc(ecx, lock=True), 'lock inc ecx', '\xf0\x41') 102 | eq(stosd(rep=True), 'rep stosd', '\xf3\xab') 103 | eq(scasb(repne=True), 'repne scasb', '\xf2\xae') 104 | 105 | eq(lea(eax, [esp+eax*2+0x42]), 'lea eax, [esp+eax*2+0x42]', 106 | '\x8d\x44\x44\x42') 107 | 108 | eq(mov(dword[ebx+0x44332211], 0x88776655), 109 | 'mov dword [ebx+0x44332211], 0x88776655', '\xc7\x83' + ''.join( 110 | map(chr, range(0x11, 0x99, 0x11)))) 111 | 112 | eq(movss(xmm6, xmm3), 'movss xmm6, xmm3', '\xf3\x0f\x10\xf3') 113 | eq(movd(xmm7, edi), 'movd xmm7, edi', '\x66\x0f\x6e\xff') 114 | eq(pand(xmm4, oword [ecx]), 'pand xmm4, oword [ecx]', 115 | '\x66\x0f\xdb\x21') 116 | eq(movapd(xmm6, oword [ebx]), 'movapd xmm6, oword [ebx]', 117 | '\x66\x0f\x28\x33') 118 | 119 | eq(add(byte[eax], 0x42), 'add byte [eax], 0x42', '\x80\x00\x42') 120 | eq(cmp_(dword[esp+ecx*8+0x0c], 0x42), 121 | 'cmp dword [esp+ecx*8+0xc], 0x42', '\x83\x7c\xcc\x0c\x42') 122 | eq(cmp_(byte[ebx], 0x13), 'cmp byte [ebx], 0x13', '\x80\x3b\x13') 123 | eq(mov(byte[ecx], 0x37), 'mov byte [ecx], 0x37', '\xc6\x01\x37') 124 | eq(add(eax, 1), 'add eax, 0x1', '\x83\xc0\x01') 125 | eq(mov(bl, 1), 'mov bl, 0x1', '\xb3\x01') 126 | eq(add(eax, 0x1111), 'add eax, 0x1111', '\x05\x11\x11\x00\x00') 127 | eq(add(ebx, 0x2222), 'add ebx, 0x2222', '\x81\xc3\x22\x22\x00\x00') 128 | eq(push(es), 'push es', '\x06') 129 | eq(push(0x42), 'push 0x42', '\x6a\x42') 130 | eq(push(0x111), 'push 0x111', '\x68\x11\x01\x00\x00') 131 | eq(push(dword[2]), 'push dword [0x2]', '\xff\x35\x02\x00\x00\x00') 132 | eq(push(dword[esp+edx*2]), 'push dword [esp+edx*2]', '\xff\x34\x54') 133 | eq(pop(eax), 'pop eax', '\x58') 134 | eq(pop(dword[edx]), 'pop dword [edx]', '\x8f\x02') 135 | eq(pop(dword[6]), 'pop dword [0x6]', '\x8f\x05\x06\x00\x00\x00') 136 | eq(pop(ss), 'pop ss', '\x17') 137 | eq(rol(ebx, 1), 'rol ebx, 0x1', '\xd1\xc3') 138 | eq(rol(ebx, 2), 'rol ebx, 0x2', '\xc1\xc3\x02') 139 | eq(rol(edx, cl), 'rol edx, cl', '\xd3\xc2') 140 | eq(xor(edx, esi), 'xor edx, esi', '\x31\xf2') 141 | eq(shl(esi, 4), 'shl esi, 0x4', '\xc1\xe6\x04') 142 | eq(sal(esi, 4), 'sal esi, 0x4', '\xc1\xe6\x04') 143 | eq(xchg(byte[esp+0x42], al), 'xchg byte [esp+0x42], al', 144 | '\x86\x44\x24\x42') 145 | eq(xchg(al, byte[esp+0x42]), 'xchg byte [esp+0x42], al', 146 | '\x86\x44\x24\x42') 147 | eq(div(eax), 'div eax', '\xf7\xf0') 148 | eq(movzx(eax, byte [1]), 'movzx eax, byte [0x1]', 149 | '\x0f\xb6\x05\x01\x00\x00\x00') 150 | eq(movsx(eax, al), 'movsx eax, al', '\x0f\xbe\xc0') 151 | 152 | eq(add(ecx, 0xff), 'add ecx, 0xff', '\x81\xc1\xff\x00\x00\x00') 153 | eq(add(ecx, -0x1), 'add ecx, -0x1', '\x83\xc1\xff') 154 | eq(imul(eax, ecx, 0xff), 'imul eax, ecx, 0xff', '\x69\xc1\xff\x00\x00\x00') 155 | eq(imul(eax, ecx, -0x1), 'imul eax, ecx, -0x1', '\x6b\xc1\xff') 156 | eq(cmp_(eax, 0xff), 'cmp eax, 0xff', '\x3d\xff\x00\x00\x00') 157 | eq(cmp_(eax, -0x1), 'cmp eax, -0x1', '\x83\xf8\xff') 158 | 159 | def test_block(self): 160 | eq = lambda i, s, b: (self.assertEqual(repr(i), s, 161 | 'Invalid string representation for: ' + repr(i)), 162 | self.assertEqual(str(i), b, 'Invalid encoding for: ' + 163 | str(i) + ' -> ' + repr(str(i)))) 164 | 165 | eq2 = lambda i, s, b: (self.assertEqual(repr(i), s, 166 | 'Invalid string representation for: ' + repr(i)), 167 | self.assertEqual(i.assemble(), b, 'Invalid encoding for: ' + 168 | repr(b) + ' -> ' + repr(i.assemble()))) 169 | 170 | eq(block(mov(eax, 1), mov(ebx, 1)), 'mov eax, 0x1\nmov ebx, 0x1\n', 171 | '\xb8\x01\x00\x00\x00\xbb\x01\x00\x00\x00') 172 | 173 | b = block(mov(eax, ebx)) 174 | b += mov(ecx, edx) 175 | eq(b, 'mov eax, ebx\nmov ecx, edx\n', '\x8b\xc3\x8b\xca') 176 | 177 | c = block(mov(esi, dword[eax]), scasb(rep=True)) 178 | eq(c, 'mov esi, dword [eax]\nrep scasb\n', '\x8b\x30\xf3\xae') 179 | 180 | b += c 181 | b_s = 'mov eax, ebx\nmov ecx, edx\nmov esi, dword [eax]\nrep scasb\n' 182 | b_e = '\x8b\xc3\x8b\xca\x8b\x30\xf3\xae' 183 | eq(b, b_s, b_e) 184 | 185 | d = block(xor(eax, eax), lbl, inc(eax), cmp_(eax, 0x10), jnz(lbl(-1))) 186 | eq2(d, 'xor eax, eax\n__lbl_0:\ninc eax\ncmp eax, 0x10\njnz __lbl_0\n', 187 | '\x31\xc0\x40\x83\xf8\x10\x0f\x85\xf6\xff\xff\xff') 188 | 189 | # blocks allow instructions / labels without actually creating an 190 | # instance if that's not required, e.g. instructions that don't take 191 | # any operators 192 | eq2(block(jmp(lbl(1)), nop, lbl, retn), 193 | 'jmp __lbl_0\nnop\n__lbl_0:\nretn\n', 194 | '\xe9\x01\x00\x00\x00\x90\xc3') 195 | 196 | # simulation of jmp(lbl(0)) 197 | eq2(block(lbl, jmp(lbl(-1))), '__lbl_0:\njmp __lbl_0\n', 198 | '\xe9\xfb\xff\xff\xff') 199 | 200 | # partially unrolling a useless loop, to show "merging" of blocks. 201 | e_init = block(xor(ebx, ebx), mov(ecx, 0x40)) 202 | e_init_s = 'xor ebx, ebx\nmov ecx, 0x40\n' 203 | e_init_e = '\x31\xdb\xb9\x40\x00\x00\x00' 204 | eq2(e_init, e_init_s, e_init_e) 205 | eq2(block(e_init), e_init_s, e_init_e) 206 | 207 | e_end = block(mov(eax, dword[esp+8]), retn) 208 | e_end_s = 'mov eax, dword [esp+0x8]\nretn\n' 209 | e_end_e = '\x8b\x44\x24\x08\xc3' 210 | eq2(e_end, e_end_s, e_end_e) 211 | 212 | eq2(block(e_init, b, b, b, b, e_end), e_init_s + b_s * 4 + e_end_s, 213 | e_init_e + b_e * 4 + e_end_e) 214 | 215 | # global named labels 216 | eq2(block(lbl('loop'), inc(eax), jmp(lbl('loop'))), 217 | '__lbl_loop:\ninc eax\njmp __lbl_loop\n', 218 | '\x40\xe9\xfa\xff\xff\xff') 219 | 220 | # local named labels 221 | block.block_id = 0 222 | eq2(block('loop', inc(eax), inc(ebx), jmp('loop')), 223 | '__lbl_1_loop:\ninc eax\ninc ebx\njmp __lbl_1_loop\n', 224 | '\x40\x43\xe9\xf9\xff\xff\xff') 225 | 226 | # tweaked anonymous label references 227 | block.block_id = 0 228 | eq2(block(lbl, inc(dword[esp]), jmp(lbl-1)), 229 | '__lbl_0:\ninc dword [esp]\njmp __lbl_0\n', 230 | '\xff\x04\x24\xe9\xf8\xff\xff\xff') 231 | 232 | # temporary blocks as lists 233 | a = block(mov(eax, ebx), mov(ebx, 42)) 234 | eq2(a + [xor(ecx, ecx), retn], 235 | 'mov eax, ebx\nmov ebx, 0x2a\nxor ecx, ecx\nretn\n', 236 | '\x8b\xc3\xbb\x2a\x00\x00\x00\x31\xc9\xc3') 237 | 238 | # combining instructions directly 239 | eq2(mov(eax, ebx) + mov(ebx, 42), 'mov eax, ebx\nmov ebx, 0x2a\n', 240 | '\x8b\xc3\xbb\x2a\x00\x00\x00') 241 | 242 | # merging blocks with relative jumps 243 | #eq(block(d, d, d), 'xor eax, eax\n__lbl_0:\ninc eax\ncmp eax, 0x10\n' + 244 | # 'jnz __lbl_0\nxor eax, eax\n__lbl_1:\ninc eax\ncmp eax, 0x10\n' + 245 | # 'jnz __lbl_1\nxor eax, eax\n__lbl_2:\ninc eax\ncmp eax, 0x10\n' + 246 | # 'jnz __lbl_2', 247 | # '\x31\xc0\x40\x83\xf8\x10\x0f\x85\xf6\xff\xff\xff' * 3) 248 | 249 | def test_optimization(self): 250 | eq = lambda i, s, b: (self.assertEqual(repr(i), s, 251 | 'Invalid string representation for: ' + repr(i)), 252 | self.assertEqual(str(i), b, 'Invalid encoding for: ' + 253 | str(i) + ' -> ' + repr(str(i)))) 254 | 255 | # [ebx*2] -> [ebx+ebx] 256 | eq(mov(eax, dword[ebx*2+3]), 'mov eax, dword [ebx+ebx+0x3]', 257 | '\x8b\x44\x1b\x03') 258 | 259 | if __name__ == '__main__': 260 | unittest.main(verbosity=2) 261 | -------------------------------------------------------------------------------- /x86.py: -------------------------------------------------------------------------------- 1 | """ 2 | 3 | pyasm2 - x86 assembler library (C) 2012 Jurriaan Bremer 4 | 5 | Although its called pyasm2, this is not per se a successor of Pyasm or pyASM. 6 | pyasm2 aims to be as flexible as possible, it will support x86, SSE and SSE2. 7 | 8 | A key feature of pyasm2 is the ability to have blocks of instructions and 9 | being able to give the base address at a later time, that is, you don't need 10 | to know the address of instructions before-hand. For example, you can construct 11 | a series of instructions, request the size that will be needed in order to 12 | store all instructions as sequence, allocate this memory and write the 13 | instructions from there, this approach is very useful when making JIT 14 | compilers etc. 15 | 16 | The syntax of pyasm2 is supposed to be as simple as possible. 17 | 18 | """ 19 | import struct 20 | import types 21 | 22 | 23 | class Immediate: 24 | """Defines Immediates, immediates can also be used as addresses.""" 25 | def __init__(self, value=0, addr=False, signed=False): 26 | self.value = value 27 | self.addr = addr 28 | 29 | if signed: 30 | if -2**7 <= value < 2**7: 31 | self.size = byte.size 32 | elif -2**15 <= value < 2**15: 33 | self.size = word.size 34 | else: 35 | self.size = dword.size 36 | self.value = (2**31 + value) % 2**32 - 2**31 37 | else: 38 | if 0 <= value < 2**8: 39 | self.size = byte.size 40 | elif 0 <= value < 2**16: 41 | self.size = word.size 42 | else: 43 | self.size = dword.size 44 | self.value = value % 2**32 45 | 46 | def __int__(self): 47 | return self.value 48 | 49 | def __long__(self): 50 | return self.value 51 | 52 | def __cmp__(self, other): 53 | return self.value != int(other) 54 | 55 | def __str__(self): 56 | if self.value < 0: 57 | return '-0x%x' % -self.value 58 | else: 59 | return '0x%x' % self.value 60 | 61 | class SignedImmediate(Immediate): 62 | """Defines Signed Immediates.""" 63 | def __init__(self, value=0, addr=False): 64 | Immediate.__init__(self, value, addr, signed=True) 65 | 66 | 67 | class SegmentRegister: 68 | """Defines the Segment Registers.""" 69 | def __init__(self, index, name): 70 | self.index = index 71 | self.name = name 72 | 73 | def __str__(self): 74 | return self.name 75 | 76 | def __repr__(self): 77 | return self.name 78 | 79 | def __index__(self): 80 | return self.index 81 | 82 | # make an alias `imm' to Immediate in order to simplify the creation of 83 | # Instruction's 84 | imm = Immediate 85 | signed_imm = SignedImmediate 86 | 87 | # define each segment register. 88 | es = SegmentRegister(0, 'es') 89 | cs = SegmentRegister(1, 'cs') 90 | ss = SegmentRegister(2, 'ss') 91 | ds = SegmentRegister(3, 'ds') 92 | fs = SegmentRegister(4, 'fs') 93 | gs = SegmentRegister(5, 'gs') 94 | 95 | # array of segment registers, according to their index 96 | SegmentRegister.register = (es, cs, ss, ds, fs, gs) 97 | 98 | 99 | class MemoryAddress: 100 | def __init__(self, size=None, segment=None, reg1=None, reg2=None, 101 | mult=None, disp=None): 102 | """Create a new Memory Address.""" 103 | # check if a register is valid.. 104 | f = lambda x: x is None or isinstance(x, (gpr, xmm)) 105 | if not size: 106 | size = None 107 | assert size is None or size in (8, 16, 32, 64, 128) 108 | assert segment is None or isinstance(segment, SegmentRegister) 109 | f(reg1) 110 | f(reg2) 111 | assert mult is None or mult in (1, 2, 4, 8) 112 | assert disp is None or int(disp) >= 0 and int(disp) < 2**32 113 | 114 | self.size = size 115 | self.segment = segment 116 | self.reg1 = reg1 117 | self.reg2 = reg2 118 | self.mult = mult 119 | self.disp = Immediate(disp) if isinstance(disp, (int, long)) else disp 120 | 121 | self.clean() 122 | 123 | def clean(self): 124 | """Makes sure that the internal representation of the Memory Address 125 | is as easy as possible. 126 | 127 | For example, we don't want `esp' in `reg2' (and `esp' can't have a 128 | `mult' other than one. Neither do we want to have a `reg2' with `mult' 129 | 1 when `reg1' is None. 130 | 131 | Note that we can't use `esp' directly, because it's not initialized 132 | the first time(s) we call this function, therefore we use its index, 133 | which is 4. 134 | 135 | """ 136 | # `esp' can't have a multiplier other than one. 137 | if self.reg2 is not None: 138 | assert self.reg2.index != 4 or self.mult == 1 139 | 140 | # swap registers if `reg2' contains `esp' 141 | if self.reg2 is not None and self.reg2.index == 4: 142 | self.reg1, self.reg2 = self.reg2, self.reg1 143 | 144 | # store `reg2' as `reg1' if `reg1' is None and `mult' is one. 145 | if self.reg1 is None and self.mult == 1: 146 | self.reg1, self.reg2, self.mult = self.reg2, None, None 147 | 148 | return self 149 | 150 | def final_clean(self): 151 | """Special clean function to clean and/or optimize right before 152 | assembling this Memory Address. 153 | 154 | When `reg1' is None, `mult' is two and `reg2' is not `esp', then we 155 | can optimize it by using `reg1', ie [eax*2] -> [eax+eax]. 156 | 157 | """ 158 | if self.reg1 is None and self.mult == 2 and self.reg2 != esp: 159 | self.reg1, self.mult = self.reg2, 1 160 | 161 | def merge(self, other, add=True): 162 | """Merge self with a Displacement, Register or Memory Address.""" 163 | # it is not possible to merge with one of the predefined Memory 164 | # Addresses 165 | assert id(self) not in map(id, (byte, word, dword, qword, oword)) 166 | 167 | if isinstance(other, (int, long, Immediate)): 168 | assert int(other) >= 0 and int(other) < 2**32 169 | assert self.disp is None 170 | 171 | if add: 172 | self.disp = other 173 | else: 174 | self.disp = -other 175 | 176 | return self.clean() 177 | 178 | if isinstance(other, (GeneralPurposeRegister, XmmRegister)): 179 | assert self.reg1 is None or self.reg2 is None 180 | 181 | if self.reg1 is None: 182 | self.reg1 = other 183 | else: 184 | self.reg2 = other 185 | self.mult = 1 186 | 187 | return self.clean() 188 | 189 | if isinstance(other, MemoryAddress): 190 | assert self.size is None or other.size is None 191 | assert self.segment is None or other.segment is None 192 | assert self.disp is None or other.disp is None 193 | 194 | if self.size is None: 195 | self.size = other.size 196 | 197 | if self.segment is None: 198 | self.segment = other.segment 199 | 200 | reg1, reg2 = other.reg1, other.reg2 201 | 202 | if self.reg1 is None: 203 | if reg1 is not None: 204 | self.reg1, reg1 = reg1, None 205 | elif reg2 is not None and other.mult == 1: 206 | self.reg1, reg2 = reg2, None 207 | 208 | if self.reg2 is None: 209 | if reg1 is not None: 210 | self.reg2, self.mult, reg1 = reg1, 1, None 211 | elif reg2 is not None: 212 | self.reg2, self.mult, reg2 = reg2, other.mult, None 213 | 214 | assert reg1 is None and reg2 is None 215 | 216 | if self.disp is None: 217 | self.disp = other.disp 218 | 219 | return self.clean() 220 | 221 | raise Exception('Invalid Parameter') 222 | 223 | def __index__(self): 224 | """Encode a Memory Address as index. 225 | 226 | We have to be able to encode a Memory Address into an integer in 227 | order to use slices (which we do for instruction that use segment 228 | register.) 229 | 230 | Memory Layout is as following (displacement has to be the lower 32 231 | bits in the event that something like `dword [cs:0x401000]' is used.) 232 | 32 bits - displacement 233 | 4 bits - reg1 234 | 4 bits - reg2 235 | 3 bits - mult 236 | 237 | If the displacement is None, it will be encoded as 0, and will be 238 | decoded as None later. 239 | General Purpose Registers are encoded as their `index' increased with 240 | one, or 0 if None. 241 | Multiplication is encoded using a table, which can be found below. 242 | 243 | """ 244 | mults = {None: 0, 1: 1, 2: 2, 4: 3, 8: 4} 245 | # for encoding general purpose registers 246 | f = lambda x: x.index + 1 if x is not None else 0 247 | return \ 248 | (int(self.disp) if self.disp is not None else 0) + \ 249 | (f(self.reg1) << 32) + \ 250 | (f(self.reg2) << 36) + \ 251 | (mults[self.mult] << 40) 252 | 253 | def _decode_index(self, index): 254 | """Decodes a Memory Address encoded with __index__().""" 255 | mults = (None, 1, 2, 4, 8) 256 | # for decoding general purpose registers 257 | f = lambda x, y: y.register32[x-1] if x else None 258 | return MemoryAddress(disp=index % 2**32 if index % 2**32 else None, 259 | reg1=f((index >> 32) % 2**4, gpr), 260 | reg2=f((index >> 36) % 2**4, gpr), 261 | mult=mults[(index >> 40) % 2**3]) 262 | 263 | def __getitem__(self, key): 264 | """Item or Slice to this MemoryAddress size. 265 | 266 | A slice, represented as [segment:address], defines a segment register 267 | and an address, the address is a combination of Displacements and 268 | General Purpose Registers (optionally with multiplication.) 269 | 270 | An item, represented as [address], only defines an address. 271 | 272 | """ 273 | if isinstance(key, slice): 274 | ma = MemoryAddress(size=self.size, 275 | segment=SegmentRegister.register[key.start]) 276 | return ma.merge(self._decode_index(key.stop)) 277 | else: 278 | return MemoryAddress(size=self.size).merge(key) 279 | 280 | def __add__(self, other): 281 | """self + other""" 282 | return self.merge(other) 283 | 284 | def __radd__(self, other): 285 | """other + self""" 286 | return self.merge(other) 287 | 288 | def __sub__(self, other): 289 | """self - other""" 290 | return self.merge(other, add=False) 291 | 292 | def __rsub__(self, other): 293 | """other - self""" 294 | return self.merge(other, add=False) 295 | 296 | def __str__(self): 297 | """Representation of this Memory Address.""" 298 | sizes = {8: 'byte', 16: 'word', 32: 'dword', 64: 'qword', 128: 'oword'} 299 | s = '' 300 | if self.reg1 is not None: 301 | s += str(self.reg1) 302 | if self.reg2 is not None: 303 | q = str(self.reg2) if self.mult == 1 else \ 304 | str(self.reg2) + '*' + str(self.mult) 305 | s += q if not len(s) else '+' + q 306 | if self.disp: 307 | if self.disp >= 0: 308 | q = '0x%x' % int(self.disp) 309 | else: 310 | q = '-0x%x' % -int(self.disp) 311 | if not len(s) or q[0] == '-': 312 | s += q 313 | else: 314 | s += '+' + q 315 | if self.size is not None: 316 | if self.segment is not None: 317 | return '%s [%s:%s]' % (sizes[self.size], str(self.segment), s) 318 | else: 319 | return '%s [%s]' % (sizes[self.size], s) 320 | return '[%s]' % s if self.segment is None else \ 321 | '[%s:%s]' % (str(self.segment), s) 322 | 323 | def __repr__(self): 324 | """Representation of this Memory Address.""" 325 | return self.__str__() 326 | 327 | def __cmp__(self, other): 328 | """Check if two elements are the same, or not.""" 329 | return 0 if self.size == other.size and \ 330 | self.segment == other.segment and \ 331 | self.reg1 == other.reg1 and self.reg2 == other.reg2 and \ 332 | self.mult == other.mult and self.disp == other.disp else -1 333 | 334 | def pack(self, value): 335 | """Pack a value depending on the `size' of this Memory Address.""" 336 | assert self.size is not None 337 | 338 | fmt = {8: 'B', 16: 'H', 32: 'I', 64: 'Q'} 339 | 340 | # convert the value, if it's negative. 341 | value = int(value) if int(value) >= 0 else int(value) + 2**self.size 342 | 343 | return struct.pack(fmt[self.size], value) 344 | 345 | # define the size for the memory addresses 346 | byte = MemoryAddress(size=8) 347 | word = MemoryAddress(size=16) 348 | dword = MemoryAddress(size=32) 349 | qword = MemoryAddress(size=64) 350 | oword = MemoryAddress(size=128) 351 | 352 | # make an alias `mem' to MemoryAddress in order to simplify the creation of 353 | # Instruction's 354 | mem = MemoryAddress 355 | 356 | 357 | class GeneralPurposeRegister: 358 | """Defines the General Purpose Registers.""" 359 | def __init__(self, index, name, size): 360 | self.index = index 361 | self.name = name 362 | self.size = size.size 363 | 364 | def __add__(self, other): 365 | """self + other""" 366 | if isinstance(other, (int, long, Immediate)): 367 | return MemoryAddress(reg1=self, disp=other) 368 | if isinstance(other, GeneralPurposeRegister): 369 | return MemoryAddress(reg1=self, reg2=other, mult=1) 370 | if isinstance(other, MemoryAddress): 371 | return other.merge(self) 372 | raise Exception('Invalid Parameter') 373 | 374 | def __radd__(self, other): 375 | """other + self""" 376 | return self.__add__(other) 377 | 378 | def __sub__(self, other): 379 | """self - other""" 380 | return self.__add__(2**32 - other) 381 | 382 | def __mul__(self, other): 383 | """self * other""" 384 | return MemoryAddress(reg2=self, mult=other) 385 | 386 | def __rmul__(self, other): 387 | """other * self""" 388 | return MemoryAddress(reg2=self, mult=other) 389 | 390 | def __str__(self): 391 | return self.name 392 | 393 | def __repr__(self): 394 | return self.name 395 | 396 | def __index__(self): 397 | """Index of this register.""" 398 | return MemoryAddress(reg1=self).__index__() 399 | 400 | # define the general purpose registers 401 | al = GeneralPurposeRegister(0, 'al', byte) 402 | cl = GeneralPurposeRegister(1, 'cl', byte) 403 | dl = GeneralPurposeRegister(2, 'dl', byte) 404 | bl = GeneralPurposeRegister(3, 'bl', byte) 405 | ah = GeneralPurposeRegister(4, 'ah', byte) 406 | ch = GeneralPurposeRegister(5, 'ch', byte) 407 | dh = GeneralPurposeRegister(6, 'dh', byte) 408 | bh = GeneralPurposeRegister(7, 'bh', byte) 409 | 410 | ax = GeneralPurposeRegister(0, 'ax', word) 411 | cx = GeneralPurposeRegister(1, 'cx', word) 412 | dx = GeneralPurposeRegister(2, 'dx', word) 413 | bx = GeneralPurposeRegister(3, 'bx', word) 414 | sp = GeneralPurposeRegister(4, 'sp', word) 415 | bp = GeneralPurposeRegister(5, 'bp', word) 416 | si = GeneralPurposeRegister(6, 'si', word) 417 | di = GeneralPurposeRegister(7, 'di', word) 418 | 419 | eax = GeneralPurposeRegister(0, 'eax', dword) 420 | ecx = GeneralPurposeRegister(1, 'ecx', dword) 421 | edx = GeneralPurposeRegister(2, 'edx', dword) 422 | ebx = GeneralPurposeRegister(3, 'ebx', dword) 423 | esp = GeneralPurposeRegister(4, 'esp', dword) 424 | ebp = GeneralPurposeRegister(5, 'ebp', dword) 425 | esi = GeneralPurposeRegister(6, 'esi', dword) 426 | edi = GeneralPurposeRegister(7, 'edi', dword) 427 | 428 | # array of general purpose registers, according to their index 429 | GeneralPurposeRegister.register8 = (al, cl, dl, bl, ah, ch, dh, bh) 430 | GeneralPurposeRegister.register16 = (ax, cx, dx, bx, sp, bp, si, di) 431 | GeneralPurposeRegister.register32 = (eax, ecx, edx, ebx, esp, ebp, esi, edi) 432 | 433 | # make an alias `gpr' to GeneralPurposeRegister in order to simplify the 434 | # creation of Instruction's 435 | gpr = GeneralPurposeRegister 436 | 437 | 438 | class XmmRegister: 439 | """Defines the Xmm Registers, registers used for the SSE instructions.""" 440 | def __init__(self, index, name): 441 | self.index = index 442 | self.name = name 443 | self.size = oword.size 444 | 445 | def __str__(self): 446 | return self.name 447 | 448 | def __repr__(self): 449 | return self.name 450 | 451 | xmm0 = XmmRegister(0, 'xmm0') 452 | xmm1 = XmmRegister(1, 'xmm1') 453 | xmm2 = XmmRegister(2, 'xmm2') 454 | xmm3 = XmmRegister(3, 'xmm3') 455 | xmm4 = XmmRegister(4, 'xmm4') 456 | xmm5 = XmmRegister(5, 'xmm5') 457 | xmm6 = XmmRegister(6, 'xmm6') 458 | xmm7 = XmmRegister(7, 'xmm7') 459 | 460 | # make an alias `xmm' to XmmRegister in order to simplify the creation of 461 | # Instruction's 462 | xmm = XmmRegister 463 | 464 | 465 | class MemoryGeneralPurposeRegister(MemoryAddress, GeneralPurposeRegister): 466 | """A combination of MemoryAddress and GeneralPurposeRegister, 467 | useful for modrm encoding etc.""" 468 | pass 469 | 470 | # a combination of operand types that can be used in modrm bytes. 471 | memgpr = MemoryGeneralPurposeRegister 472 | 473 | 474 | class MemoryXmmRegister(MemoryAddress, XmmRegister): 475 | """Combination of MemoryAddress and XmmRegister.""" 476 | pass 477 | 478 | memxmm = MemoryXmmRegister 479 | 480 | 481 | class Instruction: 482 | """Base class for every instruction. 483 | 484 | Instructions that don't take any operands place their opcode as integer or 485 | string in `_opcode_'. 486 | Instructions that have one or more (maximum of three) operands fill the 487 | `_enc_' table, one entry per encoding. The layout of this encoding is a 488 | list of tuples, with a layout like the following. 489 | 490 | (opcode, operand1, operand2, operand3) 491 | 492 | `opcode' is an integer or string representing the opcode of this encoding. 493 | `operand1', `operand2' and `operand3' are tuples defining the size and 494 | type of operand, `operand2' and `operand3' are obviously optional. If an 495 | operand is not a tuple, it defines a hardcoded operand. 496 | 497 | """ 498 | VALID_OPERANDS = (int, long, SegmentRegister, GeneralPurposeRegister, 499 | MemoryAddress, Immediate, XmmRegister, list) 500 | 501 | # we use a ctypes-like way to implement instructions. 502 | _opcode_ = None 503 | _enc_ = [] 504 | _name_ = None 505 | 506 | def __init__(self, operand1=None, operand2=None, operand3=None, 507 | lock=False, rep=False, repne=False): 508 | """Initialize a new Instruction object.""" 509 | assert operand1 is None or isinstance(operand1, self.VALID_OPERANDS) 510 | assert operand2 is None or isinstance(operand2, self.VALID_OPERANDS) 511 | assert operand3 is None or isinstance(operand3, self.VALID_OPERANDS) 512 | assert not isinstance(operand1, list) or len(operand1) == 1 513 | assert not isinstance(operand2, list) or len(operand2) == 1 514 | 515 | # convert int and long's to Immediate values. 516 | f = lambda x: x if not isinstance(x, (int, long)) else Immediate(x, signed=x < 0) 517 | # convert lists with one entry to Memory Addresses 518 | g = lambda x: x if not isinstance(x, list) else x[0] 519 | 520 | self.op1 = g(f(operand1)) 521 | self.op2 = g(f(operand2)) 522 | self.op3 = f(operand3) 523 | self.lock = lock 524 | self.rep = rep 525 | self.repne = repne 526 | 527 | # clean operands, if needed 528 | self.clean() 529 | 530 | # check the correct encoding for this combination of operands 531 | self.encoding() 532 | 533 | def clean(self): 534 | """Alters the order of operands if needed.""" 535 | 536 | # the `xchg' instruction requires operands ordered as `memgpr, gpr'. 537 | if isinstance(self, xchg) and isinstance(self.op1, gpr) and \ 538 | isinstance(self.op2, mem): 539 | self.op1, self.op2 = self.op2, self.op1 540 | 541 | if isinstance(self.op1, mem): 542 | self.op1.final_clean() 543 | 544 | if isinstance(self.op2, mem): 545 | self.op2.final_clean() 546 | 547 | def modrm(self, op1, op2): 548 | """Encode two operands into their modrm representation.""" 549 | # we make sure `op2' is always the Memory Address (if present at all) 550 | if isinstance(op1, MemoryAddress): 551 | op1, op2 = op2, op1 552 | 553 | # a brief explanation of variabele names in this function. 554 | # there is a modrm byte, which contains `reg', `mod' and `rm' and 555 | # there is a sib byte, which contains `S', `index' and `base'. 556 | # for more explanation on the encoding, see also: 557 | # http://sandpile.org/x86/opc_rm.htm for the modrm byte, and 558 | # http://sandpile.org/x86/opc_rm.htm for the sib byte. 559 | 560 | reg = op1.index 561 | 562 | buf = '' 563 | sib = False 564 | 565 | if isinstance(op2, (GeneralPurposeRegister, XmmRegister)): 566 | mod = 3 567 | rm = op2.index 568 | 569 | elif isinstance(op2, MemoryAddress): 570 | mults = {1: 0, 2: 1, 4: 2, 8: 3} 571 | if op2.reg1 is None: 572 | if op2.reg2 is None: 573 | # there should be atleast a displacement 574 | assert op2.disp is not None 575 | mod = 0 576 | rm = 5 577 | buf = struct.pack('I', op2.disp % 2**32) 578 | else: 579 | sib = True 580 | S = mults[op2.mult] 581 | index = op2.reg2.index 582 | mod = 0 583 | rm = 4 584 | # it's not possible to have a register with a 585 | # multiplication other than one without a 32bit 586 | # displacement. 587 | base = 5 588 | buf = struct.pack('I', op2.disp % 2**32 if op2.disp else 0) 589 | else: 590 | if op2.reg2 is None: 591 | # special case for `esp', since it requires the sib byte 592 | if op2.reg1.index == 4: 593 | sib = True 594 | base = 4 595 | index = 4 596 | S = 0 597 | rm = 4 598 | mod = 2 599 | # special case for `ebp', since it requires a displacement 600 | elif op2.reg1.index == 5: 601 | rm = 5 602 | mod = 3 603 | else: 604 | rm = op2.reg1.index 605 | mod = 2 606 | # special case for `esp', since it requires the sib byte 607 | elif op2.reg1.index == 4: 608 | sib = True 609 | base = 4 610 | index = op2.reg2.index 611 | S = mults[op2.mult] 612 | rm = 4 613 | mod = 2 614 | # special case for `ebp', since it requires a displacement 615 | elif op2.reg1.index == 5: 616 | sib = True 617 | index = op2.reg2.index 618 | S = mults[op2.mult] 619 | base = 5 620 | rm = 4 621 | mod = 3 622 | else: 623 | sib = True 624 | rm = 4 625 | base = op2.reg1.index 626 | index = op2.reg2.index 627 | S = mults[op2.mult] 628 | mod = 2 629 | 630 | # if `mod' is two here, then there can be either a 8bit, 32bit or 631 | # no displacement at all. when `mod' is three, there has to be 632 | # either a 8bit displacement or a 32bit one. 633 | if mod in (2, 3): 634 | if op2.disp is not None: 635 | disp = int(op2.disp) % 2**32 636 | if op2.disp is None: 637 | if mod == 3: 638 | mod = 1 639 | buf = '\x00' 640 | else: 641 | mod = 0 642 | elif disp >= 0 and disp < 0x80: 643 | mod = 1 644 | buf = chr(disp) 645 | elif disp >= 0xffffff80 and disp < 2**32: 646 | mod = 1 647 | buf = chr(disp & 0xff) 648 | else: 649 | mod = 2 650 | buf = struct.pack('I', disp) 651 | 652 | # construct the modrm byte 653 | ret = chr((mod << 6) + (reg << 3) + rm) 654 | if sib: 655 | # if required, construct the sib byte 656 | ret += chr((S << 6) + (index << 3) + base) 657 | # append the buf, if it contains anything. 658 | return ret + buf 659 | 660 | def encoding(self): 661 | """Returns the Encoding used, as defined by `_enc_'. 662 | 663 | If the instruction doesn't take any operands, None is returned. 664 | If the instruction takes one or more (maximum of three) operands and a 665 | match is found in `_enc_', the match is returned, otherwise an 666 | Exception is raised. 667 | 668 | """ 669 | if self.op1 is None: 670 | return None 671 | 672 | for enc in self._enc_: 673 | opcode, op1, op2, op3 = enc + (None,) * (4 - len(enc)) 674 | 675 | def operand_count(a, b, c): 676 | return (a is not None) + (b is not None) + (c is not None) 677 | 678 | # check if the amount of operands match. 679 | if operand_count(self.op1, self.op2, self.op3) != \ 680 | operand_count(op1, op2, op3): 681 | continue 682 | 683 | # if the encoding is not a tuple, then it's a hardcoded value. 684 | if not isinstance(op1, tuple): 685 | # check if the classes and objects match. 686 | if op1.__class__ != self.op1.__class__ or op1 != self.op1: 687 | continue 688 | # check the operand (and size) of this match 689 | elif not issubclass(op1[1], self.op1.__class__): 690 | continue 691 | elif hasattr(self.op1, 'size') and op1[0] is not None: 692 | if op1[1] in (imm, signed_imm): 693 | if op1[1](int(self.op1)).size > op1[0].size: 694 | continue 695 | else: 696 | if op1[0].size != self.op1.size: 697 | continue 698 | 699 | if op2 is None: 700 | self._encoding = (opcode, op1, op2, op3) 701 | return self._encoding 702 | 703 | # if the encoding is not a tuple, then it's a hardcoded value. 704 | if not isinstance(op2, tuple): 705 | # check if the classes and objects match. 706 | if op2.__class__ != self.op2.__class__ or op2 != self.op2: 707 | continue 708 | # check the operand (and size) of this match 709 | elif not issubclass(op2[1], self.op2.__class__): 710 | continue 711 | elif hasattr(self.op2, 'size') and op2[0] is not None: 712 | if op2[1] in (imm, signed_imm): 713 | if op2[1](int(self.op2)).size > op2[0].size: 714 | continue 715 | else: 716 | if op2[0].size != self.op2.size: 717 | continue 718 | 719 | if op3 is None: 720 | self._encoding = (opcode, op1, op2, op3) 721 | return self._encoding 722 | 723 | # check if the third operand matches (can only be an Immediate) 724 | if not issubclass(op3[1], self.op3.__class__): 725 | continue 726 | elif op3[1] in (imm, signed_imm) and \ 727 | op3[1](int(self.op3)).size > op3[0].size: 728 | continue 729 | 730 | # we found a matching encoding, return it. 731 | self._encoding = (opcode, op1, op2, op3) 732 | return self._encoding 733 | 734 | raise Exception('Unknown or Invalid Encoding') 735 | 736 | def name(self): 737 | """The name of this instruction.""" 738 | return self._name_ or self.__class__.__name__ 739 | 740 | def __repr__(self): 741 | """Representation of this Instruction.""" 742 | s = '' 743 | 744 | if self.lock: 745 | s += 'lock ' 746 | 747 | if self.repne: 748 | s += 'repne ' 749 | 750 | if self.rep: 751 | s += 'rep ' 752 | 753 | s += self.name() 754 | ops = filter(lambda x: x is not None, (self.op1, self.op2, self.op3)) 755 | if len(ops): 756 | return s + ' ' + ', '.join(map(str, ops)) 757 | return s 758 | 759 | def __len__(self): 760 | """Return the Length of the Machine Code.""" 761 | return self.__str__().__len__() 762 | 763 | def __str__(self): 764 | """Encode this Instruction into its machine code representation.""" 765 | enc = self.encoding() 766 | 767 | ret = '' 768 | 769 | if self.lock: 770 | ret += '\xf0' 771 | 772 | if self.repne: 773 | ret += '\xf2' 774 | 775 | if self.rep: 776 | ret += '\xf3' 777 | 778 | if enc is None: 779 | op = self._opcode_ 780 | ret += chr(op) if isinstance(op, int) else op 781 | return ret 782 | 783 | opcode, op1, op2, op3 = enc 784 | ops = (self.op1, self.op2, self.op3) 785 | modrm_reg = modrm_rm = None 786 | 787 | ret += chr(opcode) if isinstance(opcode, int) else opcode 788 | disp = '' 789 | 790 | for i in xrange(3): 791 | op = enc[i+1] 792 | # we don't have to process empty operands or hardcoded values 793 | if op is None or not isinstance(op, tuple): 794 | continue 795 | 796 | size, typ = op[:2] 797 | 798 | # if a third index is given in the operand's tuple, then that 799 | # means that we have to emulate the `reg' for the modrm byte. 800 | # the value of `reg' is therefore given as third value. 801 | if len(op) == 3: 802 | modrm_reg = gpr.register32[op[2]] 803 | 804 | # handle Immediates 805 | if typ in (imm, signed_imm): 806 | disp += size.pack(ops[i]) 807 | continue 808 | 809 | # handle the reg part of the modrm byte 810 | if typ not in (mem, memgpr, memxmm) and modrm_reg is None: 811 | modrm_reg = ops[i] 812 | continue 813 | 814 | if isinstance(ops[i], (gpr, xmm)) and modrm_rm is not None: 815 | modrm_reg = ops[i] 816 | continue 817 | 818 | # handle the rm part of the modrm byte 819 | if typ in (mem, gpr, xmm, memgpr, memxmm): 820 | modrm_rm = ops[i] 821 | continue 822 | 823 | raise Exception('Unknown Type') 824 | 825 | if modrm_reg or modrm_rm: 826 | ret += self.modrm(modrm_reg, modrm_rm) 827 | 828 | self._encode = ret + disp 829 | return self._encode 830 | 831 | def __add__(self, other): 832 | return Block(self, other) 833 | 834 | def __radd__(self, other): 835 | return Block(other, self) 836 | 837 | 838 | class RelativeJump: 839 | _index_ = None 840 | _name_ = None 841 | 842 | def __init__(self, value): 843 | self.value = value 844 | 845 | def __len__(self): 846 | return 6 if self._index_ is not None else 5 847 | 848 | def name(self): 849 | """The name of this instruction.""" 850 | return self._name_ or self.__class__.__name__ 851 | 852 | def __repr__(self): 853 | value = self.value 854 | if not isinstance(value, str): 855 | value = repr(value) 856 | return self.name() + ' ' + value 857 | 858 | def assemble(self, short=True, labels={}, offset=0): 859 | """Assemble the Relative Jump. 860 | 861 | `short' indicates if the relative offset should be encoded as 8bit 862 | value, if possible. 863 | `offset' is the offset of this jump 864 | 865 | """ 866 | to = self.value 867 | if isinstance(self.value, Label): 868 | if isinstance(self.value.index, str): 869 | to = labels[self.value.index] 870 | else: 871 | index = self.value.index + self.value.base 872 | to = labels[index - (self.value.index > 0)] 873 | elif isinstance(self.value, str): 874 | to = labels[self.value] 875 | 876 | if self._index_ is None: 877 | return chr(self._opcode_) + dword.pack(to - offset - 5) 878 | else: 879 | return '\x0f' + chr(0x80 + self._index_) + dword.pack( 880 | to - offset - 6) 881 | 882 | 883 | class Block: 884 | block_id = 0 885 | 886 | def __init__(self, *args): 887 | self._l = [] 888 | 889 | # unique block id for each Block 890 | Block.block_id += 1 891 | 892 | # current index for new labels 893 | self.label_base = 0 894 | 895 | # add each argument to the list 896 | map(self.append, args) 897 | 898 | def __repr__(self): 899 | """Return a string representation of all instructions chained.""" 900 | # convert an instruction into a string representation, labels need an 901 | # additional semicolon and have to be absolute offset, rather than 902 | # a relative one 903 | ret = '' 904 | index = 0 905 | for instr in self._l: 906 | if isinstance(instr, Label): 907 | instr.base = index 908 | index += 1 909 | ret += repr(instr) + ':\n' 910 | elif isinstance(instr, str): 911 | index += 1 912 | ret += '%s:\n' % instr 913 | elif isinstance(instr, RelativeJump) and \ 914 | isinstance(instr.value, Label): 915 | instr.value.base = index 916 | ret += repr(instr) + '\n' 917 | else: 918 | ret += repr(instr) + '\n' 919 | return ret 920 | 921 | def __str__(self): 922 | """Return the Machine Code representation.""" 923 | return ''.join(map(str, self._l)) 924 | 925 | def assemble(self, recursion=10): 926 | """Assemble the given Block. 927 | 928 | Assembly can *ONLY* be called on the top-level assembly block. Any 929 | unresolved labels will result in an exception. 930 | 931 | `recursion' indicates the maximal amount of times to recurse in order 932 | to optimize the size of conditional jumps (see docs.) 933 | 934 | """ 935 | local_labels = {} 936 | global_labels = {} 937 | offset = 0 938 | 939 | # first we obtain the offset of each label 940 | for idx, instr in enumerate(self._l): 941 | # convert any class objects to instances 942 | if isinstance(instr, types.ClassType): 943 | instr = instr() 944 | self._l[idx] = instr 945 | 946 | if isinstance(instr, (str, Label)): 947 | # named local label 948 | if isinstance(instr, str): 949 | local_labels[instr] = offset 950 | 951 | # anonymous local label 952 | elif not instr.index: 953 | instr.index = len(local_labels) 954 | local_labels[instr.index] = offset 955 | 956 | # global named label 957 | elif isinstance(instr.index, str): 958 | global_labels[instr.index] = offset 959 | local_labels[instr.index] = offset 960 | 961 | elif isinstance(instr, Instruction): 962 | offset += len(instr) 963 | 964 | elif isinstance(instr, RelativeJump): 965 | # 5 for unconditional jumps, 6 for conditional jumps 966 | offset += 5 + (instr._index_ is not None) 967 | 968 | # is this a label? 969 | if isinstance(instr.value, Label): 970 | 971 | # is this an anonymous label? 972 | if isinstance(instr.value, (int, long)): 973 | # make an absolute index from the relative one 974 | instr.value.index += len(local_labels) 975 | 976 | # do at most `recursion' iterations in order to try to optimize the 977 | # relative jumps 978 | # ... 979 | 980 | machine_code = '' 981 | offset = 0 982 | 983 | # now we assemble the machine code 984 | for instr in self._l: 985 | 986 | if isinstance(instr, Instruction): 987 | machine_code += str(instr) 988 | 989 | elif isinstance(instr, RelativeJump): 990 | machine_code += instr.assemble(short=False, 991 | labels=local_labels, 992 | offset=offset) 993 | 994 | offset = len(machine_code) 995 | 996 | return machine_code 997 | 998 | def append(self, other): 999 | """Append instruction(s) in `other' to `self'.""" 1000 | # if a class object was given, we create an instance ourselves 1001 | # this can be either an Instruction or a Label 1002 | if isinstance(other, (types.ClassType, _MetaLabel)): 1003 | other = other() 1004 | 1005 | def labelify(val): 1006 | if isinstance(val, Label): 1007 | val.base = self.label_base 1008 | if isinstance(val, str): 1009 | val = '__lbl_%d_%s' % (self.block_id, val) 1010 | return val 1011 | 1012 | if isinstance(other, Label): 1013 | other.base = self.label_base 1014 | self._l.append(other) 1015 | self.label_base += 1 1016 | 1017 | elif isinstance(other, str): 1018 | self._l.append('__lbl_%d_%s' % (self.block_id, other)) 1019 | self.label_base += 1 1020 | 1021 | elif isinstance(other, RelativeJump): 1022 | self._l.append(other) 1023 | 1024 | other.value = labelify(other.value) 1025 | 1026 | elif isinstance(other, Instruction): 1027 | self._l.append(other) 1028 | 1029 | other.op1 = labelify(other.op1) 1030 | other.op2 = labelify(other.op2) 1031 | # TODO add memory address support 1032 | 1033 | elif isinstance(other, (list, tuple)): 1034 | map(self.append, other) 1035 | 1036 | elif isinstance(other, Block): 1037 | # we merge the `other' block with ours, by appending. 1038 | # TODO deepcopy might get in a recursive loop somehow, if that 1039 | # ever occurs, implement a __deepcopy__ which only makes a new 1040 | # copy of Labels 1041 | # map(self.append, map(copy.deepcopy, other._l)) 1042 | map(self.append, other._l) 1043 | 1044 | else: 1045 | raise Exception('This object is not welcome here.') 1046 | 1047 | return self 1048 | 1049 | def __iadd__(self, other): 1050 | """self += other""" 1051 | self.append(other) 1052 | return self 1053 | 1054 | def __add__(self, other): 1055 | """self + other""" 1056 | return Block(self, other) 1057 | 1058 | def __radd__(self, other): 1059 | """other + self""" 1060 | return Block(other, self) 1061 | 1062 | def __iter__(self): 1063 | return self._l 1064 | 1065 | block = Block 1066 | 1067 | 1068 | class _MetaLabel(type): 1069 | def __sub__(cls, other): 1070 | return Label(-other) 1071 | 1072 | def __add__(cls, other): 1073 | return Label(other) 1074 | 1075 | 1076 | class Label: 1077 | __metaclass__ = _MetaLabel 1078 | 1079 | def __init__(self, index=0): 1080 | self.index = index 1081 | 1082 | # any base to add to `index' 1083 | self.base = 0 1084 | 1085 | def __repr__(self): 1086 | if isinstance(self.index, str): 1087 | return '__lbl_%s' % self.index 1088 | 1089 | index = self.index + self.base 1090 | 1091 | # as we have to include Label(0) possibilities 1092 | if self.index > 0: 1093 | index -= 1 1094 | 1095 | return '__lbl_%s' % index 1096 | 1097 | lbl = Label 1098 | 1099 | 1100 | class retn(Instruction): 1101 | _opcode_ = 0xc3 1102 | _enc_ = [(0xc2, (word, imm))] 1103 | 1104 | ret = retn 1105 | 1106 | 1107 | class leave(Instruction): 1108 | _opcode_ = 0xc9 1109 | 1110 | 1111 | class nop(Instruction): 1112 | _opcode_ = 0x90 1113 | 1114 | 1115 | class mov(Instruction): 1116 | # mov r32, imm32 and mov r8, imm32 1117 | _enc_ = \ 1118 | zip(range(0xb0, 0xb8), gpr.register8, ((byte, imm),) * 8) + \ 1119 | zip(range(0xb8, 0xc0), gpr.register32, ((dword, imm),) * 8) + \ 1120 | [ 1121 | (0x8b, (dword, gpr), (dword, memgpr)), 1122 | (0x89, (dword, memgpr), (dword, gpr)), 1123 | (0x88, (byte, memgpr), (byte, gpr)), 1124 | (0x8a, (byte, gpr), (byte, memgpr)), 1125 | (0xc6, (byte, memgpr, 0), (byte, imm)), 1126 | (0xc7, (dword, memgpr, 0), (dword, imm)), 1127 | ] 1128 | 1129 | class movzx(Instruction): 1130 | _enc_ = [ 1131 | ('\x0f\xb6', (dword, gpr), (byte, memgpr)), 1132 | ('\x0f\xb7', (dword, gpr), (word, memgpr)), 1133 | ] 1134 | 1135 | 1136 | class movsx(Instruction): 1137 | _enc_ = [ 1138 | ('\x0f\xbe', (dword, gpr), (byte, memgpr)), 1139 | ('\x0f\xbf', (dword, gpr), (word, memgpr)), 1140 | ] 1141 | 1142 | 1143 | class push(Instruction): 1144 | # push r32 1145 | _enc_ = zip(range(0x50, 0x58), gpr.register32) + [ 1146 | (0x06, es), 1147 | (0x0e, cs), 1148 | (0x16, ss), 1149 | (0x1e, ds), 1150 | ('\x0f\xa0', fs), 1151 | ('\x0f\xa8', gs), 1152 | (0x6a, (byte, signed_imm)), 1153 | (0x68, (dword, imm)), 1154 | (0xff, (dword, mem, 6)), 1155 | ] 1156 | 1157 | 1158 | class pop(Instruction): 1159 | # pop r32 1160 | _enc_ = zip(range(0x58, 0x60), gpr.register32) + [ 1161 | (0x07, es), 1162 | (0x17, ss), 1163 | (0x1f, ds), 1164 | ('\x0f\xa1', fs), 1165 | ('\x0f\xa9', gs), 1166 | (0x8f, (dword, mem, 0)), 1167 | ] 1168 | 1169 | 1170 | class inc(Instruction): 1171 | # inc r32 1172 | _enc_ = zip(range(0x40, 0x48), gpr.register32) + [ 1173 | (0xfe, (byte, memgpr, 0)), 1174 | (0xff, (dword, memgpr, 0))] 1175 | 1176 | 1177 | class dec(Instruction): 1178 | # dec r32 1179 | _enc_ = zip(range(0x48, 0x50), gpr.register32) + [ 1180 | (0xfe, (byte, memgpr, 1)), 1181 | (0xff, (dword, memgpr, 1))] 1182 | 1183 | 1184 | class xchg(Instruction): 1185 | # xchg eax, r32 1186 | _enc_ = zip(range(0x91, 0x98), gpr.register32[1:], (eax,) * 8) + [ 1187 | (0x86, (byte, memgpr), (byte, gpr)), 1188 | (0x87, (dword, memgpr), (dword, memgpr))] 1189 | 1190 | 1191 | class stosb(Instruction): 1192 | _opcode_ = 0xaa 1193 | 1194 | 1195 | class stosd(Instruction): 1196 | _opcode_ = 0xab 1197 | 1198 | 1199 | class lodsb(Instruction): 1200 | _opcode_ = 0xac 1201 | 1202 | 1203 | class lodsd(Instruction): 1204 | _opcode_ = 0xad 1205 | 1206 | 1207 | class scasb(Instruction): 1208 | _opcode_ = 0xae 1209 | 1210 | 1211 | class scasd(Instruction): 1212 | _opcode_ = 0xaf 1213 | 1214 | 1215 | class lea(Instruction): 1216 | _enc_ = [(0x8d, (dword, gpr), (None, mem))] 1217 | 1218 | 1219 | class pshufd(Instruction): 1220 | _enc_ = [('\x66\x0f\x70', (oword, xmm), (oword, memxmm), (byte, imm))] 1221 | 1222 | 1223 | class paddb(Instruction): 1224 | _enc_ = [('\x66\x0f\xfc', (oword, xmm), (oword, memxmm))] 1225 | 1226 | 1227 | class paddw(Instruction): 1228 | _enc_ = [('\x66\x0f\xfd', (oword, xmm), (oword, memxmm))] 1229 | 1230 | 1231 | class paddd(Instruction): 1232 | _enc_ = [('\x66\x0f\xfe', (oword, xmm), (oword, memxmm))] 1233 | 1234 | 1235 | class psubb(Instruction): 1236 | _enc_ = [('\x66\x0f\xf8', (oword, xmm), (oword, memxmm))] 1237 | 1238 | 1239 | class psubw(Instruction): 1240 | _enc_ = [('\x66\x0f\xf9', (oword, xmm), (oword, memxmm))] 1241 | 1242 | 1243 | class psubd(Instruction): 1244 | _enc_ = [('\x66\x0f\xfa', (oword, xmm), (oword, memxmm))] 1245 | 1246 | 1247 | class pand(Instruction): 1248 | _enc_ = [('\x66\x0f\xdb', (oword, xmm), (oword, memxmm))] 1249 | 1250 | 1251 | class pandn(Instruction): 1252 | _enc_ = [('\x66\x0f\xdf', (oword, xmm), (oword, memxmm))] 1253 | 1254 | 1255 | class por(Instruction): 1256 | _enc_ = [('\x66\x0f\xeb', (oword, xmm), (oword, memxmm))] 1257 | 1258 | 1259 | class pxor(Instruction): 1260 | _enc_ = [('\x66\x0f\xef', (oword, xmm), (oword, memxmm))] 1261 | 1262 | 1263 | class pmuludq(Instruction): 1264 | _enc_ = [('\x66\x0f\xf4', (oword, xmm), (oword, memxmm))] 1265 | 1266 | 1267 | class movaps(Instruction): 1268 | _enc_ = [ 1269 | ('\x0f\x28', (oword, xmm), (oword, memxmm)), 1270 | ('\x0f\x29', (oword, memxmm), (oword, xmm)), 1271 | ] 1272 | 1273 | 1274 | class movups(Instruction): 1275 | _enc_ = [ 1276 | ('\x0f\x10', (oword, xmm), (oword, memxmm)), 1277 | ('\x0f\x11', (oword, memxmm), (oword, xmm)), 1278 | ] 1279 | 1280 | 1281 | class movapd(Instruction): 1282 | _enc_ = [ 1283 | ('\x66\x0f\x28', (oword, xmm), (oword, memxmm)), 1284 | ('\x66\x0f\x29', (oword, memxmm), (oword, xmm)), 1285 | ] 1286 | 1287 | 1288 | class movd(Instruction): 1289 | _enc_ = [ 1290 | ('\x66\x0f\x6e', (oword, xmm), (dword, memgpr)), 1291 | ('\x66\x0f\x7e', (dword, memgpr), (oword, xmm)), 1292 | ] 1293 | 1294 | 1295 | class movss(Instruction): 1296 | _enc_ = [ 1297 | ('\xf3\x0f\x10', (oword, xmm), (oword, memxmm)), 1298 | ('\xf3\x0f\x11', (oword, memxmm), (oword, xmm)), 1299 | ] 1300 | 1301 | 1302 | class jo(RelativeJump): 1303 | _index_ = 0 1304 | 1305 | 1306 | class jno(RelativeJump): 1307 | _index_ = 1 1308 | 1309 | 1310 | class jb(RelativeJump): 1311 | _index_ = 2 1312 | 1313 | 1314 | class jnb(RelativeJump): 1315 | _index_ = 3 1316 | 1317 | jae = jnb 1318 | 1319 | 1320 | class jz(RelativeJump): 1321 | _index_ = 4 1322 | 1323 | 1324 | class jnz(RelativeJump): 1325 | _index_ = 5 1326 | 1327 | 1328 | class jbe(RelativeJump): 1329 | _index_ = 6 1330 | 1331 | 1332 | class jnbe(RelativeJump): 1333 | _index_ = 7 1334 | 1335 | ja = jnbe 1336 | 1337 | 1338 | class js(RelativeJump): 1339 | _index_ = 8 1340 | 1341 | 1342 | class jns(RelativeJump): 1343 | _index_ = 9 1344 | 1345 | 1346 | class jp(RelativeJump): 1347 | _index_ = 10 1348 | 1349 | 1350 | class jnp(RelativeJump): 1351 | _index_ = 11 1352 | 1353 | 1354 | class jl(RelativeJump): 1355 | _index_ = 12 1356 | 1357 | 1358 | class jnl(RelativeJump): 1359 | _index_ = 13 1360 | 1361 | jge = jnl 1362 | 1363 | 1364 | class jle(RelativeJump): 1365 | _index_ = 14 1366 | 1367 | 1368 | class jnle(RelativeJump): 1369 | _index_ = 15 1370 | 1371 | 1372 | def _branch_instr(name, opcode, enc, arg): 1373 | if not isinstance(arg, (int, long, str, Label)): 1374 | i = Instruction(arg) 1375 | i._enc_ = enc 1376 | i._name_ = name 1377 | return i 1378 | r = RelativeJump(arg) 1379 | r._opcode_ = opcode 1380 | r._name_ = name 1381 | return r 1382 | 1383 | 1384 | def jmp(arg): 1385 | return _branch_instr('jmp', 0xe9, None, arg) 1386 | 1387 | 1388 | def call(arg): 1389 | return _branch_instr('call', 0xe8, None, arg) 1390 | 1391 | _group_1_opcodes = lambda x: [ 1392 | (0x00+8*x, (byte, memgpr), (byte, gpr)), 1393 | (0x01+8*x, (dword, memgpr), (dword, gpr)), 1394 | (0x02+8*x, (byte, gpr), (byte, memgpr)), 1395 | (0x03+8*x, (dword, gpr), (dword, memgpr)), 1396 | (0x04+8*x, al, (byte, imm)), 1397 | (0x80, (byte, memgpr, x), (byte, imm)), 1398 | (0x83, (dword, memgpr, x), (byte, signed_imm)), 1399 | (0x05+8*x, eax, (dword, imm)), 1400 | (0x81, (dword, memgpr, x), (dword, imm))] 1401 | 1402 | 1403 | class add(Instruction): 1404 | _enc_ = _group_1_opcodes(0) 1405 | 1406 | 1407 | class or_(Instruction): 1408 | _enc_ = _group_1_opcodes(1) 1409 | _name_ = 'or' 1410 | 1411 | 1412 | class adc(Instruction): 1413 | _enc_ = _group_1_opcodes(2) 1414 | 1415 | 1416 | class sbb(Instruction): 1417 | _enc_ = _group_1_opcodes(3) 1418 | 1419 | 1420 | class and_(Instruction): 1421 | _enc_ = _group_1_opcodes(4) 1422 | _name_ = 'and' 1423 | 1424 | 1425 | class sub(Instruction): 1426 | _enc_ = _group_1_opcodes(5) 1427 | 1428 | 1429 | class xor(Instruction): 1430 | _enc_ = _group_1_opcodes(6) 1431 | 1432 | 1433 | class cmp_(Instruction): 1434 | _enc_ = _group_1_opcodes(7) 1435 | _name_ = 'cmp' 1436 | 1437 | 1438 | class test(Instruction): 1439 | _enc_ = [ 1440 | (0x84, (byte, memgpr), (byte, gpr)), 1441 | (0x85, (dword, memgpr), (dword, memgpr)), 1442 | (0xa8, al, (byte, imm)), 1443 | (0xa9, eax, (dword, imm)), 1444 | (0xf6, (byte, memgpr, 0), (byte, imm)), 1445 | (0xf7, (dword, memgpr, 0), (dword, imm)), 1446 | ] 1447 | 1448 | _group_2_opcodes = lambda x: [ 1449 | (0xd0, (byte, memgpr, x), imm(1)), 1450 | (0xd1, (dword, memgpr, x), imm(1)), 1451 | (0xd2, (byte, memgpr, x), cl), 1452 | (0xd3, (dword, memgpr, x), cl), 1453 | (0xc0, (byte, memgpr, x), (byte, imm)), 1454 | (0xc1, (dword, memgpr, x), (byte, imm))] 1455 | 1456 | 1457 | class rol(Instruction): 1458 | _enc_ = _group_2_opcodes(0) 1459 | 1460 | 1461 | class ror(Instruction): 1462 | _enc_ = _group_2_opcodes(1) 1463 | 1464 | 1465 | class rcl(Instruction): 1466 | _enc_ = _group_2_opcodes(2) 1467 | 1468 | 1469 | class rcr(Instruction): 1470 | _enc_ = _group_2_opcodes(3) 1471 | 1472 | 1473 | class shl(Instruction): 1474 | _enc_ = _group_2_opcodes(4) 1475 | 1476 | 1477 | class shr(Instruction): 1478 | _enc_ = _group_2_opcodes(5) 1479 | 1480 | 1481 | class sal(Instruction): 1482 | _enc_ = _group_2_opcodes(4) 1483 | 1484 | 1485 | class sar(Instruction): 1486 | _enc_ = _group_2_opcodes(7) 1487 | 1488 | _group_3_opcodes = lambda x: [ 1489 | (0xf6, (byte, memgpr, x)), 1490 | (0xf7, (dword, memgpr, x))] 1491 | 1492 | 1493 | class not_(Instruction): 1494 | _enc_ = _group_3_opcodes(2) 1495 | _name_ = 'not' 1496 | 1497 | 1498 | class neg(Instruction): 1499 | _enc_ = _group_3_opcodes(3) 1500 | 1501 | 1502 | class mul(Instruction): 1503 | _enc_ = _group_3_opcodes(4) 1504 | 1505 | 1506 | class imul(Instruction): 1507 | _enc_ = _group_3_opcodes(5) + [ 1508 | ('\x0f\xaf', (dword, gpr), (dword, memgpr)), 1509 | (0x6b, (dword, gpr), (dword, memgpr), (byte, signed_imm)), 1510 | (0x69, (dword, gpr), (dword, memgpr), (dword, imm)) 1511 | ] 1512 | 1513 | 1514 | class div(Instruction): 1515 | _enc_ = _group_3_opcodes(6) 1516 | 1517 | 1518 | class idiv(Instruction): 1519 | _enc_ = _group_3_opcodes(7) 1520 | 1521 | 1522 | class movsb(Instruction): 1523 | _opcode_ = 0xa4 1524 | 1525 | 1526 | class movsd(Instruction): 1527 | _opcode_ = 0xa5 1528 | 1529 | 1530 | class cmpsb(Instruction): 1531 | _opcode_ = 0xa6 1532 | 1533 | 1534 | class cmpsd(Instruction): 1535 | _opcode_ = 0xa7 1536 | 1537 | 1538 | class pushf(Instruction): 1539 | _opcode_ = 0x9c 1540 | 1541 | 1542 | class popf(Instruction): 1543 | _opcode_ = 0x9d 1544 | 1545 | 1546 | class cpuid(Instruction): 1547 | _opcode_ = '\x0f\xa2' 1548 | 1549 | 1550 | class sysenter(Instruction): 1551 | _opcode_ = '\x0f\x34' 1552 | 1553 | 1554 | class fninit(Instruction): 1555 | _opcode_ = '\xdb\xe3' 1556 | 1557 | 1558 | class cdq(Instruction): 1559 | _opcode_ = 0x99 1560 | 1561 | 1562 | class cld(Instruction): 1563 | _opcode_ = 0xfc 1564 | --------------------------------------------------------------------------------