├── IMPLEMENTATION.md
├── README.md
├── __init__.py
├── java.py
├── tests.py
└── x86.py


/IMPLEMENTATION.md:
--------------------------------------------------------------------------------
  1 | # pyasm2
  2 | &copy; 2012, Jurriaan Bremer
  3 | 
  4 | ## Introduction
  5 | 
  6 | _pyasm2_ is an x86 assembler library. It allows an easy Intel-like assembly
  7 | syntax, with support for sequences of instructions, as well as labels.
  8 | 
  9 | ## Simple Usage
 10 | 
 11 | Here are some examples to illustrate the simplicity of pyasm2. For each
 12 | example the normal Intel-syntax is given, followed by the equivalent using
 13 | pyasm2.
 14 | 
 15 | *   `push eax` &rarr; **`push(eax)`**
 16 | *   `mov eax, ebx` &rarr; **`mov(eax, ebx)`**
 17 | *   `lea edx, [ebp+eax*4+32]` &rarr; **`lea(edx, [ebp+eax*4+32])`**
 18 | *   `movzx ebx, byte [esp-64]` &rarr; **`movzx(ebx, byte [esp-64])`**
 19 | *   `mov eax, dword fs:[0xc0]` &rarr; **`mov(eax, dword [fs:0xc0])`**
 20 | 
 21 | Note that pyasm2 throws an exception if the instruction doesn't support the
 22 | given operands (an operand is like a parameter to an instruction.)
 23 | 
 24 | A few simple examples from the interactive interpreter:
 25 | ```python
 26 | >>> from pyasm2 import *
 27 | >>> mov(eax, dword[ebx+0x100])
 28 | mov eax, dword [ebx+0x100]
 29 | >>> push(dword[esp])
 30 | push dword [esp]
 31 | >>> mov(eax, eax, eax) # invalid encoding
 32 | ... snip ...
 33 | Exception: Unknown or Invalid Encoding
 34 | ```
 35 | 
 36 | ## Blocks
 37 | 
 38 | Besides normal instructions pyasm2 also supports sequences of instructions,
 39 | referred to as *blocks* from now on.
 40 | 
 41 | Blocks are especially useful when chaining multiple instructions. Besides
 42 | that, blocks automatically resolve relative jumps, labels, etc.
 43 | 
 44 | A simple example is a function that does only one thing: it zeroes the *eax*
 45 | register (the default return value of a function on x86) and returns to the
 46 | caller. It looks like the following.
 47 | 
 48 | ```python
 49 | Block(
 50 |     xor(eax, eax),
 51 |     retn()
 52 | )
 53 | ```
 54 | 
 55 | Before discussing blocks any further, we first need to introduce pyasm2
 56 | labels.
 57 | 
 58 | ## Labels
 59 | 
 60 | pyasm2 supports two types of labels: anonymous labels and named labels.
 61 | 
 62 | #### Anonymous Labels
 63 | 
 64 | Anonymous labels get an index, and can be referred to by a relative index.
 65 | 
 66 | For example, the following block increments the *eax* register forever.
 67 | (The -1 in this example is a relative index, so -1 points to the last defined
 68 | Label.)
 69 | 
 70 | ```python
 71 | Block(
 72 |     Label(),
 73 |     inc(eax),
 74 |     jmp(Label(-1))
 75 | )
 76 | ```
 77 | 
 78 | It is, however, not possible to reference anonymous labels outside of the
 79 | current block (i.e. an IndexError is thrown.)
 80 | 
 81 | There are three different possible values for relative indices.
 82 | 
 83 | *   *Negative Index* &rarr; Points to an anonymous label before the current
 84 |     instruction.
 85 | *   *Zero Index* &rarr; Points to a transparent label which points to the
 86 |     current instruction.
 87 | *   *Positive Index* &rarr; Points to an anonymous label after the current
 88 |     instruction.
 89 | 
 90 | (This does indeed mean that relative index *1* points to the first label
 91 | after the current instruction.)
 92 | 
 93 | Throughout the following sections we will refer to this snippet, by rewriting
 94 | it a little bit every time.
 95 | 
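The relative index rules listed above can be sketched in a few lines of Python 3. This is only an illustration under assumptions: the function `resolve` and its arguments are hypothetical and are not taken from the pyasm2 source. Positions are simply item indices within a block.

```python
def resolve(label_positions, current, index):
    """Return the position that a relative label index refers to.

    label_positions: sorted positions of the anonymous labels in one block
    current:         position of the instruction holding the reference
    (Both are hypothetical stand-ins for whatever pyasm2 tracks internally.)
    """
    if index == 0:
        # transparent label pointing at the current instruction
        return current
    before = [p for p in label_positions if p < current]
    after = [p for p in label_positions if p > current]
    if index < 0:
        # -1 is the last label defined before the instruction
        return before[index]
    # 1 is the first label defined after the instruction
    return after[index - 1]


# Block(Label(), inc(eax), jmp(Label(-1))): the label is item 0, the jmp item 2
print(resolve([0], 2, -1))
```

Because plain list indexing is used, an index that reaches past the labels of the block raises an `IndexError`, which matches the behaviour described above.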
 96 | #### Global Named Labels
 97 | 
 98 | A new named label can be created by creating a new Label instance with the
 99 | name as first parameter. Referencing a named label is just like referencing
100 | an anonymous label, but instead of passing an index, you give a string as
101 | parameter.
102 | 
103 | ```python
104 | Block(
105 |     Label('loop'),
106 |     inc(eax),
107 |     jmp(Label('loop'))
108 | )
109 | ```
110 | 
111 | Note that this type of named label is global, that is, other blocks can
112 | reference this particular label as well. This is useful for example when
113 | defining a function. (Note that two or more blocks can *not* declare the same
114 | global named labels!)
115 | 
116 | #### Local Named Labels
117 | 
118 | Whereas one could make a global named label using e.g. `Label('name')`, it is
119 | also possible to make a *local* named label; a label that's only defined for
120 | the current block. Because local labels are more commonly used than global
121 | labels, their syntax is easier as well. Local named labels are simply created
122 | **and referenced** by using a string as name.
123 | 
124 | ```python
125 | Block(
126 |     'loop',
127 |     inc(eax),
128 |     jmp('loop')
129 | )
130 | ```
131 | 
132 | #### Label References
133 | 
134 | -Labels are referenced by e.g. `Label('name')`. When looking up label
135 | references, pyasm2 will first try to find the label in the current block,
136 | and only if there is no such label in the current block, it will look it up
137 | in the parent. In other words, local named labels are more important than
138 | global named labels.-
139 | 
140 | Local Named Labels and Global Named Labels can *not* be mixed. E.g. the
141 | following snippet throws an error.
142 | 
143 | ```python
144 | Block(
145 |     Label('loop'),  # global named label
146 |     inc(eax),
147 |     jmp('loop'))    # local named label
148 | ```
149 | 
150 | ### Further Label Tweaks
151 | 
152 | Now that we've seen the types of labels supported by pyasm2, it is time to get
153 | to some awesome tweaks which will speed up development and clean up your code
154 | even further.
155 | 
156 | #### Label classobj instead of instance
157 | 
158 | It is possible to define an Anonymous Label by passing the Label class,
159 | instead of passing an instance.
160 | 
161 | ```python
162 | Block(
163 |     Label,
164 |     inc(eax),
165 |     jmp(Label(-1))
166 | )
167 | ```
168 | 
169 | #### Global Named Labels as variable
170 | 
171 | Because global named labels are able to reference labels outside their
172 | current scope (a block), it is also possible to reference them as a
173 | variable (e.g. a function.)
174 | 
175 | ```python
176 | return_zero = Label('return_zero')
177 | f = Block(
178 |     return_zero,
179 |     xor(eax, eax),
180 |     retn()
181 | )
182 | f2 = Block(
183 |     call(return_zero),
184 |     # ... do something ...
185 | )
186 | ```
187 | 
188 | #### Alias Label to L
189 | 
190 | For those of us that think that the classname *Label* is too long, you could
191 | simply make an alias to **L** (i.e. `L = Label`.)
192 | 
193 | ```python
194 | Block(
195 |     L,
196 |     inc(eax),
197 |     jmp(L(-1))
198 | )
199 | ```
200 | 
201 | #### Tweaked Anonymous Label References
202 | 
203 | Because `jmp(L(-1))` looks pretty ugly (see the [Alias Label to L][] section),
204 | we've tweaked anonymous label references even further to the point where you
205 | can add or subtract a relative index directly to/from the `Label` class.
206 | 
207 | [Alias Label to L]: #alias-label-to-l
208 | 
209 | ```python
210 | Block(
211 |     L,
212 |     inc(eax),
213 |     jmp(L-1)
214 | )
215 | ```
216 | 
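Arithmetic on a class itself (rather than on an instance) is usually done with a metaclass. The sketch below shows one way `L-1` could produce the same object as `L(-1)`. It is an assumption about the technique, not pyasm2's actual code: `LabelMeta` and `ToyLabel` are made-up names, and the `metaclass=` syntax is Python 3.

```python
class LabelMeta(type):
    """Metaclass so that arithmetic on the class itself yields an instance."""

    def __add__(cls, index):
        return cls(index)        # ToyLabel + 1  ->  ToyLabel(1)

    def __sub__(cls, index):
        return cls(-index)       # ToyLabel - 1  ->  ToyLabel(-1)


class ToyLabel(metaclass=LabelMeta):
    """Stand-in for an anonymous label reference with a relative index."""

    def __init__(self, index=0):
        self.index = index

    def __repr__(self):
        return 'ToyLabel(%d)' % self.index


L = ToyLabel
print(repr(L - 1))   # ToyLabel(-1)
print(repr(L + 2))   # ToyLabel(2)
```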
217 | #### Offset from a Label
218 | 
219 | Sometimes it might be necessary to add or subtract a value from the address of
220 | a label. In those cases the following technique applies.
221 | 
222 | ```python
223 | Block(
224 |     L,
225 |     nop,
226 |     mov(eax, Label(-1)+1)
227 | )
228 | ```
229 | 
230 | In this example the anonymous label will be referenced, but the value one is
231 | added to it. So `Label(-1)+1` points to the `mov` instruction, because the
232 | `nop` instruction is only one byte in length.
233 | 
234 | Do note that `Label(-1)+1` could be rewritten as `L-1+1`, but *please* don't
235 | do that; we don't want to torture Python.
236 | 
237 | ## Blocks part two
238 | 
239 | Now that we've seen how pyasm2 handles labels, it's time for some more in-depth
240 | information about blocks.
241 | 
242 | #### Instruction classobj instead of instance
243 | 
244 | Any instruction that does *not* take any additional operands (e.g. `retn`,
245 | `stosb`, `sysenter`, etc.) can be used directly in a block without actually
246 | making an instance. For example, pyasm2 treats the following two snippets as
247 | equivalent.
248 | 
249 | ```python
250 | Block(
251 |     mov(eax, 0),
252 |     retn()
253 | )
254 | ```
255 | ```python
256 | Block(
257 |     mov(eax, 0),
258 |     retn
259 | )
260 | ```
261 | 
262 | #### Combining Blocks
263 | 
264 | One can combine multiple blocks by *adding* one to the other. Combining blocks
265 | is actually just merging them, i.e. one block is appended to the other block.
266 | 
267 | ```python
268 | a = Block(
269 |     mov(eax, ebx),
270 |     mov(ebx, 42)
271 | )
272 | b = Block(
273 |     mov(ecx, edx)
274 | )
275 | print repr(a + b)
276 | # Block(mov(eax, ebx), mov(ebx, 42), mov(ecx, edx))
277 | ```
278 | 
279 | #### Temporary Blocks as Lists
280 | 
281 | Temporary blocks, those that you only use to add to other blocks, can be
282 | written as lists (or tuples, for that matter.)
283 | 
284 | ```python
285 | a = Block(
286 |     mov(eax, ebx),
287 |     mov(ebx, 42)
288 | )
289 | print repr(a + [xor(ecx, ecx), retn])
290 | # Block(mov(eax, ebx), mov(ebx, 42), xor(ecx, ecx), retn)
291 | ```
292 | 
293 | This does, however, not work if you want to call *repr* or *str* on the block.
294 | In that particular case, you can do the following.
295 | 
296 | ```python
297 | a = [xor(eax, eax), retn]
298 | print repr(Block(a))
299 | # Block(xor(eax, eax), retn)
300 | ```
301 | 
302 | #### Combining Instructions Directly
303 | 
304 | Instead of writing e.g. `Block(mov(eax, ebx), mov(ebx, 42))`, pyasm2 offers a
305 | shorthand.
306 | 
307 | ```python
308 | a = mov(eax, ebx) + mov(ebx, 42)
309 | print repr(a)
310 | # Block(mov(eax, ebx), mov(ebx, 42))
311 | ```
312 | 
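The `+` shorthand for blocks, lists and single instructions maps naturally onto Python's `__add__` hook. The following Python 3 sketch shows the idea with two toy classes. `ToyBlock` and `ToyInstruction` are hypothetical and only mimic the `repr` output shown above; the real pyasm2 classes may be organised differently.

```python
class ToyBlock:
    """Toy sequence of instructions that supports merging with `+`."""

    def __init__(self, *items):
        self.items = list(items)

    def __add__(self, other):
        if isinstance(other, ToyBlock):
            other = other.items             # merge two blocks
        elif not isinstance(other, (list, tuple)):
            other = [other]                 # a single instruction
        return ToyBlock(*(self.items + list(other)))

    def __repr__(self):
        return 'Block(%s)' % ', '.join(map(repr, self.items))


class ToyInstruction:
    """Toy instruction that only remembers its name and operands."""

    def __init__(self, name, *operands):
        self.name = name
        self.operands = operands

    def __add__(self, other):
        # instruction + anything: promote to a block first
        return ToyBlock(self) + other

    def __repr__(self):
        return '%s(%s)' % (self.name, ', '.join(map(str, self.operands)))


a = ToyInstruction('mov', 'eax', 'ebx') + ToyInstruction('mov', 'ebx', 42)
print(repr(a))   # Block(mov(eax, ebx), mov(ebx, 42))
```

Adding a list or tuple to a `ToyBlock` goes through the same method, which is why temporary blocks can be written as plain lists on the right-hand side.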
313 | ## Raw Data Sections
314 | 
315 | Like any assembler, pyasm2 also supports raw data. There are a few supported
316 | data types: signed/unsigned 8/16/32/64bit integers, strings and labels (which
317 | are 32bit pointers on x86.)
318 | 
319 | Some examples should suffice as explanation.
320 | 
321 | ```python
322 | a = Block(
323 |     String('abc'),
324 |     Int8(0x64),
325 |     Uint8(0x65),
326 |     Uint16(0x6766),
327 |     Int32(0x6b6a6968)
328 | )
329 | print str(a)
330 | # abcdefghijk
331 | ```
332 | 
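To see where each byte of such a data block comes from, the same values can be packed with Python's standard `struct` module. This is a Python 3 sketch that assumes little-endian integers (as on x86). The helper `pack_raw` is hypothetical and is not part of pyasm2.

```python
import struct


def pack_raw(*items):
    """Concatenate (format, value) pairs into one little-endian byte string."""
    out = b''
    for fmt, value in items:
        if fmt == 's':
            out += value                            # raw string, emitted as-is
        else:
            out += struct.pack('<' + fmt, value)    # '<' = little-endian
    return out


data = pack_raw(
    ('s', b'abc'),        # String('abc')      -> b'abc'
    ('b', 0x64),          # Int8(0x64)         -> b'd'
    ('B', 0x65),          # Uint8(0x65)        -> b'e'
    ('H', 0x6766),        # Uint16(0x6766)     -> b'fg'
    ('i', 0x6b6a6968),    # Int32(0x6b6a6968)  -> b'hijk'
)
print(data)
```

The multi-byte integers come out with their lowest byte first, which is why `0x6766` reads as `fg` rather than `gf`.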
333 | #### Raw Data Aliases
334 | 
335 | Some interesting aliases include.
336 | 
337 | *   `S = String`
338 | *   `i8 = Int8`
339 | *   `u8 = Uint8`
340 | *   etc.
341 | 
342 | #### Multiple Items with the same Type
343 | 
344 | It is perfectly possible to define multiple values of the same type in one
345 | simple statement.
346 | 
347 | ```python
348 | a = Uint32(0x11223344, 0x44332211, 0x12345678, 0x87654321)
349 | ```
350 | 
351 | ## Blocks part three
352 | 
353 | Now that we have seen the declaration of raw data using pyasm2, it is time to
354 | link code and data sections. For example, in normal executable binaries, it is
355 | normal to have different so-called sections for code and data. This way the
356 | code is separated from the data.
357 | 
358 | This gives us a problem. When assembling, the text and data blocks do not have
359 | to be combined, so in order to get the correct addresses of code and data, we
360 | do the following. We assign an address to the data section, and from there
361 | pass every label, with its address, to the code section. This way the code
362 | section knows where to find the labels it references.
363 | 
364 | ```python
365 | a = Block(
366 |     mov(eax, L('hello')),
367 |     # ... snip ...
368 | )
369 | b = Block(
370 |     L('hello'),
371 |     String('Hello World!\n\x00')
372 | )
373 | b.base_address(0x402000)
374 | a.references(b)
375 | ```
376 | 
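The address bookkeeping behind this can be sketched as follows: walk the data block, keep a running offset, and record `base address + offset` for each label. This Python 3 sketch is an assumption about the general approach. The function `label_addresses`, the tuple layout and the extra `end` label are hypothetical and do not come from pyasm2.

```python
def label_addresses(items, base_address):
    """Map label names to absolute addresses.

    `items` is a list of ('label', name) or ('data', bytes) tuples, a
    hypothetical stand-in for the contents of a data block.
    """
    addresses = {}
    offset = 0
    for kind, value in items:
        if kind == 'label':
            # a label takes no space, it just names the current position
            addresses[value] = base_address + offset
        else:
            offset += len(value)
    return addresses


data_block = [
    ('label', 'hello'),
    ('data', b'Hello World!\n\x00'),
    ('label', 'end'),
]
labels = label_addresses(data_block, 0x402000)
print({name: hex(addr) for name, addr in labels.items()})
```

A code block that references `hello` can then look the name up in this mapping and encode the resulting 32bit address as an immediate operand.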
377 | ## pyasm2 Internals
378 | 
379 | Although most of pyasm2 is fairly straightforward (chaining instructions is
380 | not that hard), there is one tricky part: **labels**.
381 | 
382 | To start off, the x86 instruction set provides two types of relative jumps:
383 | those with an 8bit relative offset, and those with a 32bit relative offset.
384 | 
385 | Besides that, instructions can refer to other instructions or addresses within
386 | a data section, using labels. This means that pyasm2 has to keep track of
387 | these references, and magically fix them in the final step.
388 | 
389 | #### Relative Offset Size
390 | 
391 | So a relative jump can point to another instruction, by using a label. This
392 | raises the question: is the offset to this instruction within the size of an
393 | 8bit relative offset, or a 32bit one?
394 | 
395 | (8bit relative jumps are 2 bytes in length, 32bit ones are 5 bytes for
396 | unconditional jumps, and 6 bytes for conditional ones.)
397 | 
398 | There are two solutions to this problem, as far as I can tell.
399 | 
400 | *   Each label keeps a list of instructions pointing to it. When assembling,
401 |     each of the instructions is updated with the location of the label, so the
402 |     instructions can assemble the address or relative offset accordingly.
403 |     From here the instruction can determine if the offset has to be 8bit or
404 |     32bit.
405 | *   At first each relative jump is created using a 32bit relative offset.
406 |     Then, after assembling each instruction, the instructions are enumerated
407 |     and a check is done if the relative jumps would fit as jumps with an 8bit
408 |     relative offset as well. If that is the case, the jump is updated, and all
409 |     the other instructions are updated as well. This goes on until there are
410 |     no relative jumps left to tweak, or a recursion limit has been exceeded.
411 | 
412 | Although the first implementation might be a little better performance-wise,
413 | pyasm2 uses the latter implementation, which is much easier to implement.
414 | 
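The second approach (start with 32bit jumps, then shrink them where possible) can be sketched in Python 3 as below. This is a simplified illustration, not the pyasm2 implementation: the item tuples and the function `relax` are hypothetical, and only unconditional jumps (2 or 5 bytes) are modelled, so the 6-byte conditional form is left out.

```python
SHORT, NEAR = 2, 5   # sizes of a jump with an 8bit / 32bit relative offset


def relax(items):
    """Shrink jumps to their 8bit form where the target is in range.

    `items` is a list of ('label', name), ('jmp', name) or ('raw', size)
    tuples (a hypothetical representation). Returns the size of each item.
    """
    sizes = [NEAR if kind == 'jmp' else (arg if kind == 'raw' else 0)
             for kind, arg in items]
    changed = True
    while changed:
        changed = False
        # recompute the offset of every item with the current sizes
        offsets, pos = [], 0
        for size in sizes:
            offsets.append(pos)
            pos += size
        labels = dict((arg, offsets[i])
                      for i, (kind, arg) in enumerate(items)
                      if kind == 'label')
        for i, (kind, arg) in enumerate(items):
            if kind != 'jmp' or sizes[i] != NEAR:
                continue
            target = labels[arg]
            if target > offsets[i]:
                # a forward target moves closer once this jump shrinks
                target -= NEAR - SHORT
            # the offset is relative to the end of the (short) jump
            rel = target - (offsets[i] + SHORT)
            if -128 <= rel <= 127:
                sizes[i] = SHORT
                changed = True
                break   # offsets are stale now, start a new pass
    return sizes


# the first jump only fits in 8 bits after the second one has been shrunk
program = [('jmp', 'end'), ('raw', 123), ('jmp', 'end'), ('label', 'end')]
print(relax(program))   # [2, 123, 2, 0]
```

Each pass either shrinks one jump or ends the loop, so this sketch terminates after at most one pass per jump and needs no explicit recursion limit.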


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | pyasm2 - x86 assembler library          (C) 2012 Jurriaan Bremer
 2 | 
 3 | Although it's called pyasm2, this is not per se a successor of Pyasm or pyASM.
 4 | pyasm2 aims to be as flexible as possible; it will support x86, SSE and SSE2.
 5 | 
 6 | A key feature of pyasm2 is the ability to build blocks of instructions and
 7 | to give them a base address at a later time, that is, you don't need to know
 8 | the address of instructions beforehand. For example, you can construct a
 9 | series of instructions, request the size that will be needed in order to
10 | store all instructions as a sequence, allocate this memory and write the
11 | instructions from there. This approach is very useful when making JIT
12 | compilers etc.
13 | 
14 | The syntax of pyasm2 is supposed to be as simple as possible.
15 | 
16 | For example, an instruction such as "**mov eax, dword [ebx+edx*2+32]**" can be
17 | encoded using pyasm2 as the following.
18 | ```python
19 | mov(eax, dword [ebx+edx*2+32])
20 | ```
21 | 
22 | These memory addresses also support **segment registers**, e.g.
23 | ```python
24 | mov(eax, dword[fs:0xc0])
25 | ```
26 | although this is currently only supported in 64-bit Python versions.
27 | 
28 | Furthermore, pyasm2 makes it possible to **chain multiple instructions**. Take
29 | for example the following statement.
30 | ```python
31 | Block(mov(eax, ebx), push(32))
32 | ```
33 | 
34 | However, for more implementation-specific details, please refer to the
35 | *IMPLEMENTATION* file.
36 | 


--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/jbremer/pyasm2/ff0f0e1c02146dc9230211f44c2dd04ff3bd4265/__init__.py


--------------------------------------------------------------------------------
/java.py:
--------------------------------------------------------------------------------
  1 | """
  2 | Java Disassembler & Assembler Engine   (C) 2012 Jurriaan Bremer
  3 | https://github.com/jbremer/pyasm2
  4 | 
  5 | """
  6 | import struct
  7 | 
  8 | # http://en.wikipedia.org/wiki/Java_bytecode_instruction_listings
  9 | _table = {
 10 |     0x32: 'aaload',
 11 |     0x53: 'aastore',
 12 |     0x01: 'aconst_null',
 13 |     0x19: 'aload',
 14 |     0x2a: 'aload_0',
 15 |     0x2b: 'aload_1',
 16 |     0x2c: 'aload_2',
 17 |     0x2d: 'aload_3',
 18 |     0xbd: 'anewarray',
 19 |     0xb0: 'areturn',
 20 |     0xbe: 'arraylength',
 21 |     0x3a: 'astore',
 22 |     0x4b: 'astore_0',
 23 |     0x4c: 'astore_1',
 24 |     0x4d: 'astore_2',
 25 |     0x4e: 'astore_3',
 26 |     0xbf: 'athrow',
 27 |     0x33: 'baload',
 28 |     0x54: 'bastore',
 29 |     0x10: 'bipush',
 30 |     0x34: 'caload',
 31 |     0x55: 'castore',
 32 |     0xc0: 'checkcast',
 33 |     0x90: 'd2f',
 34 |     0x8e: 'd2i',
 35 |     0x8f: 'd2l',
 36 |     0x63: 'dadd',
 37 |     0x31: 'daload',
 38 |     0x52: 'dastore',
 39 |     0x98: 'dcmpg',
 40 |     0x97: 'dcmpl',
 41 |     0x0e: 'dconst_0',
 42 |     0x0f: 'dconst_1',
 43 |     0x6f: 'ddiv',
 44 |     0x18: 'dload',
 45 |     0x26: 'dload_0',
 46 |     0x27: 'dload_1',
 47 |     0x28: 'dload_2',
 48 |     0x29: 'dload_3',
 49 |     0x6b: 'dmul',
 50 |     0x77: 'dneg',
 51 |     0x73: 'drem',
 52 |     0xaf: 'dreturn',
 53 |     0x39: 'dstore',
 54 |     0x47: 'dstore_0',
 55 |     0x48: 'dstore_1',
 56 |     0x49: 'dstore_2',
 57 |     0x4a: 'dstore_3',
 58 |     0x67: 'dsub',
 59 |     0x59: 'dup',
 60 |     0x5a: 'dup_x1',
 61 |     0x5b: 'dup_x2',
 62 |     0x5c: 'dup2',
 63 |     0x5d: 'dup2_x1',
 64 |     0x5e: 'dup2_x2',
 65 |     0x8d: 'f2d',
 66 |     0x8b: 'f2i',
 67 |     0x8c: 'f2l',
 68 |     0x62: 'fadd',
 69 |     0x30: 'faload',
 70 |     0x51: 'fastore',
 71 |     0x96: 'fcmpg',
 72 |     0x95: 'fcmpl',
 73 |     0x0b: 'fconst_0',
 74 |     0x0c: 'fconst_1',
 75 |     0x0d: 'fconst_2',
 76 |     0x6e: 'fdiv',
 77 |     0x17: 'fload',
 78 |     0x22: 'fload_0',
 79 |     0x23: 'fload_1',
 80 |     0x24: 'fload_2',
 81 |     0x25: 'fload_3',
 82 |     0x6a: 'fmul',
 83 |     0x76: 'fneg',
 84 |     0x72: 'frem',
 85 |     0xae: 'freturn',
 86 |     0x38: 'fstore',
 87 |     0x43: 'fstore_0',
 88 |     0x44: 'fstore_1',
 89 |     0x45: 'fstore_2',
 90 |     0x46: 'fstore_3',
 91 |     0x66: 'fsub',
 92 |     0xb4: 'getfield',
 93 |     0xb2: 'getstatic',
 94 |     0xa7: 'goto',
 95 |     0xc8: 'goto_w',
 96 |     0x91: 'i2b',
 97 |     0x92: 'i2c',
 98 |     0x87: 'i2d',
 99 |     0x86: 'i2f',
100 |     0x85: 'i2l',
101 |     0x93: 'i2s',
102 |     0x60: 'iadd',
103 |     0x2e: 'iaload',
104 |     0x7e: 'iand',
105 |     0x4f: 'iastore',
106 |     0x02: 'iconst_m1',
107 |     0x03: 'iconst_0',
108 |     0x04: 'iconst_1',
109 |     0x05: 'iconst_2',
110 |     0x06: 'iconst_3',
111 |     0x07: 'iconst_4',
112 |     0x08: 'iconst_5',
113 |     0x6c: 'idiv',
114 |     0xa5: 'if_acmpeq',
115 |     0xa6: 'if_acmpne',
116 |     0x9f: 'if_icmpeq',
117 |     0xa0: 'if_icmpne',
118 |     0xa1: 'if_icmplt',
119 |     0xa2: 'if_icmpge',
120 |     0xa3: 'if_icmpgt',
121 |     0xa4: 'if_icmple',
122 |     0x99: 'ifeq',
123 |     0x9a: 'ifne',
124 |     0x9b: 'iflt',
125 |     0x9c: 'ifge',
126 |     0x9d: 'ifgt',
127 |     0x9e: 'ifle',
128 |     0xc7: 'ifnonnull',
129 |     0xc6: 'ifnull',
130 |     0x84: 'iinc',
131 |     0x15: 'iload',
132 |     0x1a: 'iload_0',
133 |     0x1b: 'iload_1',
134 |     0x1c: 'iload_2',
135 |     0x1d: 'iload_3',
136 |     0x68: 'imul',
137 |     0x74: 'ineg',
138 |     0xc1: 'instanceof',
139 |     0xba: 'invokedynamic',
140 |     0xb9: 'invokeinterface',
141 |     0xb7: 'invokespecial',
142 |     0xb8: 'invokestatic',
143 |     0xb6: 'invokevirtual',
144 |     0x80: 'ior',
145 |     0x70: 'irem',
146 |     0xac: 'ireturn',
147 |     0x78: 'ishl',
148 |     0x7a: 'ishr',
149 |     0x36: 'istore',
150 |     0x3b: 'istore_0',
151 |     0x3c: 'istore_1',
152 |     0x3d: 'istore_2',
153 |     0x3e: 'istore_3',
154 |     0x64: 'isub',
155 |     0x7c: 'iushr',
156 |     0x82: 'ixor',
157 |     0xa8: 'jsr',
158 |     0xc9: 'jsr_w',
159 |     0x8a: 'l2d',
160 |     0x89: 'l2f',
161 |     0x88: 'l2i',
162 |     0x61: 'ladd',
163 |     0x2f: 'laload',
164 |     0x7f: 'land',
165 |     0x50: 'lastore',
166 |     0x94: 'lcmp',
167 |     0x09: 'lconst_0',
168 |     0x0a: 'lconst_1',
169 |     0x12: 'ldc',
170 |     0x13: 'ldc_w',
171 |     0x14: 'ldc2_w',
172 |     0x6d: 'ldiv',
173 |     0x16: 'lload',
174 |     0x1e: 'lload_0',
175 |     0x1f: 'lload_1',
176 |     0x20: 'lload_2',
177 |     0x21: 'lload_3',
178 |     0x69: 'lmul',
179 |     0x75: 'lneg',
180 |     0xab: 'lookupswitch',
181 |     0x81: 'lor',
182 |     0x71: 'lrem',
183 |     0xad: 'lreturn',
184 |     0x79: 'lshl',
185 |     0x7b: 'lshr',
186 |     0x37: 'lstore',
187 |     0x3f: 'lstore_0',
188 |     0x40: 'lstore_1',
189 |     0x41: 'lstore_2',
190 |     0x42: 'lstore_3',
191 |     0x65: 'lsub',
192 |     0x7d: 'lushr',
193 |     0x83: 'lxor',
194 |     0xc2: 'monitorenter',
195 |     0xc3: 'monitorexit',
196 |     0xc5: 'multianewarray',
197 |     0xbb: 'new',
198 |     0xbc: 'newarray',
199 |     0x00: 'nop',
200 |     0x57: 'pop',
201 |     0x58: 'pop2',
202 |     0xb5: 'putfield',
203 |     0xb3: 'putstatic',
204 |     0xa9: 'ret',
205 |     0xb1: 'return',
206 |     0x35: 'saload',
207 |     0x56: 'sastore',
208 |     0x11: 'sipush',
209 |     0x5f: 'swap',
210 |     0xaa: 'tableswitch',
211 |     0xc4: 'wide',
212 |     0xca: 'breakpoint',
213 |     0xfe: 'impdep1',
214 |     0xff: 'impdep2',
215 | }
216 | 
217 | def _sbint16(x): return struct.unpack('>h', x)[0]
218 | def _ubint16(x): return struct.unpack('>H', x)[0]
219 | def _sbint32(x): return struct.unpack('>i', x)[0]
220 | 
221 | # name to opcode table
222 | _names = dict((v if type(v) == str else v[0], k) for k, v in _table.items())
223 | 
224 | # opcodes which are valid for the "wide" instruction with length 3
225 | _wide_opcodes = sorted(_names[x] for x in ('iload', 'fload', 'aload', 'lload',
226 |     'dload', 'istore', 'fstore', 'astore', 'lstore', 'dstore', 'ret'))
227 | 
228 | # opcode which is valid for the "wide" instruction with length 5
229 | _wide_inc = _names['iinc']
230 | 
231 | # opcodes which have a two-byte index into the constant pool (and no further
232 | # arguments)
233 | _index_opcodes = sorted(_names[x] for x in ('anewarray', 'checkcast',
234 |     'getfield', 'getstatic', 'instanceof', 'invokespecial', 'invokestatic',
235 |     'invokevirtual', 'ldc_w', 'ldc2_w', 'new', 'putfield', 'putstatic'))
236 | 
237 | # all opcodes that take two branch bytes; "goto", "jsr", and all opcodes
238 | # starting with "if"
239 | _branch_opcodes = sorted(_names[x] for x in _names if x[:2] == 'if' or
240 |     x == 'goto' or x == 'jsr')
241 | 
242 | _primitive_types = {
243 |     10: 'int',
244 |     8: 'byte',
245 |     11: 'long',
246 |     7: 'double',
247 |     6: 'float',
248 |     5: 'char',
249 |     9: 'short',
250 | }
251 | 
252 | _other_opcodes = {
253 |     'bipush': lambda ch, d, o: Instruction(name=_table[ch], length=2,
254 |         value=ord(d[o+1]), rep='%s %d' % (_table[ch], ord(d[o+1]))),
255 |     'sipush': lambda ch, d, o: Instruction(name=_table[ch], length=3,
256 |         value=_sbint16(d[o+1:o+3]), rep='%s %d' % (_table[ch],
257 |         _sbint16(d[o+1:o+3]))),
258 |     'lookupswitch': lambda ch, d, o: None,
259 |     'tableswitch': lambda ch, d, o: None,
260 |     'newarray': lambda ch, d, o: Instruction(name=_table[ch], length=2,
261 |         value=ord(d[o+1]), rep='%s %s' % (_table[ch],
262 |         _primitive_types[ord(d[o+1])])),
263 |     'goto_w': lambda ch, d, o: Instruction(name=_table[ch], length=5,
264 |         value=_sbint32(d[o+1:o+5]), rep='%s %d' % (_table[ch],
265 |         _sbint32(d[o+1:o+5]))),
266 |     'invokedynamic': lambda ch, d, o: None,
267 |     'invokeinterface': lambda ch, d, o: None,
268 |     'jsr_w': lambda ch, d, o: Instruction(name=_table[ch], length=5,
269 |         value=_sbint32(d[o+1:o+5]), rep='%s %d' % (_table[ch],
270 |         _sbint32(d[o+1:o+5]))),
271 |     'multianewarray': lambda ch, d, o: Instruction(name=_table[ch], length=4,
272 |         cp=_ubint16(d[o+1:o+3]), value=ord(d[o+3]), rep='%s #%d %d' % (
273 |         _table[ch], _ubint16(d[o+1:o+3]), ord(d[o+3]))),
274 |     'ldc': lambda ch, d, o: Instruction(name=_table[ch], length=2,
275 |         cp=ord(d[o+1]), rep='%s #%d' % (_table[ch], ord(d[o+1]))),
276 | }
277 | 
278 | # convert the opcode names of _other_opcodes into opcode indices
279 | _other_opcodes = dict((_names[k], v) for k, v in _other_opcodes.items())
280 | 
281 | class Instruction:
282 |     def __init__(self, name=None, cp=None, local=None, length=None,
283 |             value=None, rep=None):
284 |         self.name = name
285 |         self.cp = cp
286 |         self.local = local
287 |         self.length = length
288 |         self.value = value
289 |         self.rep = rep
290 | 
291 |     def __str__(self):
292 |         return self.rep or self.name
293 | 
294 |     def __repr__(self):
295 |         ret = ['name="%s"' % self.name]
296 |         if self.cp is not None:
297 |             ret += ['cp=%s' % self.cp]
298 |         if self.local is not None:
299 |             ret += ['local=%d' % self.local]
300 |         if self.length:
301 |             ret += ['length=%d' % self.length]
302 |         if self.value is not None:
303 |             ret += ['value="%s"' % self.value]
304 |         if self.rep:
305 |             ret += ['rep="%s"' % self.rep]
306 |         return 'Instruction(%s)' % ', '.join(ret)
307 | 
308 | def disassemble(data, offset=0):
309 |     # opcode
310 |     ch = ord(data[offset])
311 | 
312 |     # the "wide" instruction
313 |     if ch == 0xc4:
314 |         ch2 = ord(data[offset+1])
315 | 
316 |         if ch2 == _wide_inc:
317 |             idx = _ubint16(data[offset+2:offset+4])
318 |             val = _sbint16(data[offset+4:offset+6])
319 |             return Instruction(name=_table[ch2], local=idx, length=6,
320 |                 value=val, rep='%s v%d, %d' % (_table[ch2], idx, val))
321 |         elif ch2 in _wide_opcodes:
322 |             idx = _ubint16(data[offset+2:offset+4])
323 |             return Instruction(name=_table[ch2], local=idx, length=4,
324 |                 rep='%s v%d' % (_table[ch2], idx))
325 |         else:
326 |             return None
327 | 
328 |     # if the opcode is in _wide_opcodes then it loads or stores a local
329 |     if ch in _wide_opcodes:
330 |         return Instruction(name=_table[ch], length=2,
331 |             local=ord(data[offset+1]), rep='%s v%d' % (_table[ch],
332 |             ord(data[offset+1])))
333 | 
334 |     # instructions which only have an index into the constant pool as argument
335 |     if ch in _index_opcodes:
336 |         return Instruction(name=_table[ch], length=3,
337 |             cp=_ubint16(data[offset+1:offset+3]),
338 |             rep='%s #%d' % (_table[ch],
339 |             _ubint16(data[offset+1:offset+3])))
340 | 
341 |     # branch instructions that take a two-byte branch offset
342 |     if ch in _branch_opcodes:
343 |         return Instruction(name=_table[ch], length=3,
344 |             value=_sbint16(data[offset+1:offset+3]),
345 |             rep='%s %d' % (_table[ch],
346 |             _sbint16(data[offset+1:offset+3])))
347 | 
348 |     # other opcodes which have to be handled independently
349 |     if ch in _other_opcodes:
350 |         return _other_opcodes[ch](ch, data, offset)
351 | 
352 |     # if the entry in the table is a string, then it's an instruction without
353 |     # anything special, so we can simply return it
354 |     if ch in _table:
355 |         return Instruction(name=_table[ch], length=1)
356 | 
357 |     # unknown opcode
358 |     return None
359 | 


--------------------------------------------------------------------------------
/tests.py:
--------------------------------------------------------------------------------
  1 | 
  2 | """
  3 | 
  4 | Unittests that verify the integrity of pyasm2.
  5 | 
  6 | """
  7 | 
  8 | from pyasm2.x86 import *
  9 | import unittest
 10 | 
 11 | class CheckSyntax(unittest.TestCase):
 12 |     def test_syntax(self):
 13 |         eq = self.assertEqual
 14 |         ra = self.assertRaises
 15 | 
 16 |         eq(str(dword[eax]), 'dword [eax]')
 17 |         eq(str(byte[eax+eax*4]), 'byte [eax+eax*4]')
 18 |         eq(str(word[0xdeadf00d+8*esi+esp]), 'word [esp+esi*8+0xdeadf00d]')
 19 |         eq(str(eax+esi), '[eax+esi]')
 20 |         eq(str(eax+ecx*1), '[ecx+eax]')
 21 |         eq(str(dword[0x00112233]), 'dword [0x112233]')
 22 |         ra(AssertionError, lambda: eax+eax+eax)
 23 |         ra(AssertionError, lambda: esp*8)
 24 |         eq(0xb00b+ebp*8+ebx, ebx+ebp*8+0xb00b)
 25 |         ra(AssertionError, lambda: eax+0x111223344)
 26 |         #eq(str(dword[cs:eax+ebx]), 'dword [cs:eax+ebx]')
 27 |         eq(dword[cs:0x13371337], dword[cs:0x13371337])
 28 |         #eq(str(dword[cs:0xdeadf00d]), 'dword [cs:0xdeadf00d]')
 29 |         eq(dword[eax-0x1000], dword[eax+0xfffff000])
 30 | 
 31 |     def test_modrm(self):
 32 |         eq = self.assertEqual
 33 |         m = Instruction().modrm
 34 | 
 35 |         eq(m(eax, dword[eax]), '\x00')
 36 |         eq(m(ecx, dword[ebx]), m(dword[ebx], ecx))
 37 |         eq(m(esi, dword[esp+ebp*8+0x11223344]), '\xb4\xec\x44\x33\x22\x11')
 38 |         eq(m(eax, dword[ebp]), '\x45\x00')
 39 |         eq(m(edi, dword[esp]), '\x3c\x24')
 40 |         eq(m(dword[esi+eax], ebx), '\x1c\x06')
 41 |         eq(m(esi, dword[edi]), '\x37')
 42 |         eq(m(ecx, dword[edx+ebp+0xdeadf00d]), '\x8c\x2a\x0d\xf0\xad\xde')
 43 |         eq(m(edi, dword[esi*8]), '\x3c\xf5\x00\x00\x00\x00')
 44 |         eq(m(edx, dword[ebp+eax*4]), '\x54\x85\x00')
 45 |         eq(m(eax, dword[eax+0x7f]), '\x40\x7f')
 46 |         eq(m(eax, dword[eax+0x80]), '\x80\x80\x00\x00\x00')
 47 |         eq(m(eax, dword[eax-0x80]), '\x40\x80')
 48 |         eq(m(eax, dword[eax-0x81]), '\x80\x7f\xff\xff\xff')
 49 |         eq(m(eax, dword[eax-2]), '\x40\xfe')
 50 |         eq(m(eax, dword[eax+0x40]), '\x40\x40')
 51 |         eq(m(eax, ebx), '\xc3')
 52 |         eq(m(esi, edi), '\xf7')
 53 | 
 54 |     def test_pack(self):
 55 |         eq = self.assertEqual
 56 | 
 57 |         eq(byte.pack(1), '\x01')
 58 |         eq(word.pack(1), '\x01\x00')
 59 |         eq(dword.pack(1), '\x01\x00\x00\x00')
 60 |         eq(qword.pack(1), '\x01\x00\x00\x00\x00\x00\x00\x00')
 61 | 
 62 |     def test_instructions(self):
 63 |         eq = lambda i, s, b: (self.assertEqual(repr(i), s,
 64 |             'Invalid string representation for: ' + repr(i)),
 65 |             self.assertEqual(str(i), b, 'Invalid encoding for: ' +
 66 |                 repr(i) + ' -> ' + repr(str(i))))
 67 |         ra = self.assertRaises
 68 | 
 69 |         eq(retn(), 'retn', '\xc3')
 70 |         eq(nop(), 'nop', '\x90')
 71 |         eq(retn(0x80), 'retn 0x80', '\xc2\x80\x00')
 72 | 
 73 |         eq(mov(eax, 0xdeadf00d), 'mov eax, 0xdeadf00d', '\xb8\x0d\xf0\xad\xde')
 74 |         eq(mov(esi, 0x11223344), 'mov esi, 0x11223344', '\xbe\x44\x33\x22\x11')
 75 |         eq(mov(edi, dword [esp+ebx*4+0x0c]), 'mov edi, dword [esp+ebx*4+0xc]',
 76 |             '\x8b\x7c\x9c\x0c')
 77 |         eq(mov(dword[ebp+0x30], ecx), 'mov dword [ebp+0x30], ecx',
 78 |             '\x89\x4d\x30')
 79 | 
 80 |         eq(push(ebx), 'push ebx', '\x53')
 81 |         eq(xchg(ebp, eax), 'xchg ebp, eax', '\x95')
 82 |         eq(push(edi), 'push edi', '\x57')
 83 |         eq(pop(ebx), 'pop ebx', '\x5b')
 84 |         eq(inc(edx), 'inc edx', '\x42')
 85 |         eq(dec(esi), 'dec esi', '\x4e')
 86 | 
 87 |         eq(test(ecx, ecx), 'test ecx, ecx', '\x85\xc9')
 88 |         eq(xchg(esi, esp), 'xchg esi, esp', '\x87\xe6')
 89 | 
 90 |         eq(pshufd(xmm4, oword[edx], 0x11), 'pshufd xmm4, oword [edx], 0x11',
 91 |             '\x66\x0f\x70\x22\x11')
 92 |         eq(pshufd(xmm2, xmm0, 0x40), 'pshufd xmm2, xmm0, 0x40',
 93 |             '\x66\x0f\x70\xd0\x40')
 94 | 
 95 |         eq(paddd(xmm2, xmm5), 'paddd xmm2, xmm5', '\x66\x0f\xfe\xd5')
 96 | 
 97 |         ra(Exception, lambda: paddd(xmm0, eax))
 98 |         ra(Exception, lambda: mov(eax, xmm1))
 99 |         ra(Exception, lambda: mov(eax, byte[ebx]))
100 | 
101 |         eq(inc(ecx, lock=True), 'lock inc ecx', '\xf0\x41')
102 |         eq(stosd(rep=True), 'rep stosd', '\xf3\xab')
103 |         eq(scasb(repne=True), 'repne scasb', '\xf2\xae')
104 | 
105 |         eq(lea(eax, [esp+eax*2+0x42]), 'lea eax, [esp+eax*2+0x42]',
106 |             '\x8d\x44\x44\x42')
107 | 
108 |         eq(mov(dword[ebx+0x44332211], 0x88776655),
109 |             'mov dword [ebx+0x44332211], 0x88776655', '\xc7\x83' + ''.join(
110 |             map(chr, range(0x11, 0x99, 0x11))))
111 | 
112 |         eq(movss(xmm6, xmm3), 'movss xmm6, xmm3', '\xf3\x0f\x10\xf3')
113 |         eq(movd(xmm7, edi), 'movd xmm7, edi', '\x66\x0f\x6e\xff')
114 |         eq(pand(xmm4, oword [ecx]), 'pand xmm4, oword [ecx]',
115 |             '\x66\x0f\xdb\x21')
116 |         eq(movapd(xmm6, oword [ebx]), 'movapd xmm6, oword [ebx]',
117 |             '\x66\x0f\x28\x33')
118 | 
119 |         eq(add(byte[eax], 0x42), 'add byte [eax], 0x42', '\x80\x00\x42')
120 |         eq(cmp_(dword[esp+ecx*8+0x0c], 0x42),
121 |             'cmp dword [esp+ecx*8+0xc], 0x42', '\x83\x7c\xcc\x0c\x42')
122 |         eq(cmp_(byte[ebx], 0x13), 'cmp byte [ebx], 0x13', '\x80\x3b\x13')
123 |         eq(mov(byte[ecx], 0x37), 'mov byte [ecx], 0x37', '\xc6\x01\x37')
124 |         eq(add(eax, 1), 'add eax, 0x1', '\x83\xc0\x01')
125 |         eq(mov(bl, 1), 'mov bl, 0x1', '\xb3\x01')
126 |         eq(add(eax, 0x1111), 'add eax, 0x1111', '\x05\x11\x11\x00\x00')
127 |         eq(add(ebx, 0x2222), 'add ebx, 0x2222', '\x81\xc3\x22\x22\x00\x00')
128 |         eq(push(es), 'push es', '\x06')
129 |         eq(push(0x42), 'push 0x42', '\x6a\x42')
130 |         eq(push(0x111), 'push 0x111', '\x68\x11\x01\x00\x00')
131 |         eq(push(dword[2]), 'push dword [0x2]', '\xff\x35\x02\x00\x00\x00')
132 |         eq(push(dword[esp+edx*2]), 'push dword [esp+edx*2]', '\xff\x34\x54')
133 |         eq(pop(eax), 'pop eax', '\x58')
134 |         eq(pop(dword[edx]), 'pop dword [edx]', '\x8f\x02')
135 |         eq(pop(dword[6]), 'pop dword [0x6]', '\x8f\x05\x06\x00\x00\x00')
136 |         eq(pop(ss), 'pop ss', '\x17')
137 |         eq(rol(ebx, 1), 'rol ebx, 0x1', '\xd1\xc3')
138 |         eq(rol(ebx, 2), 'rol ebx, 0x2', '\xc1\xc3\x02')
139 |         eq(rol(edx, cl), 'rol edx, cl', '\xd3\xc2')
140 |         eq(xor(edx, esi), 'xor edx, esi', '\x31\xf2')
141 |         eq(shl(esi, 4), 'shl esi, 0x4', '\xc1\xe6\x04')
142 |         eq(sal(esi, 4), 'sal esi, 0x4', '\xc1\xe6\x04')
143 |         eq(xchg(byte[esp+0x42], al), 'xchg byte [esp+0x42], al',
144 |             '\x86\x44\x24\x42')
145 |         eq(xchg(al, byte[esp+0x42]), 'xchg byte [esp+0x42], al',
146 |             '\x86\x44\x24\x42')
147 |         eq(div(eax), 'div eax', '\xf7\xf0')
148 |         eq(movzx(eax, byte [1]), 'movzx eax, byte [0x1]',
149 |             '\x0f\xb6\x05\x01\x00\x00\x00')
150 |         eq(movsx(eax, al), 'movsx eax, al', '\x0f\xbe\xc0')
151 | 
152 |         eq(add(ecx, 0xff), 'add ecx, 0xff', '\x81\xc1\xff\x00\x00\x00')
153 |         eq(add(ecx, -0x1), 'add ecx, -0x1', '\x83\xc1\xff')
154 |         eq(imul(eax, ecx, 0xff), 'imul eax, ecx, 0xff', '\x69\xc1\xff\x00\x00\x00')
155 |         eq(imul(eax, ecx, -0x1), 'imul eax, ecx, -0x1', '\x6b\xc1\xff')
156 |         eq(cmp_(eax, 0xff), 'cmp eax, 0xff', '\x3d\xff\x00\x00\x00')
157 |         eq(cmp_(eax, -0x1), 'cmp eax, -0x1', '\x83\xf8\xff')
158 | 
159 |     def test_block(self):
160 |         eq = lambda i, s, b: (self.assertEqual(repr(i), s,
161 |             'Invalid string representation for: ' + repr(i)),
162 |             self.assertEqual(str(i), b, 'Invalid encoding for: ' +
163 |                 str(i) + ' -> ' + repr(str(i))))
164 | 
165 |         eq2 = lambda i, s, b: (self.assertEqual(repr(i), s,
166 |             'Invalid string representation for: ' + repr(i)),
167 |             self.assertEqual(i.assemble(), b, 'Invalid encoding for: ' +
168 |                 repr(b) + ' -> ' + repr(i.assemble())))
169 | 
170 |         eq(block(mov(eax, 1), mov(ebx, 1)), 'mov eax, 0x1\nmov ebx, 0x1\n',
171 |             '\xb8\x01\x00\x00\x00\xbb\x01\x00\x00\x00')
172 | 
173 |         b = block(mov(eax, ebx))
174 |         b += mov(ecx, edx)
175 |         eq(b, 'mov eax, ebx\nmov ecx, edx\n', '\x8b\xc3\x8b\xca')
176 | 
177 |         c = block(mov(esi, dword[eax]), scasb(rep=True))
178 |         eq(c, 'mov esi, dword [eax]\nrep scasb\n', '\x8b\x30\xf3\xae')
179 | 
180 |         b += c
181 |         b_s = 'mov eax, ebx\nmov ecx, edx\nmov esi, dword [eax]\nrep scasb\n'
182 |         b_e = '\x8b\xc3\x8b\xca\x8b\x30\xf3\xae'
183 |         eq(b, b_s, b_e)
184 | 
185 |         d = block(xor(eax, eax), lbl, inc(eax), cmp_(eax, 0x10), jnz(lbl(-1)))
186 |         eq2(d, 'xor eax, eax\n__lbl_0:\ninc eax\ncmp eax, 0x10\njnz __lbl_0\n',
187 |             '\x31\xc0\x40\x83\xf8\x10\x0f\x85\xf6\xff\xff\xff')
188 | 
189 |         # blocks allow instructions / labels without actually creating an
190 |         # instance if that's not required, e.g. instructions that don't take
191 |         # any operands
192 |         eq2(block(jmp(lbl(1)), nop, lbl, retn),
193 |             'jmp __lbl_0\nnop\n__lbl_0:\nretn\n',
194 |             '\xe9\x01\x00\x00\x00\x90\xc3')
195 | 
196 |         # simulation of jmp(lbl(0))
197 |         eq2(block(lbl, jmp(lbl(-1))), '__lbl_0:\njmp __lbl_0\n',
198 |             '\xe9\xfb\xff\xff\xff')
199 | 
200 |         # partially unrolling a useless loop, to show "merging" of blocks.
201 |         e_init = block(xor(ebx, ebx), mov(ecx, 0x40))
202 |         e_init_s = 'xor ebx, ebx\nmov ecx, 0x40\n'
203 |         e_init_e = '\x31\xdb\xb9\x40\x00\x00\x00'
204 |         eq2(e_init, e_init_s, e_init_e)
205 |         eq2(block(e_init), e_init_s, e_init_e)
206 | 
207 |         e_end = block(mov(eax, dword[esp+8]), retn)
208 |         e_end_s = 'mov eax, dword [esp+0x8]\nretn\n'
209 |         e_end_e = '\x8b\x44\x24\x08\xc3'
210 |         eq2(e_end, e_end_s, e_end_e)
211 | 
212 |         eq2(block(e_init, b, b, b, b, e_end), e_init_s + b_s * 4 + e_end_s,
213 |             e_init_e + b_e * 4 + e_end_e)
214 | 
215 |         # global named labels
216 |         eq2(block(lbl('loop'), inc(eax), jmp(lbl('loop'))),
217 |             '__lbl_loop:\ninc eax\njmp __lbl_loop\n',
218 |             '\x40\xe9\xfa\xff\xff\xff')
219 | 
220 |         # local named labels
221 |         block.block_id = 0
222 |         eq2(block('loop', inc(eax), inc(ebx), jmp('loop')),
223 |             '__lbl_1_loop:\ninc eax\ninc ebx\njmp __lbl_1_loop\n',
224 |             '\x40\x43\xe9\xf9\xff\xff\xff')
225 | 
226 |         # tweaked anonymous label references
227 |         block.block_id = 0
228 |         eq2(block(lbl, inc(dword[esp]), jmp(lbl-1)),
229 |             '__lbl_0:\ninc dword [esp]\njmp __lbl_0\n',
230 |             '\xff\x04\x24\xe9\xf8\xff\xff\xff')
231 | 
232 |         # temporary blocks as lists
233 |         a = block(mov(eax, ebx), mov(ebx, 42))
234 |         eq2(a + [xor(ecx, ecx), retn],
235 |             'mov eax, ebx\nmov ebx, 0x2a\nxor ecx, ecx\nretn\n',
236 |             '\x8b\xc3\xbb\x2a\x00\x00\x00\x31\xc9\xc3')
237 | 
238 |         # combining instructions directly
239 |         eq2(mov(eax, ebx) + mov(ebx, 42), 'mov eax, ebx\nmov ebx, 0x2a\n',
240 |             '\x8b\xc3\xbb\x2a\x00\x00\x00')
241 | 
242 |         # merging blocks with relative jumps
243 |         #eq(block(d, d, d), 'xor eax, eax\n__lbl_0:\ninc eax\ncmp eax, 0x10\n' +
244 |         #    'jnz __lbl_0\nxor eax, eax\n__lbl_1:\ninc eax\ncmp eax, 0x10\n' +
245 |         #    'jnz __lbl_1\nxor eax, eax\n__lbl_2:\ninc eax\ncmp eax, 0x10\n' +
246 |         #    'jnz __lbl_2',
247 |         #    '\x31\xc0\x40\x83\xf8\x10\x0f\x85\xf6\xff\xff\xff' * 3)
248 | 
249 |     def test_optimization(self):
250 |         eq = lambda i, s, b: (self.assertEqual(repr(i), s,
251 |             'Invalid string representation for: ' + repr(i)),
252 |             self.assertEqual(str(i), b, 'Invalid encoding for: ' +
253 |                 str(i) + ' -> ' + repr(str(i))))
254 | 
255 |         # [ebx*2] -> [ebx+ebx]
256 |         eq(mov(eax, dword[ebx*2+3]), 'mov eax, dword [ebx+ebx+0x3]',
257 |             '\x8b\x44\x1b\x03')
258 | 
259 | if __name__ == '__main__':
260 |     unittest.main(verbosity=2)
261 | 
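The `m(eax, dword[eax+0x7f])` family of checks in tests.py shows how the displacement width is chosen: `[eax+0x7f]` and `[eax-0x80]` fit in a one-byte displacement, while `[eax+0x80]` and `[eax-0x81]` need four bytes. The following is a minimal standalone sketch of that rule, not pyasm2's own code. The helper name `encode_modrm_disp` is hypothetical, and it assumes a plain `[base+disp]` operand whose base register is not `esp` (which would require a SIB byte).

```python
import struct


def encode_modrm_disp(reg, base, disp):
    """Return the ModRM byte plus displacement bytes for [base+disp].

    Hypothetical helper for illustration only. `reg` and `base` are
    32-bit register indices (eax=0 ... edi=7); `base` must not be 4
    (esp), because that encoding needs a SIB byte.
    """
    assert 0 <= reg < 8 and 0 <= base < 8 and base != 4
    assert -2**31 <= disp < 2**31

    if disp == 0 and base != 5:
        # mod=0: no displacement ([ebp] has no mod=0 form, so it is
        # excluded here and falls through to the disp8 case)
        mod, tail = 0, b''
    elif -0x80 <= disp <= 0x7f:
        # mod=1: signed 8-bit displacement
        mod, tail = 1, struct.pack('<b', disp)
    else:
        # mod=2: signed 32-bit displacement, little-endian
        mod, tail = 2, struct.pack('<i', disp)

    modrm = (mod << 6) | (reg << 3) | base
    return struct.pack('<B', modrm) + tail
```

With `reg=0` and `base=0` (both `eax`), this sketch yields the same bytes the tests expect for `dword[eax+0x7f]`, `dword[eax+0x80]`, `dword[eax-0x80]` and `dword[eax-0x81]`.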

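The label checks in `test_block` all rely on the same arithmetic: a 32-bit relative jump stores the distance from the end of the jump instruction to its target. The sketch below is a hedged illustration of that calculation, not pyasm2's implementation. The helper name `encode_rel32` and its offset-based parameters are assumptions made for this example.

```python
import struct


def encode_rel32(opcode, at, target):
    """Encode a jump with a 32-bit relative displacement.

    Hypothetical helper for illustration only. `opcode` is the raw
    opcode bytes (b'\xe9' for jmp, b'\x0f\x85' for jnz), `at` is the
    offset of the jump inside the block and `target` is the offset of
    the label it refers to.
    """
    length = len(opcode) + 4          # opcode bytes plus rel32
    rel = target - (at + length)      # relative to the next instruction
    return opcode + struct.pack('<i', rel)
```

For instance, a `jmp` at offset 0 that targets its own start gives a displacement of -5, which matches the `'\xe9\xfb\xff\xff\xff'` expected for `block(lbl, jmp(lbl(-1)))`. The `jnz` at offset 6 that targets offset 2 gives -10, matching the `'\x0f\x85\xf6\xff\xff\xff'` tail of block `d`.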

--------------------------------------------------------------------------------
/x86.py:
--------------------------------------------------------------------------------
   1 | """
   2 | 
   3 | pyasm2 - x86 assembler library          (C) 2012 Jurriaan Bremer
   4 | 
   5 | Although it's called pyasm2, this is not per se a successor of Pyasm or pyASM.
   6 | pyasm2 aims to be as flexible as possible; it will support x86, SSE and SSE2.
   7 | 
   8 | A key feature of pyasm2 is the ability to have blocks of instructions and
   9 | being able to give the base address at a later time, that is, you don't need
  10 | to know the address of instructions beforehand. For example, you can construct
  11 | a series of instructions, request the size that will be needed in order to
  12 | store all instructions as a sequence, allocate this memory and write the
  13 | instructions from there. This approach is very useful when making JIT
  14 | compilers etc.
  15 | 
  16 | The syntax of pyasm2 is supposed to be as simple as possible.
  17 | 
  18 | """
  19 | import struct
  20 | import types
  21 | 
  22 | 
  23 | class Immediate:
  24 |     """Defines Immediates, immediates can also be used as addresses."""
  25 |     def __init__(self, value=0, addr=False, signed=False):
  26 |         self.value = value
  27 |         self.addr = addr
  28 | 
  29 |         if signed:
  30 |             if -2**7 <= value < 2**7:
  31 |                 self.size = byte.size
  32 |             elif -2**15 <= value < 2**15:
  33 |                 self.size = word.size
  34 |             else:
  35 |                 self.size = dword.size
  36 |                 self.value = (2**31 + value) % 2**32 - 2**31
  37 |         else:
  38 |             if 0 <= value < 2**8:
  39 |                 self.size = byte.size
  40 |             elif 0 <= value < 2**16:
  41 |                 self.size = word.size
  42 |             else:
  43 |                 self.size = dword.size
  44 |                 self.value = value % 2**32
  45 | 
  46 |     def __int__(self):
  47 |         return self.value
  48 | 
  49 |     def __long__(self):
  50 |         return self.value
  51 | 
  52 |     def __cmp__(self, other):
  53 |         return self.value != int(other)
  54 | 
  55 |     def __str__(self):
  56 |         if self.value < 0:
  57 |             return '-0x%x' % -self.value
  58 |         else:
  59 |             return '0x%x' % self.value
  60 | 
  61 | class SignedImmediate(Immediate):
  62 |     """Defines Signed Immediates."""
  63 |     def __init__(self, value=0, addr=False):
  64 |         Immediate.__init__(self, value, addr, signed=True)
  65 | 
  66 | 
  67 | class SegmentRegister:
  68 |     """Defines the Segment Registers."""
  69 |     def __init__(self, index, name):
  70 |         self.index = index
  71 |         self.name = name
  72 | 
  73 |     def __str__(self):
  74 |         return self.name
  75 | 
  76 |     def __repr__(self):
  77 |         return self.name
  78 | 
  79 |     def __index__(self):
  80 |         return self.index
  81 | 
  82 | # make an alias `imm' to Immediate in order to simplify the creation of
  83 | # Instructions
  84 | imm = Immediate
  85 | signed_imm = SignedImmediate
  86 | 
  87 | # define each segment register.
  88 | es = SegmentRegister(0, 'es')
  89 | cs = SegmentRegister(1, 'cs')
  90 | ss = SegmentRegister(2, 'ss')
  91 | ds = SegmentRegister(3, 'ds')
  92 | fs = SegmentRegister(4, 'fs')
  93 | gs = SegmentRegister(5, 'gs')
  94 | 
  95 | # array of segment registers, according to their index
  96 | SegmentRegister.register = (es, cs, ss, ds, fs, gs)
  97 | 
  98 | 
  99 | class MemoryAddress:
 100 |     def __init__(self, size=None, segment=None, reg1=None, reg2=None,
 101 |                  mult=None, disp=None):
 102 |         """Create a new Memory Address."""
 103 |         # check if a register is valid..
 104 |         f = lambda x: x is None or isinstance(x, (gpr, xmm))
 105 |         if not size:
 106 |             size = None
 107 |         assert size is None or size in (8, 16, 32, 64, 128)
 108 |         assert segment is None or isinstance(segment, SegmentRegister)
 109 |         assert f(reg1)
 110 |         assert f(reg2)
 111 |         assert mult is None or mult in (1, 2, 4, 8)
 112 |         assert disp is None or int(disp) >= 0 and int(disp) < 2**32
 113 | 
 114 |         self.size = size
 115 |         self.segment = segment
 116 |         self.reg1 = reg1
 117 |         self.reg2 = reg2
 118 |         self.mult = mult
 119 |         self.disp = Immediate(disp) if isinstance(disp, (int, long)) else disp
 120 | 
 121 |         self.clean()
 122 | 
 123 |     def clean(self):
 124 |         """Makes sure that the internal representation of the Memory Address
 125 |             is as easy as possible.
 126 | 
 127 |         For example, we don't want `esp' in `reg2' (and `esp' can't have a
 128 |         `mult' other than one). Neither do we want to have a `reg2' with `mult'
 129 |         1 when `reg1' is None.
 130 | 
 131 |         Note that we can't use `esp' directly, because it's not initialized
 132 |         the first time(s) we call this function, therefore we use its index,
 133 |         which is 4.
 134 | 
 135 |         """
 136 |         # `esp' can't have a multiplier other than one.
 137 |         if self.reg2 is not None:
 138 |             assert self.reg2.index != 4 or self.mult == 1
 139 | 
 140 |         # swap registers if `reg2' contains `esp'
 141 |         if self.reg2 is not None and self.reg2.index == 4:
 142 |             self.reg1, self.reg2 = self.reg2, self.reg1
 143 | 
 144 |         # store `reg2' as `reg1' if `reg1' is None and `mult' is one.
 145 |         if self.reg1 is None and self.mult == 1:
 146 |             self.reg1, self.reg2, self.mult = self.reg2, None, None
 147 | 
 148 |         return self
 149 | 
 150 |     def final_clean(self):
 151 |         """Special clean function to clean and/or optimize right before
 152 |             assembling this Memory Address.
 153 | 
 154 |         When `reg1' is None, `mult' is two and `reg2' is not `esp', then we
 155 |         can optimize it by using `reg1', i.e. [eax*2] -> [eax+eax].
 156 | 
 157 |         """
 158 |         if self.reg1 is None and self.mult == 2 and self.reg2 != esp:
 159 |             self.reg1, self.mult = self.reg2, 1
 160 | 
 161 |     def merge(self, other, add=True):
 162 |         """Merge self with a Displacement, Register or Memory Address."""
 163 |         # it is not possible to merge with one of the predefined Memory
 164 |         # Addresses
 165 |         assert id(self) not in map(id, (byte, word, dword, qword, oword))
 166 | 
 167 |         if isinstance(other, (int, long, Immediate)):
 168 |             assert int(other) >= 0 and int(other) < 2**32
 169 |             assert self.disp is None
 170 | 
 171 |             if add:
 172 |                 self.disp = other
 173 |             else:
 174 |                 self.disp = -other
 175 | 
 176 |             return self.clean()
 177 | 
 178 |         if isinstance(other, (GeneralPurposeRegister, XmmRegister)):
 179 |             assert self.reg1 is None or self.reg2 is None
 180 | 
 181 |             if self.reg1 is None:
 182 |                 self.reg1 = other
 183 |             else:
 184 |                 self.reg2 = other
 185 |                 self.mult = 1
 186 | 
 187 |             return self.clean()
 188 | 
 189 |         if isinstance(other, MemoryAddress):
 190 |             assert self.size is None or other.size is None
 191 |             assert self.segment is None or other.segment is None
 192 |             assert self.disp is None or other.disp is None
 193 | 
 194 |             if self.size is None:
 195 |                 self.size = other.size
 196 | 
 197 |             if self.segment is None:
 198 |                 self.segment = other.segment
 199 | 
 200 |             reg1, reg2 = other.reg1, other.reg2
 201 | 
 202 |             if self.reg1 is None:
 203 |                 if reg1 is not None:
 204 |                     self.reg1, reg1 = reg1, None
 205 |                 elif reg2 is not None and other.mult == 1:
 206 |                     self.reg1, reg2 = reg2, None
 207 | 
 208 |             if self.reg2 is None:
 209 |                 if reg1 is not None:
 210 |                     self.reg2, self.mult, reg1 = reg1, 1, None
 211 |                 elif reg2 is not None:
 212 |                     self.reg2, self.mult, reg2 = reg2, other.mult, None
 213 | 
 214 |             assert reg1 is None and reg2 is None
 215 | 
 216 |             if self.disp is None:
 217 |                 self.disp = other.disp
 218 | 
 219 |             return self.clean()
 220 | 
 221 |         raise Exception('Invalid Parameter')
 222 | 
 223 |     def __index__(self):
 224 |         """Encode a Memory Address as index.
 225 | 
 226 |         We have to be able to encode a Memory Address into an integer in
 227 |         order to use slices (which we do for instructions that use a segment
 228 |         register.)
 229 | 
 230 |         Memory Layout is as follows (displacement has to be the lower 32
 231 |         bits in the event that something like `dword [cs:0x401000]' is used.)
 232 |         32 bits - displacement
 233 |         4  bits - reg1
 234 |         4  bits - reg2
 235 |         3  bits - mult
 236 | 
 237 |         If the displacement is None, it will be encoded as 0, and will be
 238 |         decoded as None later.
 239 |         General Purpose Registers are encoded as their `index' increased with
 240 |         one, or 0 if None.
 241 |         Multiplication is encoded using a table, which can be found below.
 242 | 
 243 |         """
 244 |         mults = {None: 0, 1: 1, 2: 2, 4: 3, 8: 4}
 245 |         # for encoding general purpose registers
 246 |         f = lambda x: x.index + 1 if x is not None else 0
 247 |         return \
 248 |             (int(self.disp) if self.disp is not None else 0) + \
 249 |             (f(self.reg1) << 32) + \
 250 |             (f(self.reg2) << 36) + \
 251 |             (mults[self.mult] << 40)
 252 | 
 253 |     def _decode_index(self, index):
 254 |         """Decodes a Memory Address encoded with __index__()."""
 255 |         mults = (None, 1, 2, 4, 8)
 256 |         # for decoding general purpose registers
 257 |         f = lambda x, y: y.register32[x-1] if x else None
 258 |         return MemoryAddress(disp=index % 2**32 if index % 2**32 else None,
 259 |                              reg1=f((index >> 32) % 2**4, gpr),
 260 |                              reg2=f((index >> 36) % 2**4, gpr),
 261 |                              mult=mults[(index >> 40) % 2**3])
 262 | 
 263 |     def __getitem__(self, key):
 264 |         """Item or Slice to this MemoryAddress size.
 265 | 
 266 |         A slice, represented as [segment:address], defines a segment register
 267 |         and an address, the address is a combination of Displacements and
 268 |         General Purpose Registers (optionally with multiplication.)
 269 | 
 270 |         An item, represented as [address], only defines an address.
 271 | 
 272 |         """
 273 |         if isinstance(key, slice):
 274 |             ma = MemoryAddress(size=self.size,
 275 |                                segment=SegmentRegister.register[key.start])
 276 |             return ma.merge(self._decode_index(key.stop))
 277 |         else:
 278 |             return MemoryAddress(size=self.size).merge(key)
 279 | 
 280 |     def __add__(self, other):
 281 |         """self + other"""
 282 |         return self.merge(other)
 283 | 
 284 |     def __radd__(self, other):
 285 |         """other + self"""
 286 |         return self.merge(other)
 287 | 
 288 |     def __sub__(self, other):
 289 |         """self - other"""
 290 |         return self.merge(other, add=False)
 291 | 
 292 |     def __rsub__(self, other):
 293 |         """other - self"""
 294 |         return self.merge(other, add=False)
 295 | 
 296 |     def __str__(self):
 297 |         """Representation of this Memory Address."""
 298 |         sizes = {8: 'byte', 16: 'word', 32: 'dword', 64: 'qword', 128: 'oword'}
 299 |         s = ''
 300 |         if self.reg1 is not None:
 301 |             s += str(self.reg1)
 302 |         if self.reg2 is not None:
 303 |             q = str(self.reg2) if self.mult == 1 else \
 304 |                 str(self.reg2) + '*' + str(self.mult)
 305 |             s += q if not len(s) else '+' + q
 306 |         if self.disp:
 307 |             if self.disp >= 0:
 308 |                 q = '0x%x' % int(self.disp)
 309 |             else:
 310 |                 q = '-0x%x' % -int(self.disp)
 311 |             if not len(s) or q[0] == '-':
 312 |                 s += q
 313 |             else:
 314 |                 s += '+' + q
 315 |         if self.size is not None:
 316 |             if self.segment is not None:
 317 |                 return '%s [%s:%s]' % (sizes[self.size], str(self.segment), s)
 318 |             else:
 319 |                 return '%s [%s]' % (sizes[self.size], s)
 320 |         return '[%s]' % s if self.segment is None else \
 321 |             '[%s:%s]' % (str(self.segment), s)
 322 | 
 323 |     def __repr__(self):
 324 |         """Representation of this Memory Address."""
 325 |         return self.__str__()
 326 | 
 327 |     def __cmp__(self, other):
 328 |         """Check if two elements are the same, or not."""
 329 |         return 0 if self.size == other.size and \
 330 |             self.segment == other.segment and \
 331 |             self.reg1 == other.reg1 and self.reg2 == other.reg2 and \
 332 |             self.mult == other.mult and self.disp == other.disp else -1
 333 | 
 334 |     def pack(self, value):
 335 |         """Pack a value depending on the `size' of this Memory Address."""
 336 |         assert self.size is not None
 337 | 
 338 |         fmt = {8: 'B', 16: 'H', 32: 'I', 64: 'Q'}
 339 | 
 340 |         # convert the value, if it's negative.
 341 |         value = int(value) if int(value) >= 0 else int(value) + 2**self.size
 342 | 
 343 |         return struct.pack(fmt[self.size], value)
 344 | 
 345 | # define the size for the memory addresses
 346 | byte = MemoryAddress(size=8)
 347 | word = MemoryAddress(size=16)
 348 | dword = MemoryAddress(size=32)
 349 | qword = MemoryAddress(size=64)
 350 | oword = MemoryAddress(size=128)
 351 | 
 352 | # make an alias `mem' to MemoryAddress in order to simplify the creation of
 353 | # Instructions
 354 | mem = MemoryAddress
 355 | 
 356 | 
 357 | class GeneralPurposeRegister:
 358 |     """Defines the General Purpose Registers."""
 359 |     def __init__(self, index, name, size):
 360 |         self.index = index
 361 |         self.name = name
 362 |         self.size = size.size
 363 | 
 364 |     def __add__(self, other):
 365 |         """self + other"""
 366 |         if isinstance(other, (int, long, Immediate)):
 367 |             return MemoryAddress(reg1=self, disp=other)
 368 |         if isinstance(other, GeneralPurposeRegister):
 369 |             return MemoryAddress(reg1=self, reg2=other, mult=1)
 370 |         if isinstance(other, MemoryAddress):
 371 |             return other.merge(self)
 372 |         raise Exception('Invalid Parameter')
 373 | 
 374 |     def __radd__(self, other):
 375 |         """other + self"""
 376 |         return self.__add__(other)
 377 | 
 378 |     def __sub__(self, other):
 379 |         """self - other"""
 380 |         return self.__add__(2**32 - other)
 381 | 
 382 |     def __mul__(self, other):
 383 |         """self * other"""
 384 |         return MemoryAddress(reg2=self, mult=other)
 385 | 
 386 |     def __rmul__(self, other):
 387 |         """other * self"""
 388 |         return MemoryAddress(reg2=self, mult=other)
 389 | 
 390 |     def __str__(self):
 391 |         return self.name
 392 | 
 393 |     def __repr__(self):
 394 |         return self.name
 395 | 
 396 |     def __index__(self):
 397 |         """Index of this register."""
 398 |         return MemoryAddress(reg1=self).__index__()
 399 | 
 400 | # define the general purpose registers
 401 | al = GeneralPurposeRegister(0, 'al', byte)
 402 | cl = GeneralPurposeRegister(1, 'cl', byte)
 403 | dl = GeneralPurposeRegister(2, 'dl', byte)
 404 | bl = GeneralPurposeRegister(3, 'bl', byte)
 405 | ah = GeneralPurposeRegister(4, 'ah', byte)
 406 | ch = GeneralPurposeRegister(5, 'ch', byte)
 407 | dh = GeneralPurposeRegister(6, 'dh', byte)
 408 | bh = GeneralPurposeRegister(7, 'bh', byte)
 409 | 
 410 | ax = GeneralPurposeRegister(0, 'ax', word)
 411 | cx = GeneralPurposeRegister(1, 'cx', word)
 412 | dx = GeneralPurposeRegister(2, 'dx', word)
 413 | bx = GeneralPurposeRegister(3, 'bx', word)
 414 | sp = GeneralPurposeRegister(4, 'sp', word)
 415 | bp = GeneralPurposeRegister(5, 'bp', word)
 416 | si = GeneralPurposeRegister(6, 'si', word)
 417 | di = GeneralPurposeRegister(7, 'di', word)
 418 | 
 419 | eax = GeneralPurposeRegister(0, 'eax', dword)
 420 | ecx = GeneralPurposeRegister(1, 'ecx', dword)
 421 | edx = GeneralPurposeRegister(2, 'edx', dword)
 422 | ebx = GeneralPurposeRegister(3, 'ebx', dword)
 423 | esp = GeneralPurposeRegister(4, 'esp', dword)
 424 | ebp = GeneralPurposeRegister(5, 'ebp', dword)
 425 | esi = GeneralPurposeRegister(6, 'esi', dword)
 426 | edi = GeneralPurposeRegister(7, 'edi', dword)
 427 | 
 428 | # array of general purpose registers, according to their index
 429 | GeneralPurposeRegister.register8 = (al, cl, dl, bl, ah, ch, dh, bh)
 430 | GeneralPurposeRegister.register16 = (ax, cx, dx, bx, sp, bp, si, di)
 431 | GeneralPurposeRegister.register32 = (eax, ecx, edx, ebx, esp, ebp, esi, edi)
 432 | 
 433 | # make an alias `gpr' to GeneralPurposeRegister in order to simplify the
 434 | # creation of Instructions
 435 | gpr = GeneralPurposeRegister
 436 | 
 437 | 
 438 | class XmmRegister:
 439 |     """Defines the Xmm Registers, registers used for the SSE instructions."""
 440 |     def __init__(self, index, name):
 441 |         self.index = index
 442 |         self.name = name
 443 |         self.size = oword.size
 444 | 
 445 |     def __str__(self):
 446 |         return self.name
 447 | 
 448 |     def __repr__(self):
 449 |         return self.name
 450 | 
 451 | xmm0 = XmmRegister(0, 'xmm0')
 452 | xmm1 = XmmRegister(1, 'xmm1')
 453 | xmm2 = XmmRegister(2, 'xmm2')
 454 | xmm3 = XmmRegister(3, 'xmm3')
 455 | xmm4 = XmmRegister(4, 'xmm4')
 456 | xmm5 = XmmRegister(5, 'xmm5')
 457 | xmm6 = XmmRegister(6, 'xmm6')
 458 | xmm7 = XmmRegister(7, 'xmm7')
 459 | 
 460 | # make an alias `xmm' to XmmRegister in order to simplify the creation of
 461 | # Instructions
 462 | xmm = XmmRegister
 463 | 
 464 | 
 465 | class MemoryGeneralPurposeRegister(MemoryAddress, GeneralPurposeRegister):
 466 |     """A combination of MemoryAddress and GeneralPurposeRegister,
 467 |         useful for modrm encoding etc."""
 468 |     pass
 469 | 
 470 | # a combination of operand types that can be used in modrm bytes.
 471 | memgpr = MemoryGeneralPurposeRegister
 472 | 
 473 | 
 474 | class MemoryXmmRegister(MemoryAddress, XmmRegister):
 475 |     """Combination of MemoryAddress and XmmRegister."""
 476 |     pass
 477 | 
 478 | memxmm = MemoryXmmRegister
 479 | 
 480 | 
 481 | class Instruction:
 482 |     """Base class for every instruction.
 483 | 
 484 |     Instructions that don't take any operands place their opcode as integer or
 485 |     string in `_opcode_'.
 486 |     Instructions that have one or more (maximum of three) operands fill the
 487 |     `_enc_' table, one entry per encoding. The layout of this encoding is a
 488 |     list of tuples, with a layout like the following.
 489 | 
 490 |     (opcode, operand1, operand2, operand3)
 491 | 
 492 |     `opcode' is an integer or string representing the opcode of this encoding.
 493 |     `operand1', `operand2' and `operand3' are tuples defining the size and
 494 |     type of operand, `operand2' and `operand3' are obviously optional. If an
 495 |     operand is not a tuple, it defines a hardcoded operand.
 496 | 
 497 |     """
 498 |     VALID_OPERANDS = (int, long, SegmentRegister, GeneralPurposeRegister,
 499 |                       MemoryAddress, Immediate, XmmRegister, list)
 500 | 
 501 |     # we use a ctypes-like way to implement instructions.
 502 |     _opcode_ = None
 503 |     _enc_ = []
 504 |     _name_ = None
 505 | 
 506 |     def __init__(self, operand1=None, operand2=None, operand3=None,
 507 |                  lock=False, rep=False, repne=False):
 508 |         """Initialize a new Instruction object."""
 509 |         assert operand1 is None or isinstance(operand1, self.VALID_OPERANDS)
 510 |         assert operand2 is None or isinstance(operand2, self.VALID_OPERANDS)
 511 |         assert operand3 is None or isinstance(operand3, self.VALID_OPERANDS)
 512 |         assert not isinstance(operand1, list) or len(operand1) == 1
 513 |         assert not isinstance(operand2, list) or len(operand2) == 1
 514 | 
 515 |         # convert ints and longs to Immediate values.
 516 |         f = lambda x: x if not isinstance(x, (int, long)) else Immediate(x, signed=x < 0)
 517 |         # convert lists with one entry to Memory Addresses
 518 |         g = lambda x: x if not isinstance(x, list) else x[0]
 519 | 
 520 |         self.op1 = g(f(operand1))
 521 |         self.op2 = g(f(operand2))
 522 |         self.op3 = f(operand3)
 523 |         self.lock = lock
 524 |         self.rep = rep
 525 |         self.repne = repne
 526 | 
 527 |         # clean operands, if needed
 528 |         self.clean()
 529 | 
 530 |         # check the correct encoding for this combination of operands
 531 |         self.encoding()
 532 | 
 533 |     def clean(self):
 534 |         """Alters the order of operands if needed."""
 535 | 
 536 |         # the `xchg' instruction requires operands ordered as `memgpr, gpr'.
 537 |         if isinstance(self, xchg) and isinstance(self.op1, gpr) and \
 538 |                 isinstance(self.op2, mem):
 539 |             self.op1, self.op2 = self.op2, self.op1
 540 | 
 541 |         if isinstance(self.op1, mem):
 542 |             self.op1.final_clean()
 543 | 
 544 |         if isinstance(self.op2, mem):
 545 |             self.op2.final_clean()
 546 | 
 547 |     def modrm(self, op1, op2):
 548 |         """Encode two operands into their modrm representation."""
 549 |         # we make sure `op2' is always the Memory Address (if present at all)
 550 |         if isinstance(op1, MemoryAddress):
 551 |             op1, op2 = op2, op1
 552 | 
 553 |         # a brief explanation of variable names in this function.
 554 |         # there is a modrm byte, which contains `reg', `mod' and `rm' and
 555 |         # there is a sib byte, which contains `S', `index' and `base'.
 556 |         # for more explanation on the encoding, see also:
 557 |         # http://sandpile.org/x86/opc_rm.htm for the modrm byte, and
 558 |         # http://sandpile.org/x86/opc_sib.htm for the sib byte.
 559 | 
 560 |         reg = op1.index
 561 | 
 562 |         buf = ''
 563 |         sib = False
 564 | 
 565 |         if isinstance(op2, (GeneralPurposeRegister, XmmRegister)):
 566 |             mod = 3
 567 |             rm = op2.index
 568 | 
 569 |         elif isinstance(op2, MemoryAddress):
 570 |             mults = {1: 0, 2: 1, 4: 2, 8: 3}
 571 |             if op2.reg1 is None:
 572 |                 if op2.reg2 is None:
 573 |                     # there should be at least a displacement
 574 |                     assert op2.disp is not None
 575 |                     mod = 0
 576 |                     rm = 5
 577 |                     buf = struct.pack('<I', op2.disp % 2**32)
 578 |                 else:
 579 |                     sib = True
 580 |                     S = mults[op2.mult]
 581 |                     index = op2.reg2.index
 582 |                     mod = 0
 583 |                     rm = 4
 584 |                     # it's not possible to have a register with a
 585 |                     # multiplication other than one without a 32bit
 586 |                     # displacement.
 587 |                     base = 5
 588 |                     buf = struct.pack('<I', op2.disp % 2**32 if op2.disp else 0)
 589 |             else:
 590 |                 if op2.reg2 is None:
 591 |                     # special case for `esp', since it requires the sib byte
 592 |                     if op2.reg1.index == 4:
 593 |                         sib = True
 594 |                         base = 4
 595 |                         index = 4
 596 |                         S = 0
 597 |                         rm = 4
 598 |                         mod = 2
 599 |                     # special case for `ebp', since it requires a displacement
 600 |                     elif op2.reg1.index == 5:
 601 |                         rm = 5
 602 |                         mod = 3
 603 |                     else:
 604 |                         rm = op2.reg1.index
 605 |                         mod = 2
 606 |                 # special case for `esp', since it requires the sib byte
 607 |                 elif op2.reg1.index == 4:
 608 |                     sib = True
 609 |                     base = 4
 610 |                     index = op2.reg2.index
 611 |                     S = mults[op2.mult]
 612 |                     rm = 4
 613 |                     mod = 2
 614 |                 # special case for `ebp', since it requires a displacement
 615 |                 elif op2.reg1.index == 5:
 616 |                     sib = True
 617 |                     index = op2.reg2.index
 618 |                     S = mults[op2.mult]
 619 |                     base = 5
 620 |                     rm = 4
 621 |                     mod = 3
 622 |                 else:
 623 |                     sib = True
 624 |                     rm = 4
 625 |                     base = op2.reg1.index
 626 |                     index = op2.reg2.index
 627 |                     S = mults[op2.mult]
 628 |                     mod = 2
 629 | 
 630 |             # `mod' is used as a marker here: two means that the
 631 |             # displacement is optional (none, 8bit or 32bit), whereas three
 632 |             # means that a displacement is required (8bit or 32bit.)
 633 |             if mod in (2, 3):
 634 |                 if op2.disp is not None:
 635 |                     disp = int(op2.disp) % 2**32
 636 |                 if op2.disp is None:
 637 |                     if mod == 3:
 638 |                         mod = 1
 639 |                         buf = '\x00'
 640 |                     else:
 641 |                         mod = 0
 642 |                 elif disp >= 0 and disp < 0x80:
 643 |                     mod = 1
 644 |                     buf = chr(disp)
 645 |                 elif disp >= 0xffffff80 and disp < 2**32:
 646 |                     mod = 1
 647 |                     buf = chr(disp & 0xff)
 648 |                 else:
 649 |                     mod = 2
 650 |                     buf = struct.pack('<I', disp)
 651 | 
 652 |         # construct the modrm byte
 653 |         ret = chr((mod << 6) + (reg << 3) + rm)
 654 |         if sib:
 655 |             # if required, construct the sib byte
 656 |             ret += chr((S << 6) + (index << 3) + base)
 657 |         # append the buf, if it contains anything.
 658 |         return ret + buf
 659 | 
 660 |     def encoding(self):
 661 |         """Returns the Encoding used, as defined by `_enc_'.
 662 | 
 663 |         If the instruction doesn't take any operands, None is returned.
 664 |         If the instruction takes one or more (maximum of three) operands and a
 665 |         match is found in `_enc_', the match is returned, otherwise an
 666 |         Exception is raised.
 667 | 
 668 |         """
 669 |         if self.op1 is None:
 670 |             return None
 671 | 
 672 |         for enc in self._enc_:
 673 |             opcode, op1, op2, op3 = enc + (None,) * (4 - len(enc))
 674 | 
 675 |             def operand_count(a, b, c):
 676 |                 return (a is not None) + (b is not None) + (c is not None)
 677 | 
 678 |             # check if the number of operands matches.
 679 |             if operand_count(self.op1, self.op2, self.op3) != \
 680 |                     operand_count(op1, op2, op3):
 681 |                 continue
 682 | 
 683 |             # if the encoding is not a tuple, then it's a hardcoded value.
 684 |             if not isinstance(op1, tuple):
 685 |                 # check if the classes and objects match.
 686 |                 if op1.__class__ != self.op1.__class__ or op1 != self.op1:
 687 |                     continue
 688 |             # check the operand (and size) of this match
 689 |             elif not issubclass(op1[1], self.op1.__class__):
 690 |                 continue
 691 |             elif hasattr(self.op1, 'size') and op1[0] is not None:
 692 |                 if op1[1] in (imm, signed_imm):
 693 |                     if op1[1](int(self.op1)).size > op1[0].size:
 694 |                         continue
 695 |                 else:
 696 |                     if op1[0].size != self.op1.size:
 697 |                         continue
 698 | 
 699 |             if op2 is None:
 700 |                 self._encoding = (opcode, op1, op2, op3)
 701 |                 return self._encoding
 702 | 
 703 |             # if the encoding is not a tuple, then it's a hardcoded value.
 704 |             if not isinstance(op2, tuple):
 705 |                 # check if the classes and objects match.
 706 |                 if op2.__class__ != self.op2.__class__ or op2 != self.op2:
 707 |                     continue
 708 |             # check the operand (and size) of this match
 709 |             elif not issubclass(op2[1], self.op2.__class__):
 710 |                 continue
 711 |             elif hasattr(self.op2, 'size') and op2[0] is not None:
 712 |                 if op2[1] in (imm, signed_imm):
 713 |                     if op2[1](int(self.op2)).size > op2[0].size:
 714 |                         continue
 715 |                 else:
 716 |                     if op2[0].size != self.op2.size:
 717 |                         continue
 718 | 
 719 |             if op3 is None:
 720 |                 self._encoding = (opcode, op1, op2, op3)
 721 |                 return self._encoding
 722 | 
 723 |             # check if the third operand matches (can only be an Immediate)
 724 |             if not issubclass(op3[1], self.op3.__class__):
 725 |                 continue
 726 |             elif op3[1] in (imm, signed_imm) and \
 727 |                     op3[1](int(self.op3)).size > op3[0].size:
 728 |                 continue
 729 | 
 730 |             # we found a matching encoding, return it.
 731 |             self._encoding = (opcode, op1, op2, op3)
 732 |             return self._encoding
 733 | 
 734 |         raise Exception('Unknown or Invalid Encoding')
 735 | 
 736 |     def name(self):
 737 |         """The name of this instruction."""
 738 |         return self._name_ or self.__class__.__name__
 739 | 
 740 |     def __repr__(self):
 741 |         """Representation of this Instruction."""
 742 |         s = ''
 743 | 
 744 |         if self.lock:
 745 |             s += 'lock '
 746 | 
 747 |         if self.repne:
 748 |             s += 'repne '
 749 | 
 750 |         if self.rep:
 751 |             s += 'rep '
 752 | 
 753 |         s += self.name()
 754 |         ops = filter(lambda x: x is not None, (self.op1, self.op2, self.op3))
 755 |         if len(ops):
 756 |             return s + ' ' + ', '.join(map(str, ops))
 757 |         return s
 758 | 
 759 |     def __len__(self):
 760 |         """Return the Length of the Machine Code."""
 761 |         return len(str(self))
 762 | 
 763 |     def __str__(self):
 764 |         """Encode this Instruction into its machine code representation."""
 765 |         enc = self.encoding()
 766 | 
 767 |         ret = ''
 768 | 
 769 |         if self.lock:
 770 |             ret += '\xf0'
 771 | 
 772 |         if self.repne:
 773 |             ret += '\xf2'
 774 | 
 775 |         if self.rep:
 776 |             ret += '\xf3'
 777 | 
 778 |         if enc is None:
 779 |             op = self._opcode_
 780 |             ret += chr(op) if isinstance(op, int) else op
 781 |             return ret
 782 | 
 783 |         opcode, op1, op2, op3 = enc
 784 |         ops = (self.op1, self.op2, self.op3)
 785 |         modrm_reg = modrm_rm = None
 786 | 
 787 |         ret += chr(opcode) if isinstance(opcode, int) else opcode
 788 |         disp = ''
 789 | 
 790 |         for i in xrange(3):
 791 |             op = enc[i+1]
 792 |             # we don't have to process empty operands or hardcoded values
 793 |             if op is None or not isinstance(op, tuple):
 794 |                 continue
 795 | 
 796 |             size, typ = op[:2]
 797 | 
 798 |             # if a third index is given in the operand's tuple, then that
 799 |             # means that we have to emulate the `reg' for the modrm byte.
 800 |             # the value of `reg' is therefore given as third value.
 801 |             if len(op) == 3:
 802 |                 modrm_reg = gpr.register32[op[2]]
 803 | 
 804 |             # handle Immediates
 805 |             if typ in (imm, signed_imm):
 806 |                 disp += size.pack(ops[i])
 807 |                 continue
 808 | 
 809 |             # handle the reg part of the modrm byte
 810 |             if typ not in (mem, memgpr, memxmm) and modrm_reg is None:
 811 |                 modrm_reg = ops[i]
 812 |                 continue
 813 | 
 814 |             if isinstance(ops[i], (gpr, xmm)) and modrm_rm is not None:
 815 |                 modrm_reg = ops[i]
 816 |                 continue
 817 | 
 818 |             # handle the rm part of the modrm byte
 819 |             if typ in (mem, gpr, xmm, memgpr, memxmm):
 820 |                 modrm_rm = ops[i]
 821 |                 continue
 822 | 
 823 |             raise Exception('Unknown Type')
 824 | 
 825 |         if modrm_reg or modrm_rm:
 826 |             ret += self.modrm(modrm_reg, modrm_rm)
 827 | 
 828 |         self._encode = ret + disp
 829 |         return self._encode
 830 | 
 831 |     def __add__(self, other):
 832 |         return Block(self, other)
 833 | 
 834 |     def __radd__(self, other):
 835 |         return Block(other, self)
 836 | 
 837 | 
 838 | class RelativeJump:
 839 |     _index_ = None
 840 |     _name_ = None
 841 | 
 842 |     def __init__(self, value):
 843 |         self.value = value
 844 | 
 845 |     def __len__(self):
 846 |         return 6 if self._index_ is not None else 5
 847 | 
 848 |     def name(self):
 849 |         """The name of this instruction."""
 850 |         return self._name_ or self.__class__.__name__
 851 | 
 852 |     def __repr__(self):
 853 |         value = self.value
 854 |         if not isinstance(value, str):
 855 |             value = repr(value)
 856 |         return self.name() + ' ' + value
 857 | 
 858 |     def assemble(self, short=True, labels={}, offset=0):
 859 |         """Assemble the Relative Jump.
 860 | 
 861 |         `short' indicates if the relative offset should be encoded as 8bit
 862 |         value, if possible.
 863 |         `offset' is the offset of this jump
 864 | 
 865 |         """
 866 |         to = self.value
 867 |         if isinstance(self.value, Label):
 868 |             if isinstance(self.value.index, str):
 869 |                 to = labels[self.value.index]
 870 |             else:
 871 |                 index = self.value.index + self.value.base
 872 |                 to = labels[index - (self.value.index > 0)]
 873 |         elif isinstance(self.value, str):
 874 |             to = labels[self.value]
 875 | 
 876 |         if self._index_ is None:
 877 |             return chr(self._opcode_) + dword.pack(to - offset - 5)
 878 |         else:
 879 |             return '\x0f' + chr(0x80 + self._index_) + dword.pack(
 880 |                 to - offset - 6)
 881 | 
 882 | 
 883 | class Block:
 884 |     block_id = 0
 885 | 
 886 |     def __init__(self, *args):
 887 |         self._l = []
 888 | 
 889 |         # unique block id for each Block
 890 |         Block.block_id += 1
 891 | 
 892 |         # current index for new labels
 893 |         self.label_base = 0
 894 | 
 895 |         # add each argument to the list
 896 |         map(self.append, args)
 897 | 
 898 |     def __repr__(self):
 899 |         """Return a string representation of all instructions chained."""
 900 |         # convert each instruction into its string representation; labels
 901 |         # need a trailing colon and have to be an absolute offset, rather
 902 |         # than a relative one
 903 |         ret = ''
 904 |         index = 0
 905 |         for instr in self._l:
 906 |             if isinstance(instr, Label):
 907 |                 instr.base = index
 908 |                 index += 1
 909 |                 ret += repr(instr) + ':\n'
 910 |             elif isinstance(instr, str):
 911 |                 index += 1
 912 |                 ret += '%s:\n' % instr
 913 |             elif isinstance(instr, RelativeJump) and \
 914 |                     isinstance(instr.value, Label):
 915 |                 instr.value.base = index
 916 |                 ret += repr(instr) + '\n'
 917 |             else:
 918 |                 ret += repr(instr) + '\n'
 919 |         return ret
 920 | 
 921 |     def __str__(self):
 922 |         """Return the Machine Code representation."""
 923 |         return ''.join(map(str, self._l))
 924 | 
 925 |     def assemble(self, recursion=10):
 926 |         """Assemble the given Block.
 927 | 
 928 |         Assembly can *ONLY* be called on the top-level assembly block. Any
 929 |         unresolved labels will result in an exception.
 930 | 
 931 |         `recursion' indicates the maximum number of iterations used to
 932 |         optimize the size of conditional jumps (see docs.)
 933 | 
 934 |         """
 935 |         local_labels = {}
 936 |         global_labels = {}
 937 |         offset = 0
 938 | 
 939 |         # first we obtain the offset of each label
 940 |         for idx, instr in enumerate(self._l):
 941 |             # convert any class objects to instances
 942 |             if isinstance(instr, types.ClassType):
 943 |                 instr = instr()
 944 |                 self._l[idx] = instr
 945 | 
 946 |             if isinstance(instr, (str, Label)):
 947 |                 # named local label
 948 |                 if isinstance(instr, str):
 949 |                     local_labels[instr] = offset
 950 | 
 951 |                 # anonymous local label
 952 |                 elif not instr.index:
 953 |                     instr.index = len(local_labels)
 954 |                     local_labels[instr.index] = offset
 955 | 
 956 |                 # global named label
 957 |                 elif isinstance(instr.index, str):
 958 |                     global_labels[instr.index] = offset
 959 |                     local_labels[instr.index] = offset
 960 | 
 961 |             elif isinstance(instr, Instruction):
 962 |                 offset += len(instr)
 963 | 
 964 |             elif isinstance(instr, RelativeJump):
 965 |                 # 5 for unconditional jumps, 6 for conditional jumps
 966 |                 offset += 5 + (instr._index_ is not None)
 967 | 
 968 |                 # is this a label?
 969 |                 if isinstance(instr.value, Label):
 970 | 
 971 |                     # is this an anonymous label?
 972 |                     if isinstance(instr.value, (int, long)):
 973 |                         # make an absolute index from the relative one
 974 |                         instr.value.index += len(local_labels)
 975 | 
 976 |         # do at most `recursion' iterations in order to try to optimize the
 977 |         # relative jumps
 978 |         # ...
 979 | 
 980 |         machine_code = ''
 981 |         offset = 0
 982 | 
 983 |         # now we assemble the machine code
 984 |         for instr in self._l:
 985 | 
 986 |             if isinstance(instr, Instruction):
 987 |                 machine_code += str(instr)
 988 | 
 989 |             elif isinstance(instr, RelativeJump):
 990 |                 machine_code += instr.assemble(short=False,
 991 |                                                labels=local_labels,
 992 |                                                offset=offset)
 993 | 
 994 |             offset = len(machine_code)
 995 | 
 996 |         return machine_code
 997 | 
 998 |     def append(self, other):
 999 |         """Append instruction(s) in `other' to `self'."""
1000 |         # if a class object was given, we create an instance ourselves
1001 |         # this can be either an Instruction or a Label
1002 |         if isinstance(other, (types.ClassType, _MetaLabel)):
1003 |             other = other()
1004 | 
1005 |         def labelify(val):
1006 |             if isinstance(val, Label):
1007 |                 val.base = self.label_base
1008 |             if isinstance(val, str):
1009 |                 val = '__lbl_%d_%s' % (self.block_id, val)
1010 |             return val
1011 | 
1012 |         if isinstance(other, Label):
1013 |             other.base = self.label_base
1014 |             self._l.append(other)
1015 |             self.label_base += 1
1016 | 
1017 |         elif isinstance(other, str):
1018 |             self._l.append('__lbl_%d_%s' % (self.block_id, other))
1019 |             self.label_base += 1
1020 | 
1021 |         elif isinstance(other, RelativeJump):
1022 |             self._l.append(other)
1023 | 
1024 |             other.value = labelify(other.value)
1025 | 
1026 |         elif isinstance(other, Instruction):
1027 |             self._l.append(other)
1028 | 
1029 |             other.op1 = labelify(other.op1)
1030 |             other.op2 = labelify(other.op2)
1031 |             # TODO add memory address support
1032 | 
1033 |         elif isinstance(other, (list, tuple)):
1034 |             map(self.append, other)
1035 | 
1036 |         elif isinstance(other, Block):
1037 |             # we merge the `other' block with ours, by appending.
1038 |             # TODO deepcopy might get in a recursive loop somehow, if that
1039 |             # ever occurs, implement a __deepcopy__ which only makes a new
1040 |             # copy of Labels
1041 |             # map(self.append, map(copy.deepcopy, other._l))
1042 |             map(self.append, other._l)
1043 | 
1044 |         else:
1045 |             raise Exception('This object is not welcome here.')
1046 | 
1047 |         return self
1048 | 
1049 |     def __iadd__(self, other):
1050 |         """self += other"""
1051 |         self.append(other)
1052 |         return self
1053 | 
1054 |     def __add__(self, other):
1055 |         """self + other"""
1056 |         return Block(self, other)
1057 | 
1058 |     def __radd__(self, other):
1059 |         """other + self"""
1060 |         return Block(other, self)
1061 | 
1062 |     def __iter__(self):
1063 |         return iter(self._l)
1064 | 
1065 | block = Block
1066 | 
1067 | 
1068 | class _MetaLabel(type):
1069 |     def __sub__(cls, other):
1070 |         return Label(-other)
1071 | 
1072 |     def __add__(cls, other):
1073 |         return Label(other)
1074 | 
1075 | 
1076 | class Label:
1077 |     __metaclass__ = _MetaLabel
1078 | 
1079 |     def __init__(self, index=0):
1080 |         self.index = index
1081 | 
1082 |         # any base to add to `index'
1083 |         self.base = 0
1084 | 
1085 |     def __repr__(self):
1086 |         if isinstance(self.index, str):
1087 |             return '__lbl_%s' % self.index
1088 | 
1089 |         index = self.index + self.base
1090 | 
1091 |         # as we have to include Label(0) possibilities
1092 |         if self.index > 0:
1093 |             index -= 1
1094 | 
1095 |         return '__lbl_%s' % index
1096 | 
1097 | lbl = Label
1098 | 
1099 | 
1100 | class retn(Instruction):
1101 |     _opcode_ = 0xc3
1102 |     _enc_ = [(0xc2, (word, imm))]
1103 | 
1104 | ret = retn
1105 | 
1106 | 
1107 | class leave(Instruction):
1108 |     _opcode_ = 0xc9
1109 | 
1110 | 
1111 | class nop(Instruction):
1112 |     _opcode_ = 0x90
1113 | 
1114 | 
1115 | class mov(Instruction):
1116 |     # mov r8, imm8 and mov r32, imm32
1117 |     _enc_ = \
1118 |         zip(range(0xb0, 0xb8), gpr.register8, ((byte, imm),) * 8) + \
1119 |         zip(range(0xb8, 0xc0), gpr.register32, ((dword, imm),) * 8) + \
1120 |         [
1121 |             (0x8b, (dword, gpr), (dword, memgpr)),
1122 |             (0x89, (dword, memgpr), (dword, gpr)),
1123 |             (0x88, (byte, memgpr), (byte, gpr)),
1124 |             (0x8a, (byte, gpr), (byte, memgpr)),
1125 |             (0xc6, (byte, memgpr, 0), (byte, imm)),
1126 |             (0xc7, (dword, memgpr, 0), (dword, imm)),
1127 |         ]
1128 | 
1129 | class movzx(Instruction):
1130 |     _enc_ = [
1131 |         ('\x0f\xb6', (dword, gpr), (byte, memgpr)),
1132 |         ('\x0f\xb7', (dword, gpr), (word, memgpr)),
1133 |     ]
1134 | 
1135 | 
1136 | class movsx(Instruction):
1137 |     _enc_ = [
1138 |         ('\x0f\xbe', (dword, gpr), (byte, memgpr)),
1139 |         ('\x0f\xbf', (dword, gpr), (word, memgpr)),
1140 |     ]
1141 | 
1142 | 
1143 | class push(Instruction):
1144 |     # push r32
1145 |     _enc_ = zip(range(0x50, 0x58), gpr.register32) + [
1146 |         (0x06, es),
1147 |         (0x0e, cs),
1148 |         (0x16, ss),
1149 |         (0x1e, ds),
1150 |         ('\x0f\xa0', fs),
1151 |         ('\x0f\xa8', gs),
1152 |         (0x6a, (byte, signed_imm)),
1153 |         (0x68, (dword, imm)),
1154 |         (0xff, (dword, mem, 6)),
1155 |     ]
1156 | 
1157 | 
1158 | class pop(Instruction):
1159 |     # pop r32
1160 |     _enc_ = zip(range(0x58, 0x60), gpr.register32) + [
1161 |         (0x07, es),
1162 |         (0x17, ss),
1163 |         (0x1f, ds),
1164 |         ('\x0f\xa1', fs),
1165 |         ('\x0f\xa9', gs),
1166 |         (0x8f, (dword, mem, 0)),
1167 |     ]
1168 | 
1169 | 
1170 | class inc(Instruction):
1171 |     # inc r32
1172 |     _enc_ = zip(range(0x40, 0x48), gpr.register32) + [
1173 |         (0xfe, (byte, memgpr, 0)),
1174 |         (0xff, (dword, memgpr, 0))]
1175 | 
1176 | 
1177 | class dec(Instruction):
1178 |     # dec r32
1179 |     _enc_ = zip(range(0x48, 0x50), gpr.register32) + [
1180 |         (0xfe, (byte, memgpr, 1)),
1181 |         (0xff, (dword, memgpr, 1))]
1182 | 
1183 | 
1184 | class xchg(Instruction):
1185 |     # xchg r32, eax (ecx through edi)
1186 |     _enc_ = zip(range(0x91, 0x98), gpr.register32[1:], (eax,) * 8) + [
1187 |         (0x86, (byte, memgpr), (byte, gpr)),
1188 |         (0x87, (dword, memgpr), (dword, memgpr))]
1189 | 
1190 | 
1191 | class stosb(Instruction):
1192 |     _opcode_ = 0xaa
1193 | 
1194 | 
1195 | class stosd(Instruction):
1196 |     _opcode_ = 0xab
1197 | 
1198 | 
1199 | class lodsb(Instruction):
1200 |     _opcode_ = 0xac
1201 | 
1202 | 
1203 | class lodsd(Instruction):
1204 |     _opcode_ = 0xad
1205 | 
1206 | 
1207 | class scasb(Instruction):
1208 |     _opcode_ = 0xae
1209 | 
1210 | 
1211 | class scasd(Instruction):
1212 |     _opcode_ = 0xaf
1213 | 
1214 | 
1215 | class lea(Instruction):
1216 |     _enc_ = [(0x8d, (dword, gpr), (None, mem))]
1217 | 
1218 | 
1219 | class pshufd(Instruction):
1220 |     _enc_ = [('\x66\x0f\x70', (oword, xmm), (oword, memxmm), (byte, imm))]
1221 | 
1222 | 
1223 | class paddb(Instruction):
1224 |     _enc_ = [('\x66\x0f\xfc', (oword, xmm), (oword, memxmm))]
1225 | 
1226 | 
1227 | class paddw(Instruction):
1228 |     _enc_ = [('\x66\x0f\xfd', (oword, xmm), (oword, memxmm))]
1229 | 
1230 | 
1231 | class paddd(Instruction):
1232 |     _enc_ = [('\x66\x0f\xfe', (oword, xmm), (oword, memxmm))]
1233 | 
1234 | 
1235 | class psubb(Instruction):
1236 |     _enc_ = [('\x66\x0f\xf8', (oword, xmm), (oword, memxmm))]
1237 | 
1238 | 
1239 | class psubw(Instruction):
1240 |     _enc_ = [('\x66\x0f\xf9', (oword, xmm), (oword, memxmm))]
1241 | 
1242 | 
1243 | class psubd(Instruction):
1244 |     _enc_ = [('\x66\x0f\xfa', (oword, xmm), (oword, memxmm))]
1245 | 
1246 | 
1247 | class pand(Instruction):
1248 |     _enc_ = [('\x66\x0f\xdb', (oword, xmm), (oword, memxmm))]
1249 | 
1250 | 
1251 | class pandn(Instruction):
1252 |     _enc_ = [('\x66\x0f\xdf', (oword, xmm), (oword, memxmm))]
1253 | 
1254 | 
1255 | class por(Instruction):
1256 |     _enc_ = [('\x66\x0f\xeb', (oword, xmm), (oword, memxmm))]
1257 | 
1258 | 
1259 | class pxor(Instruction):
1260 |     _enc_ = [('\x66\x0f\xef', (oword, xmm), (oword, memxmm))]
1261 | 
1262 | 
1263 | class pmuludq(Instruction):
1264 |     _enc_ = [('\x66\x0f\xf4', (oword, xmm), (oword, memxmm))]
1265 | 
1266 | 
1267 | class movaps(Instruction):
1268 |     _enc_ = [
1269 |         ('\x0f\x28', (oword, xmm), (oword, memxmm)),
1270 |         ('\x0f\x29', (oword, memxmm), (oword, xmm)),
1271 |     ]
1272 | 
1273 | 
1274 | class movups(Instruction):
1275 |     _enc_ = [
1276 |         ('\x0f\x10', (oword, xmm), (oword, memxmm)),
1277 |         ('\x0f\x11', (oword, memxmm), (oword, xmm)),
1278 |     ]
1279 | 
1280 | 
1281 | class movapd(Instruction):
1282 |     _enc_ = [
1283 |         ('\x66\x0f\x28', (oword, xmm), (oword, memxmm)),
1284 |         ('\x66\x0f\x29', (oword, memxmm), (oword, xmm)),
1285 |     ]
1286 | 
1287 | 
1288 | class movd(Instruction):
1289 |     _enc_ = [
1290 |         ('\x66\x0f\x6e', (oword, xmm), (dword, memgpr)),
1291 |         ('\x66\x0f\x7e', (dword, memgpr), (oword, xmm)),
1292 |     ]
1293 | 
1294 | 
1295 | class movss(Instruction):
1296 |     _enc_ = [
1297 |         ('\xf3\x0f\x10', (oword, xmm), (oword, memxmm)),
1298 |         ('\xf3\x0f\x11', (oword, memxmm), (oword, xmm)),
1299 |     ]
1300 | 
1301 | 
1302 | class jo(RelativeJump):
1303 |     _index_ = 0
1304 | 
1305 | 
1306 | class jno(RelativeJump):
1307 |     _index_ = 1
1308 | 
1309 | 
1310 | class jb(RelativeJump):
1311 |     _index_ = 2
1312 | 
1313 | 
1314 | class jnb(RelativeJump):
1315 |     _index_ = 3
1316 | 
1317 | jae = jnb
1318 | 
1319 | 
1320 | class jz(RelativeJump):
1321 |     _index_ = 4
1322 | 
1323 | 
1324 | class jnz(RelativeJump):
1325 |     _index_ = 5
1326 | 
1327 | 
1328 | class jbe(RelativeJump):
1329 |     _index_ = 6
1330 | 
1331 | 
1332 | class jnbe(RelativeJump):
1333 |     _index_ = 7
1334 | 
1335 | ja = jnbe
1336 | 
1337 | 
1338 | class js(RelativeJump):
1339 |     _index_ = 8
1340 | 
1341 | 
1342 | class jns(RelativeJump):
1343 |     _index_ = 9
1344 | 
1345 | 
1346 | class jp(RelativeJump):
1347 |     _index_ = 10
1348 | 
1349 | 
1350 | class jnp(RelativeJump):
1351 |     _index_ = 11
1352 | 
1353 | 
1354 | class jl(RelativeJump):
1355 |     _index_ = 12
1356 | 
1357 | 
1358 | class jnl(RelativeJump):
1359 |     _index_ = 13
1360 | 
1361 | jge = jnl
1362 | 
1363 | 
1364 | class jle(RelativeJump):
1365 |     _index_ = 14
1366 | 
1367 | 
1368 | class jnle(RelativeJump):
1369 |     _index_ = 15
1370 | 
1371 | 
1372 | def _branch_instr(name, opcode, enc, arg):
1373 |     if not isinstance(arg, (int, long, str, Label)):
1374 |         i = Instruction(arg)
1375 |         i._enc_ = enc
1376 |         i._name_ = name
1377 |         return i
1378 |     r = RelativeJump(arg)
1379 |     r._opcode_ = opcode
1380 |     r._name_ = name
1381 |     return r
1382 | 
1383 | 
1384 | def jmp(arg):
1385 |     return _branch_instr('jmp', 0xe9, None, arg)
1386 | 
1387 | 
1388 | def call(arg):
1389 |     return _branch_instr('call', 0xe8, None, arg)
1390 | 
1391 | _group_1_opcodes = lambda x: [
1392 |     (0x00+8*x, (byte, memgpr), (byte, gpr)),
1393 |     (0x01+8*x, (dword, memgpr), (dword, gpr)),
1394 |     (0x02+8*x, (byte, gpr), (byte, memgpr)),
1395 |     (0x03+8*x, (dword, gpr), (dword, memgpr)),
1396 |     (0x04+8*x, al, (byte, imm)),
1397 |     (0x80, (byte, memgpr, x), (byte, imm)),
1398 |     (0x83, (dword, memgpr, x), (byte, signed_imm)),
1399 |     (0x05+8*x, eax, (dword, imm)),
1400 |     (0x81, (dword, memgpr, x), (dword, imm))]
1401 | 
1402 | 
1403 | class add(Instruction):
1404 |     _enc_ = _group_1_opcodes(0)
1405 | 
1406 | 
1407 | class or_(Instruction):
1408 |     _enc_ = _group_1_opcodes(1)
1409 |     _name_ = 'or'
1410 | 
1411 | 
1412 | class adc(Instruction):
1413 |     _enc_ = _group_1_opcodes(2)
1414 | 
1415 | 
1416 | class sbb(Instruction):
1417 |     _enc_ = _group_1_opcodes(3)
1418 | 
1419 | 
1420 | class and_(Instruction):
1421 |     _enc_ = _group_1_opcodes(4)
1422 |     _name_ = 'and'
1423 | 
1424 | 
1425 | class sub(Instruction):
1426 |     _enc_ = _group_1_opcodes(5)
1427 | 
1428 | 
1429 | class xor(Instruction):
1430 |     _enc_ = _group_1_opcodes(6)
1431 | 
1432 | 
1433 | class cmp_(Instruction):
1434 |     _enc_ = _group_1_opcodes(7)
1435 |     _name_ = 'cmp'
1436 | 
1437 | 
1438 | class test(Instruction):
1439 |     _enc_ = [
1440 |         (0x84, (byte, memgpr), (byte, gpr)),
1441 |         (0x85, (dword, memgpr), (dword, memgpr)),
1442 |         (0xa8, al, (byte, imm)),
1443 |         (0xa9, eax, (dword, imm)),
1444 |         (0xf6, (byte, memgpr, 0), (byte, imm)),
1445 |         (0xf7, (dword, memgpr, 0), (dword, imm)),
1446 |     ]
1447 | 
1448 | _group_2_opcodes = lambda x: [
1449 |     (0xd0, (byte, memgpr, x), imm(1)),
1450 |     (0xd1, (dword, memgpr, x), imm(1)),
1451 |     (0xd2, (byte, memgpr, x), cl),
1452 |     (0xd3, (dword, memgpr, x), cl),
1453 |     (0xc0, (byte, memgpr, x), (byte, imm)),
1454 |     (0xc1, (dword, memgpr, x), (byte, imm))]
1455 | 
1456 | 
1457 | class rol(Instruction):
1458 |     _enc_ = _group_2_opcodes(0)
1459 | 
1460 | 
1461 | class ror(Instruction):
1462 |     _enc_ = _group_2_opcodes(1)
1463 | 
1464 | 
1465 | class rcl(Instruction):
1466 |     _enc_ = _group_2_opcodes(2)
1467 | 
1468 | 
1469 | class rcr(Instruction):
1470 |     _enc_ = _group_2_opcodes(3)
1471 | 
1472 | 
1473 | class shl(Instruction):
1474 |     _enc_ = _group_2_opcodes(4)
1475 | 
1476 | 
1477 | class shr(Instruction):
1478 |     _enc_ = _group_2_opcodes(5)
1479 | 
1480 | 
1481 | class sal(Instruction):
1482 |     _enc_ = _group_2_opcodes(4)
1483 | 
1484 | 
1485 | class sar(Instruction):
1486 |     _enc_ = _group_2_opcodes(7)
1487 | 
1488 | _group_3_opcodes = lambda x: [
1489 |     (0xf6, (byte, memgpr, x)),
1490 |     (0xf7, (dword, memgpr, x))]
1491 | 
1492 | 
1493 | class not_(Instruction):
1494 |     _enc_ = _group_3_opcodes(2)
1495 |     _name_ = 'not'
1496 | 
1497 | 
1498 | class neg(Instruction):
1499 |     _enc_ = _group_3_opcodes(3)
1500 | 
1501 | 
1502 | class mul(Instruction):
1503 |     _enc_ = _group_3_opcodes(4)
1504 | 
1505 | 
1506 | class imul(Instruction):
1507 |     _enc_ = _group_3_opcodes(5) + [
1508 |         ('\x0f\xaf', (dword, gpr), (dword, memgpr)),
1509 |         (0x6b, (dword, gpr), (dword, memgpr), (byte, signed_imm)),
1510 |         (0x69, (dword, gpr), (dword, memgpr), (dword, imm))
1511 |     ]
1512 | 
1513 | 
1514 | class div(Instruction):
1515 |     _enc_ = _group_3_opcodes(6)
1516 | 
1517 | 
1518 | class idiv(Instruction):
1519 |     _enc_ = _group_3_opcodes(7)
1520 | 
1521 | 
1522 | class movsb(Instruction):
1523 |     _opcode_ = 0xa4
1524 | 
1525 | 
1526 | class movsd(Instruction):
1527 |     _opcode_ = 0xa5
1528 | 
1529 | 
1530 | class cmpsb(Instruction):
1531 |     _opcode_ = 0xa6
1532 | 
1533 | 
1534 | class cmpsd(Instruction):
1535 |     _opcode_ = 0xa7
1536 | 
1537 | 
1538 | class pushf(Instruction):
1539 |     _opcode_ = 0x9c
1540 | 
1541 | 
1542 | class popf(Instruction):
1543 |     _opcode_ = 0x9d
1544 | 
1545 | 
1546 | class cpuid(Instruction):
1547 |     _opcode_ = '\x0f\xa2'
1548 | 
1549 | 
1550 | class sysenter(Instruction):
1551 |     _opcode_ = '\x0f\x34'
1552 | 
1553 | 
1554 | class fninit(Instruction):
1555 |     _opcode_ = '\xdb\xe3'
1556 | 
1557 | 
1558 | class cdq(Instruction):
1559 |     _opcode_ = 0x99
1560 | 
1561 | 
1562 | class cld(Instruction):
1563 |     _opcode_ = 0xfc
1564 | 


--------------------------------------------------------------------------------
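The bit layout used by `modrm` and the displacement arithmetic used by `RelativeJump.assemble` in the file above can be checked in isolation. The following is a minimal standalone sketch, not part of pyasm2: the helper names `pack_modrm`, `pack_sib` and `rel32` are hypothetical, and the register numbers (eax=0, edx=2, ebx=3, ebp=5) are the standard x86 register indices.

```python
import struct


def pack_modrm(mod, reg, rm):
    """Pack the ModRM fields (2 + 3 + 3 bits) into one byte value."""
    assert 0 <= mod < 4 and 0 <= reg < 8 and 0 <= rm < 8
    return (mod << 6) | (reg << 3) | rm


def pack_sib(scale, index, base):
    """Pack the SIB fields; `scale` is the multiplier 1, 2, 4 or 8."""
    shift = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    assert 0 <= index < 8 and 0 <= base < 8
    return (shift << 6) | (index << 3) | base


def rel32(target, offset, length):
    """Little-endian rel32, measured from the end of the jump itself."""
    return struct.pack('<i', target - offset - length)


# mov eax, [ebx+0x10]: opcode 0x8b, mod=1 (8bit displacement),
# reg=eax (0), rm=ebx (3), followed by the displacement byte.
mov_code = bytearray([0x8b, pack_modrm(1, 0, 3), 0x10])

# lea edx, [ebp+eax*4+32]: rm=4 selects the sib byte, which holds
# scale=4, index=eax (0) and base=ebp (5).
lea_code = bytearray([0x8d, pack_modrm(1, 2, 4), pack_sib(4, 0, 5), 0x20])

# a 5 byte `jmp' at offset 0 that jumps to itself has displacement -5.
jmp_code = b'\xe9' + rel32(0, 0, 5)
```

The `'<i'` format fixes the byte order to little-endian regardless of the host, which is what x86 expects for displacements and immediates.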