Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
37 | AAA | NP | Invalid | Valid | ASCII adjust AL after addition. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Adjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.
If the addition produces a decimal carry, the AH register increments by 1, and the CF and AF flags are set. If there was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either case, bits 4 through 7 of the AL register are set to 0.
This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.

    IF 64-Bit Mode
        THEN
            #UD;
        ELSE
            IF ((AL AND 0FH) > 9) or (AF = 1)
                THEN
                    AX ← AX + 106H;
                    AF ← 1;
                    CF ← 1;
                ELSE
                    AF ← 0;
                    CF ← 0;
            FI;
            AL ← AL AND 0FH;
    FI;

The AF and CF flags are set to 1 if the adjustment results in a decimal carry; otherwise they are set to 0. The OF, SF, ZF, and PF flags are undefined.
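The adjustment can be sketched in portable C for the non-64-bit case. This is only an illustration of the operation described above, not processor code; the `aaa_state` struct and the `aaa_emulate` function are hypothetical names introduced here, and the flags are modeled as plain booleans.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the registers and flags AAA reads and writes. */
struct aaa_state {
    uint16_t ax; /* AH in bits 15:8, AL in bits 7:0 */
    bool af;     /* auxiliary carry left by the preceding ADD */
    bool cf;
};

/* Mirrors the ELSE branch of the operation (legacy/compatibility mode). */
struct aaa_state aaa_emulate(struct aaa_state s)
{
    if ((s.ax & 0x0F) > 9 || s.af) {
        s.ax = (uint16_t)(s.ax + 0x106); /* AL += 6 and AH += 1 in one add */
        s.af = true;
        s.cf = true;
    } else {
        s.af = false;
        s.cf = false;
    }
    s.ax &= 0xFF0F; /* AL <- AL AND 0FH; AH is preserved */
    return s;
}
```

For example, adding the unpacked digits 9 and 8 leaves AL = 11H with AF set; the sketch turns AX = 0011H into 0107H, that is, the unpacked digits 1 and 7.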
#UD | If the LOCK prefix is used. |
Same exceptions as protected mode.
Same exceptions as protected mode.
Same exceptions as protected mode.
#UD | If in 64-bit mode. |

Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
3F | AAS | NP | Invalid | Valid | ASCII adjust AL after subtraction. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Adjusts the result of the subtraction of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied source and destination operand for this instruction. The AAS instruction is only useful when it follows a SUB instruction that subtracts (binary subtraction) one unpacked BCD value from another and stores a byte result in the AL register. The AAS instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result.
If the subtraction produced a decimal borrow, the AH register decrements by 1, and the CF and AF flags are set. If no decimal borrow occurred, the CF and AF flags are cleared, and the AH register is unchanged. In either case, the AL register is left with its top four bits set to 0.
This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode.

    IF 64-bit mode
        THEN
            #UD;
        ELSE
            IF ((AL AND 0FH) > 9) or (AF = 1)
                THEN
                    AX ← AX - 6;
                    AH ← AH - 1;
                    AF ← 1;
                    CF ← 1;
                    AL ← AL AND 0FH;
                ELSE
                    CF ← 0;
                    AF ← 0;
                    AL ← AL AND 0FH;
            FI;
    FI;

The AF and CF flags are set to 1 if there is a decimal borrow; otherwise, they are cleared to 0. The OF, SF, ZF, and PF flags are undefined.
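A portable C sketch of the same adjustment, for the non-64-bit case, is given here as an illustration only. The `aas_state` struct and the `aas_emulate` function are hypothetical names, and the flags are modeled as plain booleans.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the registers and flags AAS reads and writes. */
struct aas_state {
    uint16_t ax; /* AH in bits 15:8, AL in bits 7:0 */
    bool af;     /* auxiliary borrow left by the preceding SUB */
    bool cf;
};

/* Mirrors the ELSE branch of the operation (legacy/compatibility mode). */
struct aas_state aas_emulate(struct aas_state s)
{
    if ((s.ax & 0x0F) > 9 || s.af) {
        s.ax = (uint16_t)(s.ax - 6);     /* AX <- AX - 6 */
        s.ax = (uint16_t)(s.ax - 0x100); /* AH <- AH - 1 (wraps within AH) */
        s.af = true;
        s.cf = true;
    } else {
        s.af = false;
        s.cf = false;
    }
    s.ax &= 0xFF0F; /* AL <- AL AND 0FH; AH is preserved */
    return s;
}
```

For example, subtracting 7 from the low digit of the unpacked value 0102H (decimal 12) leaves AX = 01FBH with AF set; the sketch adjusts this to 0005H.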
#UD | If the LOCK prefix is used. |
Same exceptions as protected mode.
Same exceptions as protected mode.
Same exceptions as protected mode.
#UD | If in 64-bit mode. |

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
66 0F 38 DE /r AESDEC xmm1, xmm2/m128 | RM | V/V | AES | Perform one round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128. |
VEX.NDS.128.66.0F38.WIG DE /r VAESDEC xmm1, xmm2, xmm3/m128 | RVM | V/V | Both AES and AVX flags | Perform one round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA |
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA |
This instruction performs a single round of the AES decryption flow using the Equivalent Inverse Cipher, with the round key from the second source operand, operating on a 128-bit data (state) from the first source operand, and stores the result in the destination operand.
Use the AESDEC instruction for all but the last decryption round. For the last decryption round, use the AESDECLAST instruction.
128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.
VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.
AESDEC

    STATE ← SRC1;
    RoundKey ← SRC2;
    STATE ← InvShiftRows( STATE );
    STATE ← InvSubBytes( STATE );
    STATE ← InvMixColumns( STATE );
    DEST[127:0] ← STATE XOR RoundKey;
    DEST[VLMAX-1:128] (Unmodified)

VAESDEC

    STATE ← SRC1;
    RoundKey ← SRC2;
    STATE ← InvShiftRows( STATE );
    STATE ← InvSubBytes( STATE );
    STATE ← InvMixColumns( STATE );
    DEST[127:0] ← STATE XOR RoundKey;
    DEST[VLMAX-1:128] ← 0

(V)AESDEC:
__m128i _mm_aesdec (__m128i, __m128i)
None
See Exceptions Type 4.

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
66 0F 38 DF /r AESDECLAST xmm1, xmm2/m128 | RM | V/V | AES | Perform the last round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128. |
VEX.NDS.128.66.0F38.WIG DF /r VAESDECLAST xmm1, xmm2, xmm3/m128 | RVM | V/V | Both AES and AVX flags | Perform the last round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA |
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA |
This instruction performs the last round of the AES decryption flow using the Equivalent Inverse Cipher, with the round key from the second source operand, operating on a 128-bit data (state) from the first source operand, and stores the result in the destination operand.
128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.
VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.
AESDECLAST

    STATE ← SRC1;
    RoundKey ← SRC2;
    STATE ← InvShiftRows( STATE );
    STATE ← InvSubBytes( STATE );
    DEST[127:0] ← STATE XOR RoundKey;
    DEST[VLMAX-1:128] (Unmodified)

VAESDECLAST

    STATE ← SRC1;
    RoundKey ← SRC2;
    STATE ← InvShiftRows( STATE );
    STATE ← InvSubBytes( STATE );
    DEST[127:0] ← STATE XOR RoundKey;
    DEST[VLMAX-1:128] ← 0

(V)AESDECLAST:
__m128i _mm_aesdeclast (__m128i, __m128i)
None
See Exceptions Type 4.

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
66 0F 38 DC /r AESENC xmm1, xmm2/m128 | RM | V/V | AES | Perform one round of an AES encryption flow, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128. |
VEX.NDS.128.66.0F38.WIG DC /r VAESENC xmm1, xmm2, xmm3/m128 | RVM | V/V | Both AES and AVX flags | Perform one round of an AES encryption flow, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA |
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA |
This instruction performs a single round of an AES encryption flow using a round key from the second source operand, operating on 128-bit data (state) from the first source operand, and stores the result in the destination operand.
Use the AESENC instruction for all but the last encryption round. For the last encryption round, use the AESENCLAST instruction.
128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.
VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.
AESENC

    STATE ← SRC1;
    RoundKey ← SRC2;
    STATE ← ShiftRows( STATE );
    STATE ← SubBytes( STATE );
    STATE ← MixColumns( STATE );
    DEST[127:0] ← STATE XOR RoundKey;
    DEST[VLMAX-1:128] (Unmodified)

VAESENC

    STATE ← SRC1;
    RoundKey ← SRC2;
    STATE ← ShiftRows( STATE );
    STATE ← SubBytes( STATE );
    STATE ← MixColumns( STATE );
    DEST[127:0] ← STATE XOR RoundKey;
    DEST[VLMAX-1:128] ← 0

(V)AESENC:
__m128i _mm_aesenc (__m128i, __m128i)
None
See Exceptions Type 4.

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
66 0F 38 DD /r AESENCLAST xmm1, xmm2/m128 | RM | V/V | AES | Perform the last round of an AES encryption flow, operating on a 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128. |
VEX.NDS.128.66.0F38.WIG DD /r VAESENCLAST xmm1, xmm2, xmm3/m128 | RVM | V/V | Both AES and AVX flags | Perform the last round of an AES encryption flow, operating on a 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA |
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA |
This instruction performs the last round of an AES encryption flow using a round key from the second source operand, operating on 128-bit data (state) from the first source operand, and stores the result in the destination operand.
128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.
VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM register are zeroed.
AESENCLAST

    STATE ← SRC1;
    RoundKey ← SRC2;
    STATE ← ShiftRows( STATE );
    STATE ← SubBytes( STATE );
    DEST[127:0] ← STATE XOR RoundKey;
    DEST[VLMAX-1:128] (Unmodified)

VAESENCLAST

    STATE ← SRC1;
    RoundKey ← SRC2;
    STATE ← ShiftRows( STATE );
    STATE ← SubBytes( STATE );
    DEST[127:0] ← STATE XOR RoundKey;
    DEST[VLMAX-1:128] ← 0

(V)AESENCLAST:
__m128i _mm_aesenclast (__m128i, __m128i)
None
See Exceptions Type 4.

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
66 0F 38 DB /r AESIMC xmm1, xmm2/m128 | RM | V/V | AES | Perform the InvMixColumn transformation on a 128-bit round key from xmm2/m128 and store the result in xmm1. |
VEX.128.66.0F38.WIG DB /r VAESIMC xmm1, xmm2/m128 | RM | V/V | Both AES and AVX flags | Perform the InvMixColumn transformation on a 128-bit round key from xmm2/m128 and store the result in xmm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Perform the InvMixColumns transformation on the source operand and store the result in the destination operand. The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory location.
Note: the AESIMC instruction should be applied to the expanded AES round keys (except for the first and last round key) in order to prepare them for decryption using the “Equivalent Inverse Cipher” (defined in FIPS 197).
128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain unchanged.
VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.
Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
AESIMC

    DEST[127:0] ← InvMixColumns( SRC );
    DEST[VLMAX-1:128] (Unmodified)

VAESIMC

    DEST[127:0] ← InvMixColumns( SRC );
    DEST[VLMAX-1:128] ← 0;

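The InvMixColumns step on a single column can be sketched in portable C. This is an illustration of the FIPS 197 transformation only, not of how the instruction is implemented. The names `gf_mul` and `inv_mix_column` are hypothetical, and packing the four column bytes into a `uint32_t` with byte 0 in the low 8 bits is an assumption of this sketch.

```c
#include <stdint.h>

/* Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1. */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    for (int i = 0; i < 8; i++) {
        if (b & 1)
            p ^= a;
        uint8_t hi = a & 0x80;
        a = (uint8_t)(a << 1);
        if (hi)
            a ^= 0x1B; /* reduce by the AES polynomial */
        b >>= 1;
    }
    return p;
}

/* InvMixColumns on one column; byte 0 of the column is the low byte. */
uint32_t inv_mix_column(uint32_t col)
{
    uint8_t a0 = (uint8_t)col;
    uint8_t a1 = (uint8_t)(col >> 8);
    uint8_t a2 = (uint8_t)(col >> 16);
    uint8_t a3 = (uint8_t)(col >> 24);

    /* Rows of the inverse matrix: 0E 0B 0D 09, rotated per output byte. */
    uint8_t b0 = (uint8_t)(gf_mul(a0, 14) ^ gf_mul(a1, 11) ^ gf_mul(a2, 13) ^ gf_mul(a3, 9));
    uint8_t b1 = (uint8_t)(gf_mul(a0, 9) ^ gf_mul(a1, 14) ^ gf_mul(a2, 11) ^ gf_mul(a3, 13));
    uint8_t b2 = (uint8_t)(gf_mul(a0, 13) ^ gf_mul(a1, 9) ^ gf_mul(a2, 14) ^ gf_mul(a3, 11));
    uint8_t b3 = (uint8_t)(gf_mul(a0, 11) ^ gf_mul(a1, 13) ^ gf_mul(a2, 9) ^ gf_mul(a3, 14));

    return (uint32_t)b0 | ((uint32_t)b1 << 8) | ((uint32_t)b2 << 16) | ((uint32_t)b3 << 24);
}
```

Applying the function to each of the four 32-bit columns of a 128-bit round key gives the value that the operation above writes to DEST[127:0]. The column bytes 8E 4D A1 BC map back to DB 13 53 45.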
(V)AESIMC:
__m128i _mm_aesimc (__m128i)
None
See Exceptions Type 4; additionally
#UD | If VEX.vvvv ≠ 1111B. |

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
VEX.NDS.LZ.0F38.W0 F2 /r ANDN r32a, r32b, r/m32 | RVM | V/V | BMI1 | Bitwise AND of inverted r32b with r/m32, store result in r32a. |
VEX.NDS.LZ.0F38.W1 F2 /r ANDN r64a, r64b, r/m64 | RVM | V/N.E. | BMI1 | Bitwise AND of inverted r64b with r/m64, store result in r64a. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA |
Performs a bitwise logical AND of the inverted second operand (the first source operand) with the third operand (the second source operand). The result is stored in the first operand (destination operand).
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

    DEST ← (NOT SRC1) bitwiseAND SRC2;
    SF ← DEST[OperandSize -1];
    ZF ← (DEST = 0);

SF and ZF are updated based on result. OF and CF flags are cleared. AF and PF flags are undefined.
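The 32-bit operation and its defined flag results can be sketched in portable C as follows. The `andn_result` struct and the `andn32` function are hypothetical names used for illustration; the undefined AF and PF flags are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the destination value and the defined flags. */
struct andn_result {
    uint32_t dest;
    bool sf;
    bool zf;
    bool of; /* always cleared */
    bool cf; /* always cleared */
};

/* DEST <- (NOT SRC1) AND SRC2, with SF and ZF taken from the result. */
struct andn_result andn32(uint32_t src1, uint32_t src2)
{
    struct andn_result r;
    r.dest = ~src1 & src2;
    r.sf = (r.dest >> 31) & 1;
    r.zf = (r.dest == 0);
    r.of = false;
    r.cf = false;
    return r;
}
```

For instance, `andn32(0x0000FFFF, 0x12345678)` keeps only the bits of the second source that lie outside the mask, giving 12340000H.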
Auto-generated from high-level language.
None
See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally
#UD | If VEX.W = 1. |

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
VEX.NDD.LZ.0F38.W0 F3 /3 BLSI r32, r/m32 | VM | V/V | BMI1 | Extract lowest set bit from r/m32 and set that bit in r32. |
VEX.NDD.LZ.0F38.W1 F3 /3 BLSI r64, r/m64 | VM | V/N.E. | BMI1 | Extract lowest set bit from r/m64, and set that bit in r64. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
VM | VEX.vvvv (w) | ModRM:r/m (r) | NA | NA |
Extracts the lowest set bit from the source operand and sets the corresponding bit in the destination register. All other bits in the destination operand are zeroed. If no bits are set in the source operand, BLSI sets all the bits in the destination to 0, sets ZF, and clears CF.
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

    temp ← (-SRC) bitwiseAND (SRC);
    SF ← temp[OperandSize -1];
    ZF ← (temp = 0);
    IF SRC = 0
        CF ← 0;
    ELSE
        CF ← 1;
    FI
    DEST ← temp;

ZF and SF are updated based on the result. CF is set if the source is not zero. The OF flag is cleared. AF and PF flags are undefined.
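A portable C sketch of the 32-bit form follows, for illustration only. The `blsi_result` struct and the `blsi32` function are hypothetical names; the undefined AF and PF flags are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the destination value and the defined flags. */
struct blsi_result {
    uint32_t dest;
    bool sf;
    bool zf;
    bool cf;
};

/* temp <- (-SRC) AND SRC isolates the lowest set bit of SRC. */
struct blsi_result blsi32(uint32_t src)
{
    struct blsi_result r;
    r.dest = (0u - src) & src; /* unsigned negation, well defined in C */
    r.sf = (r.dest >> 31) & 1;
    r.zf = (r.dest == 0);
    r.cf = (src != 0);         /* CF is set when the source is not zero */
    return r;
}
```

With a source of 68H (binary 0110 1000) the result is 08H and CF is set; with a zero source the result is 0, ZF is set, and CF is cleared.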
BLSI:
unsigned __int32 _blsi_u32(unsigned __int32 src);
BLSI:
unsigned __int64 _blsi_u64(unsigned __int64 src);
None
See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally
#UD | If VEX.W = 1. |

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
VEX.NDD.LZ.0F38.W0 F3 /2 BLSMSK r32, r/m32 | VM | V/V | BMI1 | Set all lower bits in r32 to “1” starting from bit 0 to lowest set bit in r/m32. |
VEX.NDD.LZ.0F38.W1 F3 /2 BLSMSK r64, r/m64 | VM | V/N.E. | BMI1 | Set all lower bits in r64 to “1” starting from bit 0 to lowest set bit in r/m64. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
VM | VEX.vvvv (w) | ModRM:r/m (r) | NA | NA |
Sets all the lower bits of the destination operand to “1” up to and including the lowest set bit (=1) in the source operand. If the source operand is zero, BLSMSK sets all bits of the destination operand to 1 and also sets CF to 1.
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

    temp ← (SRC-1) XOR (SRC);
    SF ← temp[OperandSize -1];
    ZF ← 0;
    IF SRC = 0
        CF ← 1;
    ELSE
        CF ← 0;
    FI
    DEST ← temp;

SF is updated based on the result. CF is set if the source is zero. ZF and OF flags are cleared. AF and PF flags are undefined.
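A portable C sketch of the 32-bit form follows, for illustration only. The `blsmsk_result` struct and the `blsmsk32` function are hypothetical names; the undefined AF and PF flags are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the destination value and the defined flags. */
struct blsmsk_result {
    uint32_t dest;
    bool sf;
    bool zf; /* always cleared */
    bool cf;
};

/* temp <- (SRC - 1) XOR SRC builds a mask up to the lowest set bit. */
struct blsmsk_result blsmsk32(uint32_t src)
{
    struct blsmsk_result r;
    r.dest = (src - 1u) ^ src; /* src == 0 wraps to all ones */
    r.sf = (r.dest >> 31) & 1;
    r.zf = false;
    r.cf = (src == 0);         /* CF is set when the source is zero */
    return r;
}
```

With a source of 68H the mask is 0FH; with a zero source the mask is FFFFFFFFH and CF is set.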
BLSMSK:
unsigned __int32 _blsmsk_u32(unsigned __int32 src);
BLSMSK:
unsigned __int64 _blsmsk_u64(unsigned __int64 src);
None
See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally
#UD | If VEX.W = 1. |

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
VEX.NDD.LZ.0F38.W0 F3 /1 BLSR r32, r/m32 | VM | V/V | BMI1 | Reset lowest set bit of r/m32, keep all other bits of r/m32 and write result to r32. |
VEX.NDD.LZ.0F38.W1 F3 /1 BLSR r64, r/m64 | VM | V/N.E. | BMI1 | Reset lowest set bit of r/m64, keep all other bits of r/m64 and write result to r64. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
VM | VEX.vvvv (w) | ModRM:r/m (r) | NA | NA |
Copies all bits from the source operand to the destination operand and resets (=0) the bit position in the destination operand that corresponds to the lowest set bit of the source operand. If the source operand is zero, BLSR sets CF.
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

    temp ← (SRC-1) bitwiseAND ( SRC );
    SF ← temp[OperandSize -1];
    ZF ← (temp = 0);
    IF SRC = 0
        CF ← 1;
    ELSE
        CF ← 0;
    FI
    DEST ← temp;

ZF and SF flags are updated based on the result. CF is set if the source is zero. OF flag is cleared. AF and PF flags are undefined.
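A portable C sketch of the 32-bit form follows, for illustration only. The `blsr_result` struct and the `blsr32` function are hypothetical names; the undefined AF and PF flags are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the destination value and the defined flags. */
struct blsr_result {
    uint32_t dest;
    bool sf;
    bool zf;
    bool cf;
};

/* temp <- (SRC - 1) AND SRC clears the lowest set bit of SRC. */
struct blsr_result blsr32(uint32_t src)
{
    struct blsr_result r;
    r.dest = (src - 1u) & src;
    r.sf = (r.dest >> 31) & 1;
    r.zf = (r.dest == 0);
    r.cf = (src == 0); /* CF is set when the source is zero */
    return r;
}
```

With a source of 68H the result is 60H. Repeating the operation until ZF is set visits every set bit once, which is a common way to iterate over the bits of a mask.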
BLSR:
unsigned __int32 _blsr_u32(unsigned __int32 src);
BLSR:
unsigned __int64 _blsr_u64(unsigned __int64 src);
None
See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally
#UD | If VEX.W = 1. |

Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
0F C8+rd | BSWAP r32 | O | Valid* | Valid | Reverses the byte order of a 32-bit register. |
REX.W + 0F C8+rd | BSWAP r64 | O | Valid | N.E. | Reverses the byte order of a 64-bit register. |

NOTES:
* See IA-32 Architecture Compatibility section below.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
O | opcode + rd (r, w) | NA | NA | NA |
Reverses the byte order of a 32-bit or 64-bit (destination) register. This instruction is provided for converting little-endian values to big-endian format and vice versa. To swap bytes in a word value (16-bit register), use the XCHG instruction. When the BSWAP instruction references a 16-bit register, the result is undefined.
In 64-bit mode, the instruction’s default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.
The BSWAP instruction is not supported on IA-32 processors earlier than the Intel486™ processor family. For compatibility with this instruction, software should include functionally equivalent code for execution on Intel processors earlier than the Intel486 processor family.

    TEMP ← DEST
    IF 64-bit mode AND OperandSize = 64
        THEN
            DEST[7:0] ← TEMP[63:56];
            DEST[15:8] ← TEMP[55:48];
            DEST[23:16] ← TEMP[47:40];
            DEST[31:24] ← TEMP[39:32];
            DEST[39:32] ← TEMP[31:24];
            DEST[47:40] ← TEMP[23:16];
            DEST[55:48] ← TEMP[15:8];
            DEST[63:56] ← TEMP[7:0];
        ELSE
            DEST[7:0] ← TEMP[31:24];
            DEST[15:8] ← TEMP[23:16];
            DEST[23:16] ← TEMP[15:8];
            DEST[31:24] ← TEMP[7:0];
    FI;

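The "functionally equivalent code" mentioned above could look like the following portable C sketch. The function names `bswap32` and `bswap64` are hypothetical; compilers typically provide their own byte-swap built-ins, which this sketch deliberately does not rely on.

```c
#include <stdint.h>

/* Reverse the four bytes of a 32-bit value with shifts and masks. */
uint32_t bswap32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Reverse the eight bytes of a 64-bit value by swapping each half
   and exchanging the halves. */
uint64_t bswap64(uint64_t x)
{
    uint32_t lo = (uint32_t)x;
    uint32_t hi = (uint32_t)(x >> 32);
    return ((uint64_t)bswap32(lo) << 32) | bswap32(hi);
}
```

Applying either function twice returns the original value, which is why the same routine converts in both directions between little-endian and big-endian formats.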
None.
#UD | If the LOCK prefix is used. |

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
VEX.NDS¹.LZ.0F38.W0 F5 /r BZHI r32a, r/m32, r32b | RMV | V/V | BMI2 | Zero bits in r/m32 starting with the position in r32b, write result to r32a. |
VEX.NDS¹.LZ.0F38.W1 F5 /r BZHI r64a, r/m64, r64b | RMV | V/N.E. | BMI2 | Zero bits in r/m64 starting with the position in r64b, write result to r64a. |

NOTES:
1. ModRM:r/m is used to encode the first source operand (second operand) and VEX.vvvv encodes the second source operand (third operand).

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RMV | ModRM:reg (w) | ModRM:r/m (r) | VEX.vvvv (r) | NA |
BZHI copies the bits of the first source operand (the second operand) into the destination operand (the first operand) and clears the higher bits in the destination according to the INDEX value specified by the second source operand (the third operand). The INDEX is specified by bits 7:0 of the second source operand. The INDEX value is saturated at the value of OperandSize -1. CF is set if the number contained in the 8 low bits of the third operand is greater than OperandSize -1.
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

    N ← SRC2[7:0]
    DEST ← SRC1
    IF (N < OperandSize)
        DEST[OperandSize-1:N] ← 0
    FI
    IF (N > OperandSize - 1)
        CF ← 1
    ELSE
        CF ← 0
    FI

ZF, CF and SF flags are updated based on the result. OF flag is cleared. AF and PF flags are undefined.
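A portable C sketch of the 32-bit form follows, for illustration only. The `bzhi_result` struct and the `bzhi32` function are hypothetical names; the undefined AF and PF flags are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the destination value and the defined flags. */
struct bzhi_result {
    uint32_t dest;
    bool sf;
    bool zf;
    bool cf;
};

/* Clear bits 31:N of src, where N is taken from bits 7:0 of index. */
struct bzhi_result bzhi32(uint32_t src, uint32_t index)
{
    struct bzhi_result r;
    uint32_t n = index & 0xFFu;

    r.dest = src;
    if (n < 32)
        r.dest &= ((uint32_t)1 << n) - 1u; /* keep only bits N-1:0 */
    r.cf = (n > 31);                       /* index out of range */
    r.sf = (r.dest >> 31) & 1;
    r.zf = (r.dest == 0);
    return r;
}
```

Note that only the low 8 bits of the index are examined, so an index of 100H behaves like 0 and clears the whole destination.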
BZHI:
unsigned __int32 _bzhi_u32(unsigned __int32 src, unsigned __int32 index);
BZHI:
unsigned __int64 _bzhi_u64(unsigned __int64 src, unsigned __int32 index);
None
See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally
#UD | If VEX.W = 1. |

Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
98 | CBW | NP | Valid | Valid | AX ← sign-extend of AL. |
98 | CWDE | NP | Valid | Valid | EAX ← sign-extend of AX. |
REX.W + 98 | CDQE | NP | Valid | N.E. | RAX ← sign-extend of EAX. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Doubles the size of the source operand by means of sign extension. The CBW (convert byte to word) instruction copies the sign (bit 7) in the source operand into every bit in the AH register. The CWDE (convert word to doubleword) instruction copies the sign (bit 15) of the word in the AX register into the high 16 bits of the EAX register.
CBW and CWDE reference the same opcode. The CBW instruction is intended for use when the operand-size attribute is 16; CWDE is intended for use when the operand-size attribute is 32. Some assemblers may force the operand size. Others may treat these two mnemonics as synonyms (CBW/CWDE) and use the setting of the operand-size attribute to determine the size of values to be converted.
In 64-bit mode, the default operation size is the size of the destination register. Use of the REX.W prefix promotes this instruction (CDQE when promoted) to operate on 64-bit operands, in which case CDQE copies the sign (bit 31) of the doubleword in the EAX register into the high 32 bits of RAX.

    IF OperandSize = 16 (* Instruction = CBW *)
        THEN
            AX ← SignExtend(AL);
        ELSE IF OperandSize = 32 (* Instruction = CWDE *)
            THEN
                EAX ← SignExtend(AX);
            ELSE (* 64-Bit Mode, OperandSize = 64, Instruction = CDQE *)
                RAX ← SignExtend(EAX);
        FI;
    FI;

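The three sign extensions can be sketched in portable C with explicit bit operations. The function names `cbw`, `cwde`, and `cdqe` are hypothetical helpers named after the mnemonics; each takes the source register value and returns the widened register value.

```c
#include <stdint.h>

/* AX <- SignExtend(AL): copy bit 7 of AL into every bit of AH. */
uint16_t cbw(uint8_t al)
{
    return (uint16_t)((al & 0x80u) ? (0xFF00u | al) : al);
}

/* EAX <- SignExtend(AX): copy bit 15 of AX into the high 16 bits. */
uint32_t cwde(uint16_t ax)
{
    return (ax & 0x8000u) ? (0xFFFF0000u | ax) : ax;
}

/* RAX <- SignExtend(EAX): copy bit 31 of EAX into the high 32 bits. */
uint64_t cdqe(uint32_t eax)
{
    return (eax & 0x80000000u) ? (0xFFFFFFFF00000000ull | eax) : eax;
}
```

For example, `cbw(0x80)` gives FF80H (the 16-bit encoding of -128), while `cbw(0x7F)` gives 007FH.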
None.
#UD | If the LOCK prefix is used. |

Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
0F 01 CA | CLAC | NP | Valid | Valid | Clear the AC flag in the EFLAGS register. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Clears the AC flag bit in the EFLAGS register. This disables any alignment checking of user-mode data accesses. If the SMAP bit is set in the CR4 register, this disallows explicit supervisor-mode data accesses to user-mode pages.
This instruction's operation is the same in non-64-bit modes and 64-bit mode. Attempts to execute CLAC when CPL > 0 cause #UD.

    EFLAGS.AC ← 0;

AC cleared. Other flags are unaffected.
#UD | If the LOCK prefix is used. If the CPL > 0. If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. |
#UD | If the LOCK prefix is used. If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. |
#UD | The CLAC instruction is not recognized in virtual-8086 mode. |
#UD | If the LOCK prefix is used. If the CPL > 0. If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. |
#UD | If the LOCK prefix is used. If the CPL > 0. If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. |

Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
F8 | CLC | NP | Valid | Valid | Clear CF flag. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Clears the CF flag in the EFLAGS register. Operation is the same in all modes.

    CF ← 0;

The CF flag is set to 0. The OF, ZF, SF, AF, and PF flags are unaffected.
#UD | If the LOCK prefix is used. |

Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
FC | CLD | NP | Valid | Valid | Clear DF flag. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Clears the DF flag in the EFLAGS register. When the DF flag is set to 0, string operations increment the index registers (ESI and/or EDI). Operation is the same in all modes.

    DF ← 0;

The DF flag is set to 0. The CF, OF, ZF, SF, AF, and PF flags are unaffected.
#UD | If the LOCK prefix is used. |

Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
0F 06 | CLTS | NP | Valid | Valid | Clears TS flag in CR0. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Clears the task-switched (TS) flag in the CR0 register. This instruction is intended for use in operating-system procedures. It is a privileged instruction that can only be executed at a CPL of 0. It is allowed to be executed in real-address mode to allow initialization for protected mode.
The processor sets the TS flag every time a task switch occurs. The flag is used to synchronize the saving of FPU context in multitasking applications. See the description of the TS flag in the section titled “Control Registers” in Chapter 2 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A, for more information about this flag.
CLTS operation is the same in non-64-bit modes and 64-bit mode.
See Chapter 25, “VMX Non-Root Operation,” of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C, for more information about the behavior of this instruction in VMX non-root operation.

    CR0.TS[bit 3] ← 0;

The TS flag in CR0 register is cleared.
#GP(0) | If the current privilege level is not 0. |
#UD | If the LOCK prefix is used. |
#UD | If the LOCK prefix is used. |
#GP(0) | CLTS is not recognized in virtual-8086 mode. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
#GP(0) | If the CPL is greater than 0. |
#UD | If the LOCK prefix is used. |
Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
F5 | CMC | NP | Valid | Valid | Complement CF flag. |
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
NP | NA | NA | NA | NA |
Complements the CF flag in the EFLAGS register. CMC operation is the same in non-64-bit modes and 64-bit mode.
EFLAGS.CF[bit 0] ← NOT EFLAGS.CF[bit 0];
The CF flag contains the complement of its original value. The OF, ZF, SF, AF, and PF flags are unaffected.
#UD | If the LOCK prefix is used. |
-------------------------------------------------------------------------------- /html/CVTPD2PI.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
66 0F 2D /r CVTPD2PI mm, xmm/m128 | RM | Valid | Valid | Convert two packed double-precision floating-point values from xmm/m128 to two packed signed doubleword integers in mm. |
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Converts two packed double-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand).
The source operand can be an XMM register or a 128-bit memory location. The destination operand is an MMX technology register.
When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.
This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPD2PI instruction is executed.
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
DEST[31:0] ← Convert_Double_Precision_Floating_Point_To_Integer32(SRC[63:0]);
DEST[63:32] ← Convert_Double_Precision_Floating_Point_To_Integer32(SRC[127:64]);
CVTPD2PI: __m64 _mm_cvtpd_pi32(__m128d a)
Invalid, Precision.
See Table 22-4, “Exception Conditions for Legacy SIMD/MMX Instructions with FP Exception and 16-Byte Alignment,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B.
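The conversion and the masked-exception result described above can be sketched as follows. This is a model under stated assumptions, not the processor's implementation: the function names are hypothetical, the rounding mode is assumed to be round-to-nearest-even (the MXCSR default), the invalid exception is assumed to be masked, and NaN, infinity, and too-negative inputs are assumed to produce the same indefinite value as too-large inputs.

```python
import math

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INDEFINITE = 0x80000000  # integer indefinite value named in the text


def convert_double_to_int32(x):
    """Return the 32-bit result pattern for one double (assumed masked #I)."""
    if math.isnan(x) or math.isinf(x):
        return INDEFINITE  # assumption: treated like an out-of-range result
    rounded = round(x)  # Python rounds halves to even, like the default RC
    if rounded < INT32_MIN or rounded > INT32_MAX:
        return INDEFINITE
    return rounded & 0xFFFFFFFF  # two's-complement bit pattern


def cvtpd2pi(src):
    """src is (SRC[63:0], SRC[127:64]); returns (DEST[31:0], DEST[63:32])."""
    low, high = src
    return (convert_double_to_int32(low), convert_double_to_int32(high))
```

For example, `cvtpd2pi((2.5, -1.5))` yields the patterns for 2 and −2, and an input of `3e9` in either lane yields 80000000H in that lane.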
-------------------------------------------------------------------------------- /html/CVTPI2PD.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
66 0F 2A /r CVTPI2PD xmm, mm/m64* | RM | Valid | Valid | Convert two packed signed doubleword integers from mm/mem64 to two packed double-precision floating-point values in xmm. |
NOTES: *Operation is different for different operand sets; see the Description section.
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Converts two packed signed doubleword integers in the source operand (second operand) to two packed double-precision floating-point values in the destination operand (first operand).
The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an XMM register. In addition, depending on the operand configuration:
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
DEST[63:0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31:0]);
DEST[127:64] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[63:32]);
CVTPI2PD: __m128d _mm_cvtpi32_pd(__m64 a)
None
See Table 22-6, “Exception Conditions for Legacy SIMD/MMX Instructions with XMM and without FP Exception,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B.
-------------------------------------------------------------------------------- /html/CVTPI2PS.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
0F 2A /r CVTPI2PS xmm, mm/m64 | RM | Valid | Valid | Convert two signed doubleword integers from mm/m64 to two single-precision floating-point values in xmm. |
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Converts two packed signed doubleword integers in the source operand (second operand) to two packed single-precision floating-point values in the destination operand (first operand).
The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an XMM register. The results are stored in the low quadword of the destination operand, and the high quadword remains unchanged. When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.
This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPI2PS instruction is executed.
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
DEST[31:0] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[31:0]);
DEST[63:32] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[63:32]);
(* High quadword of destination unchanged *)
CVTPI2PS: __m128 _mm_cvtpi32_ps(__m128 a, __m64 b)
Precision
See Table 22-5, “Exception Conditions for Legacy SIMD/MMX Instructions with XMM and FP Exception,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B.
-------------------------------------------------------------------------------- /html/CVTPS2PI.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
0F 2D /r CVTPS2PI mm, xmm/m64 | RM | Valid | Valid | Convert two packed single-precision floating-point values from xmm/m64 to two packed signed doubleword integers in mm. |
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Converts two packed single-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand).
The source operand can be an XMM register or a 64-bit memory location. The destination operand is an MMX technology register. When the source operand is an XMM register, the two single-precision floating-point values are contained in the low quadword of the register. When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.
CVTPS2PI causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPS2PI instruction is executed.
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
DEST[31:0] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[31:0]);
DEST[63:32] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[63:32]);
CVTPS2PI: __m64 _mm_cvtps_pi32(__m128 a)
Invalid, Precision
See Table 22-5, “Exception Conditions for Legacy SIMD/MMX Instructions with XMM and FP Exception,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B.
-------------------------------------------------------------------------------- /html/CVTTPD2PI.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
66 0F 2C /r CVTTPD2PI mm, xmm/m128 | RM | Valid | Valid | Convert two packed double-precision floating-point values from xmm/m128 to two packed signed doubleword integers in mm using truncation. |
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Converts two packed double-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand). The source operand can be an XMM register or a 128-bit memory location. The destination operand is an MMX technology register.
When a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.
This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTTPD2PI instruction is executed.
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
DEST[31:0] ← Convert_Double_Precision_Floating_Point_To_Integer32_Truncate(SRC[63:0]);
DEST[63:32] ← Convert_Double_Precision_Floating_Point_To_Integer32_Truncate(SRC[127:64]);
CVTTPD2PI: __m64 _mm_cvttpd_pi32(__m128d a)
Invalid, Precision
See Table 22-4, “Exception Conditions for Legacy SIMD/MMX Instructions with FP Exception and 16-Byte Alignment,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B.
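The difference between the truncating conversion and a conversion that follows the MXCSR rounding control can be illustrated with a few in-range values. This is a sketch with hypothetical helper names; it assumes the rounding control is set to round-to-nearest-even and does not model out-of-range inputs or exceptions.

```python
import math


def convert_nearest_even(x):
    """Model of the non-truncating form under the assumed default rounding."""
    return round(x)  # Python rounds halves to the nearest even integer


def convert_truncate(x):
    """Model of the truncating form: always round toward zero."""
    return math.trunc(x)


samples = [2.7, -2.7, 2.5, -0.5]
pairs = [(convert_nearest_even(x), convert_truncate(x)) for x in samples]
```

Each entry of `pairs` holds the rounded result followed by the truncated result, so the two forms differ for 2.7 and −2.7 but agree for 2.5 and −0.5.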
-------------------------------------------------------------------------------- /html/CVTTPS2PI.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
0F 2C /r CVTTPS2PI mm, xmm/m64 | RM | Valid | Valid | Convert two single-precision floating-point values from xmm/m64 to two signed doubleword integers in mm using truncation. |
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Converts two packed single-precision floating-point values in the source operand (second operand) to two packed signed doubleword integers in the destination operand (first operand). The source operand can be an XMM register or a 64-bit memory location. The destination operand is an MMX technology register. When the source operand is an XMM register, the two single-precision floating-point values are contained in the low quadword of the register.
When a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.
This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTTPS2PI instruction is executed.
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
DEST[31:0] ← Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[31:0]);
DEST[63:32] ← Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[63:32]);
CVTTPS2PI: __m64 _mm_cvttps_pi32(__m128 a)
Invalid, Precision
See Table 22-5, “Exception Conditions for Legacy SIMD/MMX Instructions with XMM and FP Exception,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3B.
-------------------------------------------------------------------------------- /html/EMMS.html: --------------------------------------------------------------------------------
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
0F 77 | EMMS | NP | Valid | Valid | Set the x87 FPU tag word to empty. |
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
NP | NA | NA | NA | NA |
Sets the values of all the tags in the x87 FPU tag word to empty (all 1s). This operation marks the x87 FPU data registers (which are aliased to the MMX technology registers) as available for use by x87 FPU floating-point instructions. (See Figure 8-7 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for the format of the x87 FPU tag word.) All other MMX instructions (other than the EMMS instruction) set all the tags in x87 FPU tag word to valid (all 0s).
The EMMS instruction must be used to clear the MMX technology state at the end of all MMX technology procedures or subroutines and before calling other procedures or subroutines that may execute x87 floating-point instructions. If a floating-point instruction loads one of the registers in the x87 FPU data register stack before the x87 FPU tag word has been reset by the EMMS instruction, an x87 floating-point register stack overflow can occur that will result in an x87 floating-point exception or incorrect result.
EMMS operation is the same in non-64-bit modes and 64-bit mode.
x87FPUTagWord ← FFFFH;
void _mm_empty()
None
#UD | If CR0.EM[bit 2] = 1. |
#NM | If CR0.TS[bit 3] = 1. |
#MF | If there is a pending FPU exception. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
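The interaction between MMX instructions, EMMS, and later x87 loads can be sketched with a small model of the tag word. `TagWordModel` and its methods are hypothetical names used only for illustration; the model tracks just the 16-bit tag word (two bits per register, 11B meaning empty) and assumes it starts in the all-empty state.

```python
EMPTY = 0b11  # two-bit tag value for an empty register


class TagWordModel:
    """Tracks only the x87 FPU tag word, two bits per physical register."""

    def __init__(self):
        self.tag_word = 0xFFFF  # assumed starting state: all registers empty

    def mmx_instruction(self):
        self.tag_word = 0x0000  # any MMX instruction marks all tags valid

    def emms(self):
        self.tag_word = 0xFFFF  # EMMS marks all tags empty

    def tag(self, register):
        return (self.tag_word >> (2 * register)) & 0b11

    def can_load(self, register):
        """An x87 load into a non-empty register would overflow the stack."""
        return self.tag(register) == EMPTY


fpu = TagWordModel()
fpu.mmx_instruction()
blocked = not fpu.can_load(7)  # every register looks occupied after MMX code
fpu.emms()
allowed = fpu.can_load(7)      # EMMS makes the registers available again
```

In this model `blocked` and `allowed` are both true, which mirrors the rule that EMMS must run between MMX code and any x87 floating-point code.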
-------------------------------------------------------------------------------- /html/F2XM1.html: --------------------------------------------------------------------------------
Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
D9 F0 | F2XM1 | Valid | Valid | Replace ST(0) with (2^ST(0) – 1). |
Computes the exponential value of 2 to the power of the source operand minus 1. The source operand is located in register ST(0) and the result is also stored in ST(0). The value of the source operand must lie in the range –1.0 to +1.0. If the source value is outside this range, the result is undefined.
The following table shows the results obtained when computing the exponential value of various classes of numbers, assuming that neither overflow nor underflow occurs.
ST(0) SRC | ST(0) DEST |
---|---|
−1.0 to −0 | −0.5 to −0 |
−0 | −0 |
+0 | +0 |
+0 to +1.0 | +0 to 1.0 |
Values other than 2 can be exponentiated using the following formula:
x^y ← 2^(y ∗ log2(x))
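The identity above can be checked numerically. The sketch below uses hypothetical function names and ordinary 64-bit floats; `f2xm1` models only the documented input range, and `power_via_base2` ignores the range reduction that real x87 code would need before F2XM1 could be applied to an exponent outside –1.0 to +1.0.

```python
import math


def f2xm1(x):
    """Model of F2XM1: 2**x - 1 for x in the documented range [-1.0, +1.0]."""
    if not -1.0 <= x <= 1.0:
        raise ValueError("result is undefined outside -1.0 to +1.0")
    return 2.0 ** x - 1.0


def power_via_base2(x, y):
    """Compute x**y as 2**(y * log2(x)), for x > 0."""
    return 2.0 ** (y * math.log2(x))


endpoints = (f2xm1(-1.0), f2xm1(0.0), f2xm1(1.0))
cube_of_three = power_via_base2(3.0, 3.0)  # close to 27.0, up to rounding
```

The endpoint values match the first and last rows of the table: an input of −1.0 gives −0.5 and an input of +1.0 gives 1.0.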
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
ST(0) ← (2^ST(0) − 1);
C1 | Set to 0 if stack underflow occurred. Set if result was rounded up; cleared otherwise. |
C0, C2, C3 | Undefined. |
#IS | Stack underflow occurred. |
#IA | Source operand is an SNaN value or unsupported format. |
#D | Source is a denormal value. |
#U | Result is too small for destination format. |
#P | Value cannot be represented exactly in destination format. |
#NM | CR0.EM[bit 2] or CR0.TS[bit 3] = 1. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/FABS.html: --------------------------------------------------------------------------------
Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
D9 E1 | FABS | Valid | Valid | Replace ST with its absolute value. |
Clears the sign bit of ST(0) to create the absolute value of the operand. The following table shows the results obtained when creating the absolute value of various classes of numbers.
ST(0) SRC | ST(0) DEST |
---|---|
−∞ | +∞ |
−F | +F |
−0 | +0 |
+0 | +0 |
+F | +F |
+∞ | +∞ |
NaN | NaN |
NOTES:
F means finite floating-point value.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
ST(0) ← |ST(0)|;
C1 | Set to 0. |
C0, C2, C3 | Undefined. |
#IS | Stack underflow occurred. |
#NM | CR0.EM[bit 2] or CR0.TS[bit 3] = 1. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/FCHS.html: --------------------------------------------------------------------------------
Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
D9 E0 | FCHS | Valid | Valid | Complements sign of ST(0). |
Complements the sign bit of ST(0). This operation changes a positive value into a negative value of equal magnitude or vice versa. The following table shows the results obtained when changing the sign of various classes of numbers.
ST(0) SRC | ST(0) DEST |
---|---|
−∞ | +∞ |
−F | +F |
−0 | +0 |
+0 | −0 |
+F | −F |
+∞ | −∞ |
NaN | NaN |
NOTES:
F means finite floating-point value.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
SignBit(ST(0)) ← NOT (SignBit(ST(0)));
C1 | Set to 0. |
C0, C2, C3 | Undefined. |
#IS | Stack underflow occurred. |
#NM | CR0.EM[bit 2] or CR0.TS[bit 3] = 1. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/FDECSTP.html: --------------------------------------------------------------------------------
Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
D9 F6 | FDECSTP | Valid | Valid | Decrement TOP field in FPU status word. |
Subtracts one from the TOP field of the FPU status word (decrements the top-of-stack pointer). If the TOP field contains a 0, it is set to 7. The effect of this instruction is to rotate the stack by one position. The contents of the FPU data registers and tag register are not affected.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
IF TOP = 0
    THEN TOP ← 7;
    ELSE TOP ← TOP – 1;
FI;
The C1 flag is set to 0. The C0, C2, and C3 flags are undefined.
None.
#NM | CR0.EM[bit 2] or CR0.TS[bit 3] = 1. |
#MF | If there is a pending x87 FPU exception. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/FFREE.html: --------------------------------------------------------------------------------
Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
DD C0+i | FFREE ST(i) | Valid | Valid | Sets tag for ST(i) to empty. |
Sets the tag in the FPU tag register associated with register ST(i) to empty (11B). The contents of ST(i) and the FPU stack-top pointer (TOP) are not affected.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
TAG(i) ← 11B;
C0, C1, C2, C3 undefined.
None
#NM | CR0.EM[bit 2] or CR0.TS[bit 3] = 1. |
#MF | If there is a pending x87 FPU exception. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/FINCSTP.html: --------------------------------------------------------------------------------
Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
D9 F7 | FINCSTP | Valid | Valid | Increment the TOP field in the FPU status register. |
Adds one to the TOP field of the FPU status word (increments the top-of-stack pointer). If the TOP field contains a 7, it is set to 0. The effect of this instruction is to rotate the stack by one position. The contents of the FPU data registers and tag register are not affected. This operation is not equivalent to popping the stack, because the tag for the previous top-of-stack register is not marked empty.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
IF TOP = 7
    THEN TOP ← 0;
    ELSE TOP ← TOP + 1;
FI;
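Because TOP is a 3-bit field, the wrap-around in the pseudocode above (and in the FDECSTP pseudocode) is equivalent to arithmetic modulo 8. A minimal sketch, with hypothetical function names, that models only the TOP field and leaves data registers and tags untouched:

```python
def fincstp(top):
    """Increment the 3-bit TOP field, wrapping 7 back to 0."""
    return (top + 1) & 0b111


def fdecstp(top):
    """Decrement the 3-bit TOP field, wrapping 0 back to 7."""
    return (top - 1) & 0b111


wrapped_up = fincstp(7)               # wraps to 0
wrapped_down = fdecstp(0)             # wraps to 7
round_trip = fdecstp(fincstp(5))      # returns to the starting value
```

Applying one instruction and then the other restores the original TOP value, which is consistent with both being pure stack rotations.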
The C1 flag is set to 0. The C0, C2, and C3 flags are undefined.
None
#NM | CR0.EM[bit 2] or CR0.TS[bit 3] = 1. |
#MF | If there is a pending x87 FPU exception. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/FNOP.html: --------------------------------------------------------------------------------
Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
D9 D0 | FNOP | Valid | Valid | No operation is performed. |
Performs no FPU operation. This instruction takes up space in the instruction stream but does not affect the FPU or machine context, except the EIP register and the FPU Instruction Pointer.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
C0, C1, C2, C3 undefined.
None
#NM | CR0.EM[bit 2] or CR0.TS[bit 3] = 1. |
#MF | If there is a pending x87 FPU exception. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/FRNDINT.html: --------------------------------------------------------------------------------
Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
D9 FC | FRNDINT | Valid | Valid | Round ST(0) to an integer. |
Rounds the source value in the ST(0) register to the nearest integral value, depending on the current rounding mode (setting of the RC field of the FPU control word), and stores the result in ST(0).
If the source value is ∞, the value is not changed. If the source value is not an integral value, the floating-point inexact-result exception (#P) is generated.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
ST(0) ← RoundToIntegralValue(ST(0));
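How the RC field selects the result can be sketched as follows. This is a model on 64-bit floats rather than the 80-bit x87 format, `frndint` is a hypothetical helper, and the second return value only indicates when #P would be signaled (the source was not integral). The RC encodings used are the standard x87 ones: 00B nearest (even), 01B down, 10B up, 11B toward zero.

```python
import math


def frndint(x, rc=0b00):
    """Return (rounded value, inexact flag) for the given rounding control."""
    if math.isnan(x) or math.isinf(x):
        return x, False  # infinity is unchanged; NaN handling is not modeled
    if rc == 0b00:
        rounded = round(x)       # round to nearest, ties to even
    elif rc == 0b01:
        rounded = math.floor(x)  # round down, toward negative infinity
    elif rc == 0b10:
        rounded = math.ceil(x)   # round up, toward positive infinity
    else:
        rounded = math.trunc(x)  # round toward zero
    result = math.copysign(float(rounded), x)  # keep the sign, including -0.0
    return result, result != x
```

For example, 2.5 rounds to 2.0 under nearest-even but to 3.0 under round-up, and an already integral input such as 4.0 is returned unchanged with the inexact flag clear.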
C1 | Set to 0 if stack underflow occurred. Set if result was rounded up; cleared otherwise. |
C0, C2, C3 | Undefined. |
#IS | Stack underflow occurred. |
#IA | Source operand is an SNaN value or unsupported format. |
#D | Source operand is a denormal value. |
#P | Source operand is not an integral value. |
#NM | CR0.EM[bit 2] or CR0.TS[bit 3] = 1. |
#MF | If there is a pending x87 FPU exception. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/FSQRT.html: --------------------------------------------------------------------------------
Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
D9 FA | FSQRT | Valid | Valid | Computes square root of ST(0) and stores the result in ST(0). |
Computes the square root of the source value in the ST(0) register and stores the result in ST(0).
The following table shows the results obtained when taking the square root of various classes of numbers, assuming that neither overflow nor underflow occurs.
SRC (ST(0)) | DEST (ST(0)) |
---|---|
−∞ | * |
−F | * |
−0 | −0 |
+0 | +0 |
+F | +F |
+∞ | +∞ |
NaN | NaN |
NOTES:
F means finite floating-point value.
* Indicates floating-point invalid-arithmetic-operand (#IA) exception.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
ST(0) ← SquareRoot(ST(0));
C1 | Set to 0 if stack underflow occurred. Set if result was rounded up; cleared otherwise. |
C0, C2, C3 | Undefined. |
#IS | Stack underflow occurred. |
#IA | Source operand is an SNaN value or unsupported format. Source operand is a negative value (except for −0). |
#D | Source operand is a denormal value. |
#P | Value cannot be represented exactly in destination format. |
#NM | CR0.EM[bit 2] or CR0.TS[bit 3] = 1. |
#MF | If there is a pending x87 FPU exception. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/FXAM.html: --------------------------------------------------------------------------------
Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
D9 E5 | FXAM | Valid | Valid | Classify value or number in ST(0). |
Examines the contents of the ST(0) register and sets the condition code flags C0, C2, and C3 in the FPU status word to indicate the class of value or number in the register (see the table below).
Class | C3 | C2 | C0 |
---|---|---|---|
Unsupported | 0 | 0 | 0 |
NaN | 0 | 0 | 1 |
Normal finite number | 0 | 1 | 0 |
Infinity | 0 | 1 | 1 |
Zero | 1 | 0 | 0 |
Empty | 1 | 0 | 1 |
Denormal number | 1 | 1 | 0 |
The C1 flag is set to the sign of the value in ST(0), regardless of whether the register is empty or full.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
C1 ← sign bit of ST; (* 0 for positive, 1 for negative *)
CASE (class of value or number in ST(0)) OF
    Unsupported: C3, C2, C0 ← 000;
    NaN: C3, C2, C0 ← 001;
    Normal: C3, C2, C0 ← 010;
    Infinity: C3, C2, C0 ← 011;
    Zero: C3, C2, C0 ← 100;
    Empty: C3, C2, C0 ← 101;
    Denormal: C3, C2, C0 ← 110;
ESAC;
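The classification can be approximated in software for ordinary 64-bit floats. This sketch is an analogy rather than a model of the 80-bit x87 format: `fxam` is a hypothetical helper, the "Unsupported" class cannot occur for a 64-bit float and is omitted, and the empty state is passed in as a flag because a Python float has no tag.

```python
import math
import sys


def fxam(x, empty=False):
    """Return (C3, C2, C1, C0) for a 64-bit float, following the table."""
    c1 = 1 if math.copysign(1.0, x) < 0 else 0  # sign bit, set even if empty
    if empty:
        c3, c2, c0 = 1, 0, 1
    elif math.isnan(x):
        c3, c2, c0 = 0, 0, 1
    elif math.isinf(x):
        c3, c2, c0 = 0, 1, 1
    elif x == 0.0:
        c3, c2, c0 = 1, 0, 0
    elif abs(x) < sys.float_info.min:  # below the smallest normal value
        c3, c2, c0 = 1, 1, 0
    else:
        c3, c2, c0 = 0, 1, 0
    return (c3, c2, c1, c0)
```

For instance, `fxam(-0.0)` reports the Zero class with C1 set, and `fxam(5e-324)` reports the Denormal class.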
C1 | Sign of value in ST(0). |
C0, C2, C3 | See Table 3-42. |
None
#NM | CR0.EM[bit 2] or CR0.TS[bit 3] = 1. |
#MF | If there is a pending x87 FPU exception. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/FXCH.html: --------------------------------------------------------------------------------
Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
D9 C8+i | FXCH ST(i) | Valid | Valid | Exchange the contents of ST(0) and ST(i). |
D9 C9 | FXCH | Valid | Valid | Exchange the contents of ST(0) and ST(1). |
Exchanges the contents of registers ST(0) and ST(i). If no source operand is specified, the contents of ST(0) and ST(1) are exchanged.
This instruction provides a simple means of moving values in the FPU register stack to the top of the stack [ST(0)], so that they can be operated on by those floating-point instructions that can only operate on values in ST(0). For example, the following instruction sequence takes the square root of the third register from the top of the register stack:
FXCH ST(3);
FSQRT;
FXCH ST(3);
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
IF (Number-of-operands) is 1
    THEN
        temp ← ST(0);
        ST(0) ← SRC;
        SRC ← temp;
    ELSE
        temp ← ST(0);
        ST(0) ← ST(1);
        ST(1) ← temp;
FI;
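The exchange, and the FXCH ST(3); FSQRT; FXCH ST(3) sequence from the description, can be traced with a list standing in for the register stack. This is an illustrative model only: `fxch` is a hypothetical helper, index 0 plays the role of ST(0), and flags, tags, and exceptions are not modeled.

```python
import math


def fxch(stack, i=1):
    """Swap ST(0) with ST(i); with no operand, ST(1) is used."""
    stack[0], stack[i] = stack[i], stack[0]


st = [1.0, 2.0, 3.0, 16.0]  # st[0] is ST(0), st[3] is ST(3)

fxch(st, 3)                 # bring ST(3) to the top
st[0] = math.sqrt(st[0])    # FSQRT operates on ST(0) only
fxch(st, 3)                 # put the result back where the source was
```

After the sequence only the ST(3) slot has changed, from 16.0 to 4.0, and the other registers hold their original values.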
C1 | Set to 0. |
C0, C2, C3 | Undefined. |
#IS | Stack underflow occurred. |
#NM | CR0.EM[bit 2] or CR0.TS[bit 3] = 1. |
#MF | If there is a pending x87 FPU exception. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/HLT.html: --------------------------------------------------------------------------------
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
F4 | HLT | NP | Valid | Valid | Halt |
Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
NP | NA | NA | NA | NA |
Stops instruction execution and places the processor in a HALT state. An enabled interrupt (including NMI and SMI), a debug exception, the BINIT# signal, the INIT# signal, or the RESET# signal will resume execution. If an interrupt (including NMI) is used to resume execution after a HLT instruction, the saved instruction pointer (CS:EIP) points to the instruction following the HLT instruction.
41 |When a HLT instruction is executed on an Intel 64 or IA-32 processor supporting Intel Hyper-Threading Technology, only the logical processor that executes the instruction is halted. The other logical processors in the physical processor remain active, unless they are each individually halted by executing a HLT instruction.
42 |The HLT instruction is a privileged instruction. When the processor is running in protected or virtual-8086 mode, the privilege level of a program or procedure must be 0 to execute the HLT instruction.
43 |This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
44 |Enter Halt state;46 |
None
48 |#GP(0) | 52 |If the current privilege level is not 0. |
#UD | 55 |If the LOCK prefix is used. |
None.
58 |Same exceptions as in protected mode.
60 |Same exceptions as in protected mode.
62 |Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/KADDW_KADDB_KADDQ_KADDD.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode/Instruction | 13 |Op/En | 14 |64/32 bit Mode Support | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
VEX.L1.0F.W0 4A /r KADDW k1, k2, k3 | 19 |RVR | 20 |V/V | 21 |AVX512DQ | 22 |Add 16 bits masks in k2 and k3 and place result in k1. |
VEX.L1.66.0F.W0 4A /r KADDB k1, k2, k3 | 25 |RVR | 26 |V/V | 27 |AVX512DQ | 28 |Add 8 bits masks in k2 and k3 and place result in k1. |
VEX.L1.0F.W1 4A /r KADDQ k1, k2, k3 | 31 |RVR | 32 |V/V | 33 |AVX512BW | 34 |Add 64 bits masks in k2 and k3 and place result in k1. |
VEX.L1.66.0F.W1 4A /r KADDD k1, k2, k3 | 37 |RVR | 38 |V/V | 39 |AVX512BW | 40 |Add 32 bits masks in k2 and k3 and place result in k1. |
Op/En | Operand 1 | Operand 2 | Operand 3 |
RVR | ModRM:reg (w) | VEX.1vvv (r) | ModRM:r/m (r, ModRM:[7:6] must be 11b) |
50 |Adds the vector mask k2 and the vector mask k3, and writes the result into vector mask k1.
KADDW
DEST[15:0] ← SRC1[15:0] + SRC2[15:0]
DEST[MAX_KL-1:16] ← 0
KADDB
DEST[7:0] ← SRC1[7:0] + SRC2[7:0]
DEST[MAX_KL-1:8] ← 0
KADDQ
DEST[63:0] ← SRC1[63:0] + SRC2[63:0]
DEST[MAX_KL-1:64] ← 0
KADDD
DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
DEST[MAX_KL-1:32] ← 0
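The KADDW and KADDB operations above can be sketched in portable C. This is only a model of the pseudocode, not the hardware instruction: the function names are hypothetical, a 64-bit integer stands in for a mask register, and it assumes that a carry out of the top bit is simply discarded, as the fixed-width assignment in the pseudocode suggests.

```c
#include <stdint.h>

/* Hypothetical model of KADDW: add the low 16 bits of each source.
   The cast drops any carry out of bit 15 and leaves the upper bits zero. */
uint64_t kaddw_model(uint64_t src1, uint64_t src2)
{
    return (uint16_t)(src1 + src2);
}

/* Hypothetical model of KADDB: the same operation, 8 bits wide. */
uint64_t kaddb_model(uint64_t src1, uint64_t src2)
{
    return (uint8_t)(src1 + src2);
}
```

Bits of the sources above the operand width do not influence the result, because carries only propagate upward and the cast keeps the low bits.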
None
68 |See Exceptions Type K20.
-------------------------------------------------------------------------------- /html/KANDNW_KANDNB_KANDNQ_KANDND.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode/Instruction | 13 |Op/En | 14 |64/32 bit Mode Support | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
VEX.NDS.L1.0F.W0 42 /r KANDNW k1, k2, k3 | 19 |RVR | 20 |V/V | 21 |AVX512F | 22 |Bitwise AND NOT 16 bits masks k2 and k3 and place result in k1. |
VEX.L1.66.0F.W0 42 /r KANDNB k1, k2, k3 | 25 |RVR | 26 |V/V | 27 |AVX512DQ | 28 |Bitwise AND NOT 8 bits masks k2 and k3 and place result in k1. |
VEX.L1.0F.W1 42 /r KANDNQ k1, k2, k3 | 31 |RVR | 32 |V/V | 33 |AVX512BW | 34 |Bitwise AND NOT 64 bits masks k2 and k3 and place result in k1. |
VEX.L1.66.0F.W1 42 /r KANDND k1, k2, k3 | 37 |RVR | 38 |V/V | 39 |AVX512BW | 40 |Bitwise AND NOT 32 bits masks k2 and k3 and place result in k1. |
Op/En | 45 |Operand 1 | 46 |Operand 2 | 47 |Operand 3 |
RVR | 50 |ModRM:reg (w) | 51 |VEX.1vvv (r) | 52 |ModRM:r/m (r, ModRM:[7:6] must be 11b) |
Performs a bitwise AND NOT between the vector mask k2 and the vector mask k3, and writes the result into vector mask k1.
KANDNW
DEST[15:0] ← (BITWISE NOT SRC1[15:0]) BITWISE AND SRC2[15:0]
DEST[MAX_KL-1:16] ← 0
KANDNB
DEST[7:0] ← (BITWISE NOT SRC1[7:0]) BITWISE AND SRC2[7:0]
DEST[MAX_KL-1:8] ← 0
KANDNQ
DEST[63:0] ← (BITWISE NOT SRC1[63:0]) BITWISE AND SRC2[63:0]
DEST[MAX_KL-1:64] ← 0
KANDND
DEST[31:0] ← (BITWISE NOT SRC1[31:0]) BITWISE AND SRC2[31:0]
DEST[MAX_KL-1:32] ← 0
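Because only the first source is inverted, operand order matters for AND NOT. The following portable C sketch models the KANDNW pseudocode above; the function name is hypothetical and the code does not use the hardware instruction.

```c
#include <stdint.h>

/* Hypothetical model of KANDNW: invert SRC1, then AND with SRC2.
   The cast back to uint16_t discards the bits produced by integer promotion. */
uint16_t kandnw_model(uint16_t src1, uint16_t src2)
{
    return (uint16_t)(~src1 & src2);
}
```

Swapping the arguments generally gives a different result: with src1 = 0x00FF and src2 = 0xFFFF the model returns 0xFF00, while the reverse order returns 0.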
KANDNW __mmask16 _mm512_kandn(__mmask16 a, __mmask16 b);
70 |None
72 |None
74 |See Exceptions Type K20.
-------------------------------------------------------------------------------- /html/KANDW_KANDB_KANDQ_KANDD.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode/Instruction | 13 |Op/En | 14 |64/32 bit Mode Support | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
VEX.NDS.L1.0F.W0 41 /r KANDW k1, k2, k3 | 19 |RVR | 20 |V/V | 21 |AVX512F | 22 |Bitwise AND 16 bits masks k2 and k3 and place result in k1. |
VEX.L1.66.0F.W0 41 /r KANDB k1, k2, k3 | 25 |RVR | 26 |V/V | 27 |AVX512DQ | 28 |Bitwise AND 8 bits masks k2 and k3 and place result in k1. |
VEX.L1.0F.W1 41 /r KANDQ k1, k2, k3 | 31 |RVR | 32 |V/V | 33 |AVX512BW | 34 |Bitwise AND 64 bits masks k2 and k3 and place result in k1. |
VEX.L1.66.0F.W1 41 /r KANDD k1, k2, k3 | 37 |RVR | 38 |V/V | 39 |AVX512BW | 40 |Bitwise AND 32 bits masks k2 and k3 and place result in k1. |
Op/En | 45 |Operand 1 | 46 |Operand 2 | 47 |Operand 3 |
RVR | 50 |ModRM:reg (w) | 51 |VEX.1vvv (r) | 52 |ModRM:r/m (r, ModRM:[7:6] must be 11b) |
Performs a bitwise AND between the vector mask k2 and the vector mask k3, and writes the result into vector mask k1.
KANDW
DEST[15:0] ← SRC1[15:0] BITWISE AND SRC2[15:0]
DEST[MAX_KL-1:16] ← 0
KANDB
DEST[7:0] ← SRC1[7:0] BITWISE AND SRC2[7:0]
DEST[MAX_KL-1:8] ← 0
KANDQ
DEST[63:0] ← SRC1[63:0] BITWISE AND SRC2[63:0]
DEST[MAX_KL-1:64] ← 0
KANDD
DEST[31:0] ← SRC1[31:0] BITWISE AND SRC2[31:0]
DEST[MAX_KL-1:32] ← 0
KANDW __mmask16 _mm512_kand(__mmask16 a, __mmask16 b);
70 |None
72 |None
74 |See Exceptions Type K20.
-------------------------------------------------------------------------------- /html/KNOTW_KNOTB_KNOTQ_KNOTD.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode/Instruction | 13 |Op/En | 14 |64/32 bit Mode Support | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
VEX.L0.0F.W0 44 /r KNOTW k1, k2 | 19 |RR | 20 |V/V | 21 |AVX512F | 22 |Bitwise NOT of 16 bits mask k2. |
VEX.L0.66.0F.W0 44 /r KNOTB k1, k2 | 25 |RR | 26 |V/V | 27 |AVX512DQ | 28 |Bitwise NOT of 8 bits mask k2. |
VEX.L0.0F.W1 44 /r KNOTQ k1, k2 | 31 |RR | 32 |V/V | 33 |AVX512BW | 34 |Bitwise NOT of 64 bits mask k2. |
VEX.L0.66.0F.W1 44 /r KNOTD k1, k2 | 37 |RR | 38 |V/V | 39 |AVX512BW | 40 |Bitwise NOT of 32 bits mask k2. |
Op/En | 45 |Operand 1 | 46 |Operand 2 |
RR | 49 |ModRM:reg (w) | 50 |ModRM:r/m (r, ModRM:[7:6] must be 11b) |
Performs a bitwise NOT of vector mask k2 and writes the result into vector mask k1.
KNOTW
DEST[15:0] ← BITWISE NOT SRC[15:0]
DEST[MAX_KL-1:16] ← 0
KNOTB
DEST[7:0] ← BITWISE NOT SRC[7:0]
DEST[MAX_KL-1:8] ← 0
KNOTQ
DEST[63:0] ← BITWISE NOT SRC[63:0]
DEST[MAX_KL-1:64] ← 0
KNOTD
DEST[31:0] ← BITWISE NOT SRC[31:0]
DEST[MAX_KL-1:32] ← 0
KNOTW __mmask16 _mm512_knot(__mmask16 a);
68 |None
70 |None
72 |See Exceptions Type K20.
-------------------------------------------------------------------------------- /html/KORW_KORB_KORQ_KORD.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode/Instruction | 13 |Op/En | 14 |64/32 bit Mode Support | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
VEX.NDS.L1.0F.W0 45 /r KORW k1, k2, k3 | 19 |RVR | 20 |V/V | 21 |AVX512F | 22 |Bitwise OR 16 bits masks k2 and k3 and place result in k1. |
VEX.L1.66.0F.W0 45 /r KORB k1, k2, k3 | 25 |RVR | 26 |V/V | 27 |AVX512DQ | 28 |Bitwise OR 8 bits masks k2 and k3 and place result in k1. |
VEX.L1.0F.W1 45 /r KORQ k1, k2, k3 | 31 |RVR | 32 |V/V | 33 |AVX512BW | 34 |Bitwise OR 64 bits masks k2 and k3 and place result in k1. |
VEX.L1.66.0F.W1 45 /r KORD k1, k2, k3 | 37 |RVR | 38 |V/V | 39 |AVX512BW | 40 |Bitwise OR 32 bits masks k2 and k3 and place result in k1. |
Op/En | 45 |Operand 1 | 46 |Operand 2 | 47 |Operand 3 |
RVR | 50 |ModRM:reg (w) | 51 |VEX.1vvv (r) | 52 |ModRM:r/m (r, ModRM:[7:6] must be 11b) |
Performs a bitwise OR between the vector mask k2 and the vector mask k3, and writes the result into vector mask k1 (three-operand form).
KORW
DEST[15:0] ← SRC1[15:0] BITWISE OR SRC2[15:0]
DEST[MAX_KL-1:16] ← 0
KORB
DEST[7:0] ← SRC1[7:0] BITWISE OR SRC2[7:0]
DEST[MAX_KL-1:8] ← 0
KORQ
DEST[63:0] ← SRC1[63:0] BITWISE OR SRC2[63:0]
DEST[MAX_KL-1:64] ← 0
KORD
DEST[31:0] ← SRC1[31:0] BITWISE OR SRC2[31:0]
DEST[MAX_KL-1:32] ← 0
KORW __mmask16 _mm512_kor(__mmask16 a, __mmask16 b);
70 |None
72 |None
74 |See Exceptions Type K20.
-------------------------------------------------------------------------------- /html/KSHIFTLW_KSHIFTLB_KSHIFTLQ_KSHIFTLD.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode/Instruction | 13 |Op/En | 14 |64/32 bit Mode Support | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
VEX.L0.66.0F3A.W1 32 /r KSHIFTLW k1, k2, imm8 | 19 |RRI | 20 |V/V | 21 |AVX512F | 22 |Shift left 16 bits in k2 by immediate and write result in k1. |
VEX.L0.66.0F3A.W0 32 /r KSHIFTLB k1, k2, imm8 | 25 |RRI | 26 |V/V | 27 |AVX512DQ | 28 |Shift left 8 bits in k2 by immediate and write result in k1. |
VEX.L0.66.0F3A.W1 33 /r KSHIFTLQ k1, k2, imm8 | 31 |RRI | 32 |V/V | 33 |AVX512BW | 34 |Shift left 64 bits in k2 by immediate and write result in k1. |
VEX.L0.66.0F3A.W0 33 /r KSHIFTLD k1, k2, imm8 | 37 |RRI | 38 |V/V | 39 |AVX512BW | 40 |Shift left 32 bits in k2 by immediate and write result in k1. |
Op/En | 45 |Operand 1 | 46 |Operand 2 | 47 |Operand 3 |
RRI | 50 |ModRM:reg (w) | 51 |ModRM:r/m (r, ModRM:[7:6] must be 11b) | 52 |Imm8 |
Shifts 8/16/32/64 bits in the second operand (source operand) left by the count specified in the immediate byte and places the least significant 8/16/32/64 bits of the result in the destination operand. The higher bits of the destination are zero-extended. The destination is set to zero if the count value is greater than 7 (for byte shift), 15 (for word shift), 31 (for doubleword shift) or 63 (for quadword shift).
KSHIFTLW
COUNT ← imm8[7:0]
DEST[MAX_KL-1:0] ← 0
IF COUNT <= 15
    THEN DEST[15:0] ← SRC1[15:0] << COUNT;
FI;
KSHIFTLB
COUNT ← imm8[7:0]
DEST[MAX_KL-1:0] ← 0
IF COUNT <= 7
    THEN DEST[7:0] ← SRC1[7:0] << COUNT;
FI;
KSHIFTLQ
COUNT ← imm8[7:0]
DEST[MAX_KL-1:0] ← 0
IF COUNT <= 63
    THEN DEST[63:0] ← SRC1[63:0] << COUNT;
FI;
KSHIFTLD
COUNT ← imm8[7:0]
DEST[MAX_KL-1:0] ← 0
IF COUNT <= 31
    THEN DEST[31:0] ← SRC1[31:0] << COUNT;
FI;
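The out-of-range rule in the pseudocode above is worth modeling explicitly, because a C shift by a count equal to or larger than the operand width is undefined rather than zero. A minimal sketch with hypothetical function names, covering the word and quadword forms:

```c
#include <stdint.h>

/* Hypothetical model of KSHIFTLW: counts above 15 clear the destination. */
uint16_t kshiftlw_model(uint16_t src, uint8_t count)
{
    if (count > 15)
        return 0;
    /* Shift in a wider unsigned type, then keep the low 16 bits. */
    return (uint16_t)((uint32_t)src << count);
}

/* Hypothetical model of KSHIFTLQ: counts above 63 clear the destination.
   The guard also keeps the C shift within defined behavior. */
uint64_t kshiftlq_model(uint64_t src, uint8_t count)
{
    if (count > 63)
        return 0;
    return src << count;
}
```

The right-shift forms (KSHIFTR*) follow the same pattern with `>>` in place of `<<`.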
Compiler auto generates KSHIFTLW when needed.
85 |None
87 |None
89 |See Exceptions Type K20.
-------------------------------------------------------------------------------- /html/KSHIFTRW_KSHIFTRB_KSHIFTRQ_KSHIFTRD.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode/Instruction | 13 |Op/En | 14 |64/32 bit Mode Support | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
VEX.L0.66.0F3A.W1 30 /r KSHIFTRW k1, k2, imm8 | 19 |RRI | 20 |V/V | 21 |AVX512F | 22 |Shift right 16 bits in k2 by immediate and write result in k1. |
VEX.L0.66.0F3A.W0 30 /r KSHIFTRB k1, k2, imm8 | 25 |RRI | 26 |V/V | 27 |AVX512DQ | 28 |Shift right 8 bits in k2 by immediate and write result in k1. |
VEX.L0.66.0F3A.W1 31 /r KSHIFTRQ k1, k2, imm8 | 31 |RRI | 32 |V/V | 33 |AVX512BW | 34 |Shift right 64 bits in k2 by immediate and write result in k1. |
VEX.L0.66.0F3A.W0 31 /r KSHIFTRD k1, k2, imm8 | 37 |RRI | 38 |V/V | 39 |AVX512BW | 40 |Shift right 32 bits in k2 by immediate and write result in k1. |
Op/En | 45 |Operand 1 | 46 |Operand 2 | 47 |Operand 3 |
RRI | 50 |ModRM:reg (w) | 51 |ModRM:r/m (r, ModRM:[7:6] must be 11b) | 52 |Imm8 |
Shifts 8/16/32/64 bits in the second operand (source operand) right by the count specified in the immediate byte and places the least significant 8/16/32/64 bits of the result in the destination operand. The higher bits of the destination are zero-extended. The destination is set to zero if the count value is greater than 7 (for byte shift), 15 (for word shift), 31 (for doubleword shift) or 63 (for quadword shift).
KSHIFTRW
COUNT ← imm8[7:0]
DEST[MAX_KL-1:0] ← 0
IF COUNT <= 15
    THEN DEST[15:0] ← SRC1[15:0] >> COUNT;
FI;
KSHIFTRB
COUNT ← imm8[7:0]
DEST[MAX_KL-1:0] ← 0
IF COUNT <= 7
    THEN DEST[7:0] ← SRC1[7:0] >> COUNT;
FI;
KSHIFTRQ
COUNT ← imm8[7:0]
DEST[MAX_KL-1:0] ← 0
IF COUNT <= 63
    THEN DEST[63:0] ← SRC1[63:0] >> COUNT;
FI;
KSHIFTRD
COUNT ← imm8[7:0]
DEST[MAX_KL-1:0] ← 0
IF COUNT <= 31
    THEN DEST[31:0] ← SRC1[31:0] >> COUNT;
FI;
Compiler auto generates KSHIFTRW when needed.
85 |None
87 |None
89 |See Exceptions Type K20.
-------------------------------------------------------------------------------- /html/KUNPCKBW_KUNPCKWD_KUNPCKDQ.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode/Instruction | 13 |Op/En | 14 |64/32 bit Mode Support | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
VEX.NDS.L1.66.0F.W0 4B /r KUNPCKBW k1, k2, k3 | 19 |RVR | 20 |V/V | 21 |AVX512F | 22 |Unpack and interleave 8 bits masks in k2 and k3 and write word result in k1. |
VEX.NDS.L1.0F.W0 4B /r KUNPCKWD k1, k2, k3 | 25 |RVR | 26 |V/V | 27 |AVX512BW | 28 |Unpack and interleave 16 bits in k2 and k3 and write double-word result in k1. |
VEX.NDS.L1.0F.W1 4B /r KUNPCKDQ k1, k2, k3 | 31 |RVR | 32 |V/V | 33 |AVX512BW | 34 |Unpack and interleave 32 bits masks in k2 and k3 and write quadword result in k1. |
Op/En | 39 |Operand 1 | 40 |Operand 2 | 41 |Operand 3 |
RVR | 44 |ModRM:reg (w) | 45 |VEX.1vvv (r) | 46 |ModRM:r/m (r, ModRM:[7:6] must be 11b) |
Unpacks the lower 8/16/32 bits of the second and third operands (source operands) into the low part of the first operand (destination operand), starting from the low bytes. The result is zero-extended in the destination.
KUNPCKBW
DEST[7:0] ← SRC2[7:0]
DEST[15:8] ← SRC1[7:0]
DEST[MAX_KL-1:16] ← 0
KUNPCKWD
DEST[15:0] ← SRC2[15:0]
DEST[31:16] ← SRC1[15:0]
DEST[MAX_KL-1:32] ← 0
KUNPCKDQ
DEST[31:0] ← SRC2[31:0]
DEST[63:32] ← SRC1[31:0]
DEST[MAX_KL-1:64] ← 0
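Although the opcode table says "interleave", the pseudocode above concatenates the low halves: SRC1 supplies the upper half of the result and SRC2 the lower half. A portable C sketch of that reading, with hypothetical function names:

```c
#include <stdint.h>

/* Hypothetical model of KUNPCKBW: low byte of SRC1 goes to bits 15:8,
   low byte of SRC2 goes to bits 7:0. */
uint16_t kunpckbw_model(uint16_t src1, uint16_t src2)
{
    return (uint16_t)(((src1 & 0xFFu) << 8) | (src2 & 0xFFu));
}

/* Hypothetical model of KUNPCKWD: the same layout with 16-bit halves. */
uint32_t kunpckwd_model(uint32_t src1, uint32_t src2)
{
    return ((src1 & 0xFFFFu) << 16) | (src2 & 0xFFFFu);
}
```

For example, sources 0x12AB and 0x34CD produce 0xABCD in the byte form: the upper bytes of both sources are ignored.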
KUNPCKBW __mmask16 _mm512_kunpackb(__mmask16 a, __mmask16 b);
62 |KUNPCKDQ __mmask64 _mm512_kunpackd(__mmask64 a, __mmask64 b);
63 |KUNPCKWD __mmask32 _mm512_kunpackw(__mmask32 a, __mmask32 b);
64 |None
66 |None
68 |See Exceptions Type K20.
-------------------------------------------------------------------------------- /html/KXNORW_KXNORB_KXNORQ_KXNORD.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode/Instruction | 13 |Op/En | 14 |64/32 bit Mode Support | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
VEX.NDS.L1.0F.W0 46 /r KXNORW k1, k2, k3 | 19 |RVR | 20 |V/V | 21 |AVX512F | 22 |Bitwise XNOR 16 bits masks k2 and k3 and place result in k1. |
VEX.L1.66.0F.W0 46 /r KXNORB k1, k2, k3 | 25 |RVR | 26 |V/V | 27 |AVX512DQ | 28 |Bitwise XNOR 8 bits masks k2 and k3 and place result in k1. |
VEX.L1.0F.W1 46 /r KXNORQ k1, k2, k3 | 31 |RVR | 32 |V/V | 33 |AVX512BW | 34 |Bitwise XNOR 64 bits masks k2 and k3 and place result in k1. |
VEX.L1.66.0F.W1 46 /r KXNORD k1, k2, k3 | 37 |RVR | 38 |V/V | 39 |AVX512BW | 40 |Bitwise XNOR 32 bits masks k2 and k3 and place result in k1. |
Op/En | 45 |Operand 1 | 46 |Operand 2 | 47 |Operand 3 |
RVR | 50 |ModRM:reg (w) | 51 |VEX.1vvv (r) | 52 |ModRM:r/m (r, ModRM:[7:6] must be 11b) |
Performs a bitwise XNOR between the vector mask k2 and the vector mask k3, and writes the result into vector mask k1 (three-operand form).
KXNORW
DEST[15:0] ← NOT (SRC1[15:0] BITWISE XOR SRC2[15:0])
DEST[MAX_KL-1:16] ← 0
KXNORB
DEST[7:0] ← NOT (SRC1[7:0] BITWISE XOR SRC2[7:0])
DEST[MAX_KL-1:8] ← 0
KXNORQ
DEST[63:0] ← NOT (SRC1[63:0] BITWISE XOR SRC2[63:0])
DEST[MAX_KL-1:64] ← 0
KXNORD
DEST[31:0] ← NOT (SRC1[31:0] BITWISE XOR SRC2[31:0])
DEST[MAX_KL-1:32] ← 0
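As a rough illustration of the KXNORW pseudocode above, the following portable C model (hypothetical name, not the hardware instruction) sets a result bit wherever the two source bits are equal:

```c
#include <stdint.h>

/* Hypothetical model of KXNORW: bits are 1 where SRC1 and SRC2 agree.
   The cast truncates the promoted int back to 16 bits. */
uint16_t kxnorw_model(uint16_t src1, uint16_t src2)
{
    return (uint16_t)~(src1 ^ src2);
}
```

One consequence of this definition is that XNOR of any value with itself yields an all-ones mask of the operand width.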
KXNORW __mmask16 _mm512_kxnor(__mmask16 a, __mmask16 b);
70 |None
72 |None
74 |See Exceptions Type K20.
-------------------------------------------------------------------------------- /html/KXORW_KXORB_KXORQ_KXORD.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode/Instruction | 13 |Op/En | 14 |64/32 bit Mode Support | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
VEX.NDS.L1.0F.W0 47 /r KXORW k1, k2, k3 | 19 |RVR | 20 |V/V | 21 |AVX512F | 22 |Bitwise XOR 16 bits masks k2 and k3 and place result in k1. |
VEX.L1.66.0F.W0 47 /r KXORB k1, k2, k3 | 25 |RVR | 26 |V/V | 27 |AVX512DQ | 28 |Bitwise XOR 8 bits masks k2 and k3 and place result in k1. |
VEX.L1.0F.W1 47 /r KXORQ k1, k2, k3 | 31 |RVR | 32 |V/V | 33 |AVX512BW | 34 |Bitwise XOR 64 bits masks k2 and k3 and place result in k1. |
VEX.L1.66.0F.W1 47 /r KXORD k1, k2, k3 | 37 |RVR | 38 |V/V | 39 |AVX512BW | 40 |Bitwise XOR 32 bits masks k2 and k3 and place result in k1. |
Op/En | 45 |Operand 1 | 46 |Operand 2 | 47 |Operand 3 |
RVR | 50 |ModRM:reg (w) | 51 |VEX.1vvv (r) | 52 |ModRM:r/m (r, ModRM:[7:6] must be 11b) |
Performs a bitwise XOR between the vector mask k2 and the vector mask k3, and writes the result into vector mask k1 (three-operand form).
KXORW
DEST[15:0] ← SRC1[15:0] BITWISE XOR SRC2[15:0]
DEST[MAX_KL-1:16] ← 0
KXORB
DEST[7:0] ← SRC1[7:0] BITWISE XOR SRC2[7:0]
DEST[MAX_KL-1:8] ← 0
KXORQ
DEST[63:0] ← SRC1[63:0] BITWISE XOR SRC2[63:0]
DEST[MAX_KL-1:64] ← 0
KXORD
DEST[31:0] ← SRC1[31:0] BITWISE XOR SRC2[31:0]
DEST[MAX_KL-1:32] ← 0
KXORW __mmask16 _mm512_kxor(__mmask16 a, __mmask16 b);
70 |None
72 |None
74 |See Exceptions Type K20.
-------------------------------------------------------------------------------- /html/LAHF.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode | 13 |Instruction | 14 |Op/En | 15 |64-Bit Mode | 16 |Compat/Leg Mode | 17 |Description |
---|---|---|---|---|---|
9F | 20 |LAHF | 21 |NP | 22 |Invalid* | 23 |Valid | 24 |Load: AH ← EFLAGS(SF:ZF:0:AF:0:PF:1:CF). |
NOTES: *Valid in specific steppings. See Description section.
26 |Op/En | 30 |Operand 1 | 31 |Operand 2 | 32 |Operand 3 | 33 |Operand 4 |
NP | 36 |NA | 37 |NA | 38 |NA | 39 |NA |
This instruction executes as described above in compatibility mode and legacy mode. It is valid in 64-bit mode only if CPUID.80000001H:ECX.LAHF-SAHF[bit 0] = 1.
IF 64-Bit Mode
    THEN
        IF CPUID.80000001H:ECX.LAHF-SAHF[bit 0] = 1;
            THEN AH ← RFLAGS(SF:ZF:0:AF:0:PF:1:CF);
            ELSE #UD;
        FI;
    ELSE
        AH ← EFLAGS(SF:ZF:0:AF:0:PF:1:CF);
FI;
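The byte that LAHF loads into AH, written above as SF:ZF:0:AF:0:PF:1:CF, can be modeled in portable C. This sketch assumes the standard EFLAGS layout, which this page does not restate (CF in bit 0, PF in bit 2, AF in bit 4, ZF in bit 6, SF in bit 7); the function name is hypothetical and the CPUID check for 64-bit mode is not modeled.

```c
#include <stdint.h>

/* Hypothetical model of the value LAHF places in AH.
   Assumes CF=bit 0, PF=bit 2, AF=bit 4, ZF=bit 6, SF=bit 7 of EFLAGS. */
uint8_t lahf_model(uint32_t eflags)
{
    uint8_t ah = (uint8_t)(eflags & 0xD5u); /* keep SF, ZF, AF, PF, CF */
    return (uint8_t)(ah | 0x02u);           /* bit 1 is always 1; bits 3 and 5 stay 0 */
}
```

With ZF and CF set (0x41 in the low byte of EFLAGS), the model yields 0x43.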
None. The state of the flags in the EFLAGS register is not affected.
53 |#UD | 57 |If the LOCK prefix is used. |
Same exceptions as in protected mode.
60 |Same exceptions as in protected mode.
62 |Same exceptions as in protected mode.
#UD | If CPUID.80000001H:ECX.LAHF-SAHF[bit 0] = 0. If the LOCK prefix is used. |
Opcode/Instruction | 13 |Op/En | 14 |64/32-bit Mode | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
F2 0F F0 /r LDDQU xmm1, mem | RM | V/V | SSE3 | Load unaligned data from mem and return double quadword in xmm1. |
VEX.128.F2.0F.WIG F0 /r VLDDQU xmm1, m128 | RM | V/V | AVX | Load unaligned packed integer values from mem to xmm1. |
VEX.256.F2.0F.WIG F0 /r VLDDQU ymm1, m256 | RM | V/V | AVX | Load unaligned packed integer values from mem to ymm1. |
Op/En | 45 |Operand 1 | 46 |Operand 2 | 47 |Operand 3 | 48 |Operand 4 |
RM | 51 |ModRM:reg (w) | 52 |ModRM:r/m (r) | 53 |NA | 54 |NA |
The instruction is functionally similar to (V)MOVDQU ymm/xmm, m256/m128 for loading from memory. That is: 32/16 bytes of data starting at an address specified by the source memory operand (second operand) are fetched from memory and placed in a destination register (first operand). The source operand need not be aligned on a 32/16-byte boundary. Up to 64/32 bytes may be loaded from memory; this is implementation dependent.
57 |This instruction may improve performance relative to (V)MOVDQU if the source operand crosses a cache line boundary. In situations that require the data loaded by (V)LDDQU be modified and stored to the same location, use (V)MOVDQU or (V)MOVDQA instead of (V)LDDQU. To move a double quadword to or from memory locations that are known to be aligned on 16-byte boundaries, use the (V)MOVDQA instruction.
58 |In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
60 |Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b otherwise instructions will #UD.
LDDQU (128-bit Legacy SSE version)
DEST[127:0] ← SRC[127:0]
DEST[VLMAX-1:128] (Unmodified)
VLDDQU (VEX.128 encoded version)
DEST[127:0] ← SRC[127:0]
DEST[VLMAX-1:128] ← 0
VLDDQU (VEX.256 encoded version)
DEST[255:0] ← SRC[255:0]
LDDQU: __m128i _mm_lddqu_si128 (__m128i * p);
VLDDQU: __m256i _mm256_lddqu_si256 (__m256i * p);
74 |None
76 |See Exceptions Type 4;
78 |Note treatment of #AC varies.
-------------------------------------------------------------------------------- /html/LDMXCSR.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode/Instruction | 13 |Op/En | 14 |64/32-bit Mode | 15 |CPUID Feature Flag | 16 |Description |
---|---|---|---|---|
0F AE /2 LDMXCSR m32 | M | V/V | SSE | Load MXCSR register from m32. |
VEX.LZ.0F.WIG AE /2 VLDMXCSR m32 | M | V/V | AVX | Load MXCSR register from m32. |
Op/En | 37 |Operand 1 | 38 |Operand 2 | 39 |Operand 3 | 40 |Operand 4 |
M | 43 |ModRM:r/m (r) | 44 |NA | 45 |NA | 46 |NA |
Loads the source operand into the MXCSR control/status register. The source operand is a 32-bit memory location. See “MXCSR Control and Status Register” in Chapter 10 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for a description of the MXCSR register and its contents.
49 |The LDMXCSR instruction is typically used in conjunction with the (V)STMXCSR instruction, which stores the contents of the MXCSR register in memory.
50 |The default MXCSR value at reset is 1F80H.
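To see what the reset value 1F80H means, the sketch below decodes an MXCSR image in portable C. The field positions are an assumption taken from the commonly documented MXCSR layout (exception flags in bits 0-5, exception masks in bits 7-12, rounding control in bits 13-14), which this page does not spell out; the macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumed MXCSR field masks (not defined on this page). */
#define MXCSR_FLAGS_MASK 0x003Fu /* bits 0-5: exception flags   */
#define MXCSR_MASKS_MASK 0x1F80u /* bits 7-12: exception masks  */
#define MXCSR_RC_MASK    0x6000u /* bits 13-14: rounding control */

/* Returns 1 if all six SIMD floating-point exceptions are masked. */
int mxcsr_all_exceptions_masked(uint32_t mxcsr)
{
    return (mxcsr & MXCSR_MASKS_MASK) == MXCSR_MASKS_MASK;
}

/* Returns the 2-bit rounding-control field (0 = round to nearest). */
unsigned mxcsr_rounding_control(uint32_t mxcsr)
{
    return (mxcsr & MXCSR_RC_MASK) >> 13;
}

/* Returns the six exception flag bits. */
unsigned mxcsr_pending_flags(uint32_t mxcsr)
{
    return mxcsr & MXCSR_FLAGS_MASK;
}
```

Under these assumptions, 1F80H decodes as all exceptions masked, no flags set, and rounding control 0.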
51 |If a (V)LDMXCSR instruction clears a SIMD floating-point exception mask bit and sets the corresponding exception flag bit, a SIMD floating-point exception will not be immediately generated. The exception will be generated only upon the execution of the next instruction that meets both conditions below:
52 |This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
An attempt to execute VLDMXCSR encoded with VEX.L = 1 will cause a #UD exception.
54 |Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
55 |MXCSR ← m32;57 |
_mm_setcsr(unsigned int i)
59 |None
61 |See Exceptions Type 5; additionally
63 |#GP | 66 |For an attempt to set reserved bits in MXCSR. |
#UD | 69 |If VEX.vvvv ≠ 1111B. |
Opcode | 13 |Instruction | 14 |Op/En | 15 |64-Bit Mode | 16 |Compat/Leg Mode | 17 |Description |
---|---|---|---|---|---|
0F AE E8 | 20 |LFENCE | 21 |NP | 22 |Valid | 23 |Valid | 24 |Serializes load operations. |
Op/En | 29 |Operand 1 | 30 |Operand 2 | 31 |Operand 3 | 32 |Operand 4 |
NP | 35 |NA | 36 |NA | 37 |NA | 38 |NA |
Performs a serializing operation on all load-from-memory instructions that were issued prior to the LFENCE instruction. Specifically, LFENCE does not execute until all prior instructions have completed locally, and no later instruction begins execution until LFENCE completes. In particular, an instruction that loads from memory and that precedes an LFENCE receives data from memory prior to completion of the LFENCE. (An LFENCE that follows an instruction that stores to memory might complete before the data being stored have become globally visible.) Instructions following an LFENCE may be fetched from memory before the LFENCE, but they will not execute until the LFENCE completes.
41 |Weakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue and speculative reads. The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. The LFENCE instruction provides a performance-efficient way of ensuring load ordering between routines that produce weakly-ordered results and routines that consume that data.
Processors are free to fetch and cache data speculatively from regions of system memory that use the WB, WC, and WT memory types. This speculative fetching can occur at any time and is not tied to instruction execution. Thus, it is not ordered with respect to executions of the LFENCE instruction; data can be brought into the caches speculatively just before, during, or after the execution of an LFENCE instruction.
43 |This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
44 |Specification of the instruction's opcode above indicates a ModR/M byte of E8. For this instruction, the processor ignores the r/m field of the ModR/M byte. Thus, LFENCE is encoded by any opcode of the form 0F AE Ex, where x is in the range 8-F.
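The encoding rule above (0F AE Ex with x in the range 8-F) amounts to ignoring the low three bits of the third byte. A small C sketch of that check, with a hypothetical function name and no handling of prefixes:

```c
#include <stdint.h>

/* Hypothetical check: do three opcode bytes match the LFENCE pattern?
   Masking with 0xF8 ignores the r/m field (low 3 bits) of the ModR/M byte,
   so E8 through EF are all accepted. */
int is_lfence_encoding(uint8_t b0, uint8_t b1, uint8_t b2)
{
    return b0 == 0x0F && b1 == 0xAE && (b2 & 0xF8u) == 0xE8u;
}
```

Bytes such as 0F AE E7 or 0F AE F0 fall outside the E8-EF range and are rejected by this check.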
45 |Wait_On_Following_Instructions_Until(preceding_instructions_complete);47 |
void _mm_lfence(void)
#UD | If CPUID.01H:EDX.SSE2[bit 26] = 0. If the LOCK prefix is used. |
-------------------------------------------------------------------------------- /html/MOVDQ2Q.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode | 13 |Instruction | 14 |Op/En | 15 |64-Bit Mode | 16 |Compat/Leg Mode | 17 |Description |
---|---|---|---|---|---|
F2 0F D6 /r | 20 |MOVDQ2Q mm, xmm | 21 |RM | 22 |Valid | 23 |Valid | 24 |Move low quadword from xmm to mmx register. |
Op/En | 29 |Operand 1 | 30 |Operand 2 | 31 |Operand 3 | 32 |Operand 4 |
RM | 35 |ModRM:reg (w) | 36 |ModRM:r/m (r) | 37 |NA | 38 |NA |
Moves the low quadword from the source operand (second operand) to the destination operand (first operand). The source operand is an XMM register and the destination operand is an MMX technology register.
41 |This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the MOVDQ2Q instruction is executed.
42 |In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
43 |DEST ← SRC[63:0];45 |
MOVDQ2Q:
47 |__m64 _mm_movepi64_pi64 ( __m128i a)
48 |None.
50 |#NM | 54 |If CR0.TS[bit 3] = 1. |
#UD | If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0. If CPUID.01H:EDX.SSE2[bit 26] = 0. If the LOCK prefix is used. |
#MF | 64 |If there is a pending x87 FPU exception. |
Same exceptions as in protected mode.
67 |Same exceptions as in protected mode.
69 |Same exceptions as in protected mode.
71 |Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/MOVNTQ.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | 7 |Opcode | 13 |Instruction | 14 |Op/En | 15 |64-Bit Mode | 16 |Compat/Leg Mode | 17 |Description |
---|---|---|---|---|---|
0F E7 /r | 20 |MOVNTQ m64, mm | 21 |MR | 22 |Valid | 23 |Valid | 24 |Move quadword from mm to m64 using non-temporal hint. |
Op/En | 29 |Operand 1 | 30 |Operand 2 | 31 |Operand 3 | 32 |Operand 4 |
MR | 35 |ModRM:r/m (w) | 36 |ModRM:reg (r) | 37 |NA | 38 |NA |
Moves the quadword in the source operand (second operand) to the destination operand (first operand) using a non-temporal hint to minimize cache pollution during the write to memory. The source operand is an MMX technology register, which is assumed to contain packed integer data (packed bytes, words, or doublewords). The destination operand is a 64-bit memory location.
41 |The non-temporal hint is implemented by using a write combining (WC) memory type protocol when writing the data to memory. Using this protocol, the processor does not write the data into the cache hierarchy, nor does it fetch the corresponding cache line from memory into the cache hierarchy. The memory type of the region being written to can override the non-temporal hint, if the memory address specified for the non-temporal store is in an uncacheable (UC) or write protected (WP) memory region. For more information on non-temporal stores, see “Caching of Temporal vs. Non-Temporal Data” in Chapter 10 in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1.
42 |Because the WC protocol uses a weakly-ordered memory consistency model, a fencing operation implemented with the SFENCE or MFENCE instruction should be used in conjunction with MOVNTQ instructions if multiple processors might use different memory types to read/write the destination memory locations.
43 |This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
44 |DEST ← SRC;46 |
MOVNTQ:
48 |void _mm_stream_pi(__m64 * p, __m64 a)
49 |None.
51 |See Table 22-8, “Exception Conditions for Legacy SIMD/MMX Instructions without FP Exception,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.
-------------------------------------------------------------------------------- /html/MOVQ2DQ.html: --------------------------------------------------------------------------------
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
F3 0F D6 /r | MOVQ2DQ xmm, mm | RM | Valid | Valid | Move quadword from mmx to low quadword of xmm. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA |
Moves the quadword from the source operand (second operand) to the low quadword of the destination operand (first operand). The source operand is an MMX technology register and the destination operand is an XMM register.
This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the MOVQ2DQ instruction is executed.
In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15).
DEST[63:0] ← SRC[63:0];
DEST[127:64] ← 00000000000000000H;
MOVQ2DQ: __m128i _mm_movpi64_epi64(__m64 a)
None.
#NM | If CR0.TS[bit 3] = 1. |
#UD | If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0. If CPUID.01H:EDX.SSE2[bit 26] = 0. If the LOCK prefix is used. |
#MF | If there is a pending x87 FPU exception. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/MULX.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
VEX.NDD.LZ.F2.0F38.W0 F6 /r MULX r32a, r32b, r/m32 | RVM | V/V | BMI2 | Unsigned multiply of r/m32 with EDX without affecting arithmetic flags. |
VEX.NDD.LZ.F2.0F38.W1 F6 /r MULX r64a, r64b, r/m64 | RVM | V/N.E. | BMI2 | Unsigned multiply of r/m64 with RDX without affecting arithmetic flags. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RVM | ModRM:reg (w) | VEX.vvvv (w) | ModRM:r/m (r) | RDX/EDX is implied 64/32 bits source |
Performs an unsigned multiplication of the implicit source operand (EDX/RDX) and the specified source operand (the third operand). The low half of the result is stored in the second destination operand (second operand) and the high half in the first destination operand (first operand), without reading or writing the arithmetic flags. This lets software interleave add-with-carry operations and multiplications efficiently.
If the first and second operands are identical, the register will contain the high half of the multiplication result.
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.
// DEST1: ModRM:reg
// DEST2: VEX.vvvv
IF (OperandSize = 32)
    SRC1 ← EDX;
    DEST2 ← (SRC1*SRC2)[31:0];
    DEST1 ← (SRC1*SRC2)[63:32];
ELSE IF (OperandSize = 64)
    SRC1 ← RDX;
    DEST2 ← (SRC1*SRC2)[63:0];
    DEST1 ← (SRC1*SRC2)[127:64];
FI
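The 32-bit case of the MULX operation can be sketched in portable C. This is an illustrative model rather than the instruction itself: the function name `mulx32_model` and the explicit `edx` parameter are assumptions made for the example, with `edx` standing in for the implicit EDX source.

```c
#include <stdint.h>

/* Portable model of 32-bit MULX (hypothetical helper, not an intrinsic).
 * Returns the low half (DEST2) and stores the high half (DEST1) through *hi.
 * No flags are modeled because MULX neither reads nor writes them. */
uint32_t mulx32_model(uint32_t edx, uint32_t src2, uint32_t *hi)
{
    uint64_t product = (uint64_t)edx * (uint64_t)src2; /* full 64-bit product */
    *hi = (uint32_t)(product >> 32);                   /* bits [63:32] */
    return (uint32_t)product;                          /* bits [31:0]  */
}
```

For instance, multiplying 0xFFFFFFFF by itself gives a low half of 0x00000001 and a high half of 0xFFFFFFFE.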
None
Auto-generated from high-level language when possible.
unsigned int _mulx_u32(unsigned int a, unsigned int b, unsigned int * hi);
unsigned __int64 _mulx_u64(unsigned __int64 a, unsigned __int64 b, unsigned __int64 * hi);
None
See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally
#UD | If VEX.W = 1. |
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
F3 90 | PAUSE | NP | Valid | Valid | Gives hint to processor that improves performance of spin-wait loops. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Improves the performance of spin-wait loops. When executing a “spin-wait loop,” a processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops.
An additional function of the PAUSE instruction is to reduce the power consumed by a processor while executing a spin loop. A processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a PAUSE instruction in a spin-wait loop greatly reduces the processor’s power consumption.
This instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying no-op operation).
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
Execute_Next_Instruction(DELAY);
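As a rough illustration of where PAUSE sits in a spin-wait loop, the following sketch builds a minimal spinlock on C11 `atomic_flag`. The type `demo_spinlock`, the functions `spin_lock` and `spin_unlock`, and the `cpu_relax` macro are hypothetical names chosen for this example; on x86 builds the macro expands to the `_mm_pause()` intrinsic, and elsewhere it is assumed to do nothing.

```c
#include <stdatomic.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* emits the PAUSE instruction */
#else
#define cpu_relax() ((void)0)     /* assumed no-op fallback on non-x86 targets */
#endif

/* Hypothetical minimal spinlock, for illustration only. */
typedef struct { atomic_flag locked; } demo_spinlock;

void spin_lock(demo_spinlock *l)
{
    /* test-and-set returns the previous value; keep spinning while it was set */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire)) {
        cpu_relax();              /* hint: this is a spin-wait loop */
    }
}

void spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The hint goes inside the loop body, so it is executed only while the lock is contended and costs nothing on the uncontended path.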
None.
#UD | If the LOCK prefix is used. |
Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|
0F 70 /r ib PSHUFW mm1, mm2/m64, imm8 | RMI | Valid | Valid | Shuffle the words in mm2/m64 based on the encoding in imm8 and store the result in mm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RMI | ModRM:reg (w) | ModRM:r/m (r) | imm8 | NA |
Copies words from the source operand (second operand) and inserts them in the destination operand (first operand) at word locations selected with the order operand (third operand). This operation is similar to the operation used by the PSHUFD instruction, which is illustrated in Figure 4-16. For the PSHUFW instruction, each 2-bit field in the order operand selects the contents of one word location in the destination operand. The encodings of the order operand fields select words from the source operand to be copied to the destination operand.
The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an MMX technology register. The order operand is an 8-bit immediate. Note that this instruction permits a word in the source operand to be copied to more than one word location in the destination operand.
In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers (XMM8-XMM15).
DEST[15:0] ← (SRC >> (ORDER[1:0] * 16))[15:0];
DEST[31:16] ← (SRC >> (ORDER[3:2] * 16))[15:0];
DEST[47:32] ← (SRC >> (ORDER[5:4] * 16))[15:0];
DEST[63:48] ← (SRC >> (ORDER[7:6] * 16))[15:0];
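The word selection in the PSHUFW operation can be modeled in portable C on a plain 64-bit integer. The function name `pshufw_model` is an assumption for this sketch; it mirrors the pseudocode rather than calling the `_mm_shuffle_pi16` intrinsic, so it runs without MMX support.

```c
#include <stdint.h>

/* Portable model of PSHUFW on a 64-bit value holding four 16-bit words.
 * `order` plays the role of imm8: two selector bits per destination word. */
uint64_t pshufw_model(uint64_t src, uint8_t order)
{
    uint64_t dest = 0;
    for (int i = 0; i < 4; i++) {
        unsigned sel = (order >> (2 * i)) & 0x3u;       /* ORDER[2i+1:2i] */
        uint64_t word = (src >> (sel * 16)) & 0xFFFFu;  /* selected source word */
        dest |= word << (i * 16);                       /* write destination word i */
    }
    return dest;
}
```

With `order` = 0x1B the four words are reversed, with 0xE4 the value is returned unchanged, and with 0x00 word 0 is copied to all four positions, which shows how one source word can feed several destination words.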
PSHUFW: __m64 _mm_shuffle_pi16(__m64 a, int n)
None.
None.
See Table 22-7, “Exception Conditions for SIMD/MMX Instructions with Memory Reference,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3A.
-------------------------------------------------------------------------------- /html/RDPID.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
F3 0F C7 /7 RDPID r32 | M | N.E./V | RDPID | Read IA32_TSC_AUX into r32. |
F3 0F C7 /7 RDPID r64 | M | V/N.E. | RDPID | Read IA32_TSC_AUX into r64. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
M | ModRM:r/m (w) | NA | NA | NA |
Reads the value of the IA32_TSC_AUX MSR (address C0000103H) into the destination register. The value of CS.D and operand-size prefixes (66H and REX.W) do not affect the behavior of the RDPID instruction.
DEST ← IA32_TSC_AUX
None.
#UD | If the LOCK prefix is used. If the F2 prefix is used. If CPUID.7H.0:ECX.RDPID[bit 22] = 0. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/RDPKRU.html: --------------------------------------------------------------------------------
Opcode* | Instruction | Op/En | 64/32-bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|---|
0F 01 EE | RDPKRU | NP | V/V | OSPKE | Reads PKRU into EAX. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Reads the value of PKRU into EAX and clears EDX. ECX must be 0 when RDPKRU is executed; otherwise, a general-protection exception (#GP) occurs.
RDPKRU can be executed only if CR4.PKE = 1; otherwise, an invalid-opcode exception (#UD) occurs. Software can discover the value of CR4.PKE by examining CPUID.(EAX=07H,ECX=0H):ECX.OSPKE [bit 4].
On processors that support the Intel 64 Architecture, the high-order 32 bits of RCX are ignored and the high-order 32 bits of RDX and RAX are cleared.
IF (ECX = 0)
    THEN
        EAX ← PKRU;
        EDX ← 0;
    ELSE #GP(0);
FI;
None.
RDPKRU: uint32_t _rdpkru_u32(void);
#GP(0) | If ECX ≠ 0 |
#UD | If the LOCK prefix is used. If CR4.PKE = 0. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/RORX.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
VEX.LZ.F2.0F3A.W0 F0 /r ib RORX r32, r/m32, imm8 | RMI | V/V | BMI2 | Rotate 32-bit r/m32 right imm8 times without affecting arithmetic flags. |
VEX.LZ.F2.0F3A.W1 F0 /r ib RORX r64, r/m64, imm8 | RMI | V/N.E. | BMI2 | Rotate 64-bit r/m64 right imm8 times without affecting arithmetic flags. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RMI | ModRM:reg (w) | ModRM:r/m (r) | Imm8 | NA |
Rotates the bits of the second operand right by the count value specified in imm8 and writes the result to the first operand. The RORX instruction does not read or write the arithmetic flags.
This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.
IF (OperandSize = 32)
    y ← imm8 AND 1FH;
    DEST ← (SRC >> y) | (SRC << (32-y));
ELSEIF (OperandSize = 64)
    y ← imm8 AND 3FH;
    DEST ← (SRC >> y) | (SRC << (64-y));
ENDIF
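The rotate in the RORX operation translates almost directly into C, with one caveat: when the masked count is 0, the pseudocode's left shift by the full operand width would be undefined behavior in C. The sketch below, using the assumed names `rorx32_model` and `rorx64_model`, masks the left-shift amount as well to stay well defined.

```c
#include <stdint.h>

/* Portable model of 32-bit RORX (hypothetical helper, not an intrinsic).
 * The count is masked to 5 bits as in the pseudocode. The left-shift amount
 * is masked too, so y == 0 shifts by 0 instead of by 32. */
uint32_t rorx32_model(uint32_t src, uint8_t imm8)
{
    unsigned y = imm8 & 0x1Fu;
    return (src >> y) | (src << ((32u - y) & 31u));
}

/* 64-bit variant: the count is masked to 6 bits. */
uint64_t rorx64_model(uint64_t src, uint8_t imm8)
{
    unsigned y = imm8 & 0x3Fu;
    return (src >> y) | (src << ((64u - y) & 63u));
}
```

Rotating 0x12345678 right by 8 yields 0x78123456, and a count of 32 is masked to 0 in the 32-bit form, leaving the value unchanged.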
None
Auto-generated from high-level language.
None
See Section 2.5.1, “Exception Conditions for VEX-Encoded GPR Instructions”, Table 2-29; additionally
#UD | If VEX.W = 1. |
Opcode* | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
0F AA | RSM | NP | Valid | Valid | Resume operation of interrupted program. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Returns program control from system management mode (SMM) to the application program or operating-system procedure that was interrupted when the processor received an SMM interrupt. The processor’s state is restored from the dump created upon entering SMM. If the processor detects invalid state information during state restoration, it enters the shutdown state. The following invalid information can cause a shutdown:
The contents of the model-specific registers are not affected by a return from SMM.
The SMM state map used by RSM supports resuming processor context for non-64-bit modes and 64-bit mode.
See Chapter 34, “System Management Mode,” in the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 3C, for more information about SMM and the behavior of the RSM instruction.
ReturnFromSMM;
IF (IA-32e mode supported) or (CPUID DisplayFamily_DisplayModel = 06H_0CH)
    THEN
        ProcessorState ← Restore(SMMDump(IA-32e SMM STATE MAP));
    ELSE
        ProcessorState ← Restore(SMMDump(Non-32-Bit-Mode SMM STATE MAP));
FI
All.
#UD | If an attempt is made to execute this instruction when the processor is not in SMM. If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/SAHF.html: --------------------------------------------------------------------------------
Opcode* | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
9E | SAHF | NP | Invalid* | Valid | Loads SF, ZF, AF, PF, and CF from AH into EFLAGS register. |
NOTES: * Valid in specific steppings. See Description section.

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Loads the SF, ZF, AF, PF, and CF flags of the EFLAGS register with values from the corresponding bits in the AH register (bits 7, 6, 4, 2, and 0, respectively). Bits 1, 3, and 5 of register AH are ignored; the corresponding reserved bits (1, 3, and 5) in the EFLAGS register remain as shown in the “Operation” section below.
This instruction executes as described above in compatibility mode and legacy mode. It is valid in 64-bit mode only if CPUID.80000001H:ECX.LAHF-SAHF[bit 0] = 1.
IF 64-Bit Mode
    THEN
        IF CPUID.80000001H.ECX[0] = 1;
            THEN
                RFLAGS(SF:ZF:0:AF:0:PF:1:CF) ← AH;
            ELSE
                #UD;
        FI
    ELSE
        EFLAGS(SF:ZF:0:AF:0:PF:1:CF) ← AH;
FI;
The SF, ZF, AF, PF, and CF flags are loaded with values from the AH register. Bits 1, 3, and 5 of the EFLAGS register are unaffected, with the values remaining 1, 0, and 0, respectively.
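The bit mapping from AH into the low byte of EFLAGS can be sketched as a mask operation. The names `SAHF_LOADED_MASK` and `sahf_model` are assumptions for this example; the model treats EFLAGS as a plain 32-bit integer and leaves every bit above bit 7 untouched.

```c
#include <stdint.h>

/* Bits 7, 6, 4, 2, and 0 of AH map to SF, ZF, AF, PF, and CF. */
#define SAHF_LOADED_MASK 0xD5u

/* Portable model of SAHF (hypothetical helper). Only the low 8 bits of the
 * flags value change: the five flags come from AH, bit 1 stays 1, and
 * bits 3 and 5 stay 0, matching the SF:ZF:0:AF:0:PF:1:CF layout. */
uint32_t sahf_model(uint32_t eflags, uint8_t ah)
{
    uint32_t low = (ah & SAHF_LOADED_MASK) | 0x02u;
    return (eflags & ~0xFFu) | low;
}
```

Loading AH = FFH therefore produces a low byte of D7H rather than FFH, because the reserved positions ignore the corresponding AH bits.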
None.
None.
None.
None.
#UD | If CPUID.80000001H.ECX[0] = 0. If the LOCK prefix is used. |
Opcode* | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
0F AE F8 | SFENCE | NP | Valid | Valid | Serializes store operations. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Performs a serializing operation on all store-to-memory instructions that were issued prior to the SFENCE instruction. This serializing operation guarantees that every store instruction that precedes the SFENCE instruction in program order becomes globally visible before any store instruction that follows the SFENCE instruction. The SFENCE instruction is ordered with respect to store instructions, other SFENCE instructions, any LFENCE and MFENCE instructions, and any serializing instructions (such as the CPUID instruction). It is not ordered with respect to load instructions.
Weakly ordered memory types can be used to achieve higher processor performance through such techniques as out-of-order issue, write-combining, and write-collapsing. The degree to which a consumer of data recognizes or knows that the data is weakly ordered varies among applications and may be unknown to the producer of this data. The SFENCE instruction provides a performance-efficient way of ensuring store ordering between routines that produce weakly-ordered results and routines that consume this data.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
Specification of the instruction's opcode above indicates a ModR/M byte of F8. For this instruction, the processor ignores the r/m field of the ModR/M byte. Thus, SFENCE is encoded by any opcode of the form 0F AE Fx, where x is in the range 8-F.
Wait_On_Following_Stores_Until(preceding_stores_globally_visible);
void _mm_sfence(void)
#UD | If CPUID.01H:EDX.SSE[bit 25] = 0. If the LOCK prefix is used. |
-------------------------------------------------------------------------------- /html/SHA1MSG1.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
0F 38 C9 /r SHA1MSG1 xmm1, xmm2/m128 | RM | V/V | SHA | Performs an intermediate calculation for the next four SHA1 message dwords using previous message dwords from xmm1 and xmm2/m128, storing the result in xmm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 |
---|---|---|---|
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA |
Description
The SHA1MSG1 instruction is one of two SHA1 message scheduling instructions. The instruction performs an intermediate calculation for the next four SHA1 message dwords.
Operation
SHA1MSG1
W0 ← SRC1[127:96];
W1 ← SRC1[95:64];
W2 ← SRC1[63:32];
W3 ← SRC1[31:0];
W4 ← SRC2[127:96];
W5 ← SRC2[95:64];
DEST[127:96] ← W2 XOR W0;
DEST[95:64] ← W3 XOR W1;
DEST[63:32] ← W4 XOR W2;
DEST[31:0] ← W5 XOR W3;
Intel C/C++ Compiler Intrinsic Equivalent
SHA1MSG1: __m128i _mm_sha1msg1_epu32(__m128i, __m128i);
Flags Affected
None
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4.
-------------------------------------------------------------------------------- /html/SHA1MSG2.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
0F 38 CA /r SHA1MSG2 xmm1, xmm2/m128 | RM | V/V | SHA | Performs the final calculation for the next four SHA1 message dwords using intermediate results from xmm1 and the previous message dwords from xmm2/m128, storing the result in xmm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 |
---|---|---|---|
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA |
Description
The SHA1MSG2 instruction is one of two SHA1 message scheduling instructions. The instruction performs the final calculation to derive the next four SHA1 message dwords.
Operation
SHA1MSG2
W13 ← SRC2[95:64];
W14 ← SRC2[63:32];
W15 ← SRC2[31:0];
W16 ← (SRC1[127:96] XOR W13) ROL 1;
W17 ← (SRC1[95:64] XOR W14) ROL 1;
W18 ← (SRC1[63:32] XOR W15) ROL 1;
W19 ← (SRC1[31:0] XOR W16) ROL 1;
DEST[127:96] ← W16;
DEST[95:64] ← W17;
DEST[63:32] ← W18;
DEST[31:0] ← W19;
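The SHA1MSG2 operation can be modeled in portable C to make the data dependency visible: W19 is computed from W16, which is itself produced by the same instruction. The function name `sha1msg2_model`, the helper `rol32`, and the array layout (lane i holds bits [32i+31:32i]) are assumptions made for this sketch.

```c
#include <stdint.h>

/* Rotate left; only used here with n = 1, so n is never 0 or 32. */
static uint32_t rol32(uint32_t x, unsigned n)
{
    return (x << n) | (x >> (32u - n));
}

/* Portable model of SHA1MSG2 (hypothetical helper, not the intrinsic).
 * Lane 3 of each array is bits [127:96]; lane 0 is bits [31:0]. */
void sha1msg2_model(uint32_t dest[4], const uint32_t src1[4], const uint32_t src2[4])
{
    uint32_t w13 = src2[2];
    uint32_t w14 = src2[1];
    uint32_t w15 = src2[0];
    uint32_t w16 = rol32(src1[3] ^ w13, 1);
    uint32_t w17 = rol32(src1[2] ^ w14, 1);
    uint32_t w18 = rol32(src1[1] ^ w15, 1);
    uint32_t w19 = rol32(src1[0] ^ w16, 1); /* uses w16 computed just above */
    dest[3] = w16;
    dest[2] = w17;
    dest[1] = w18;
    dest[0] = w19;
}
```

All inputs are read before any output lane is written, so passing the same array as `dest` and `src1` behaves like the instruction's in-place update of xmm1.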
Intel C/C++ Compiler Intrinsic Equivalent
SHA1MSG2: __m128i _mm_sha1msg2_epu32(__m128i, __m128i);
Flags Affected
None
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4.
-------------------------------------------------------------------------------- /html/SHA1NEXTE.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
0F 38 C8 /r SHA1NEXTE xmm1, xmm2/m128 | RM | V/V | SHA | Calculates SHA1 state variable E after four rounds of operation from the current SHA1 state variable A in xmm1. The calculated value of the SHA1 state variable E is added to the scheduled dwords in xmm2/m128, and stored with some of the scheduled dwords in xmm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 |
---|---|---|---|
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA |
Description
The SHA1NEXTE calculates the SHA1 state variable E after four rounds of operation from the current SHA1 state variable A in the destination operand. The calculated value of the SHA1 state variable E is added to the source operand, which contains the scheduled dwords.
Operation
SHA1NEXTE
TMP ← (SRC1[127:96] ROL 30);
DEST[127:96] ← SRC2[127:96] + TMP;
DEST[95:64] ← SRC2[95:64];
DEST[63:32] ← SRC2[63:32];
DEST[31:0] ← SRC2[31:0];
Intel C/C++ Compiler Intrinsic Equivalent
SHA1NEXTE: __m128i _mm_sha1nexte_epu32(__m128i, __m128i);
Flags Affected
None
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4.
-------------------------------------------------------------------------------- /html/SHA256MSG1.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
0F 38 CC /r SHA256MSG1 xmm1, xmm2/m128 | RM | V/V | SHA | Performs an intermediate calculation for the next four SHA256 message dwords using previous message dwords from xmm1 and xmm2/m128, storing the result in xmm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 |
---|---|---|---|
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA |
Description
The SHA256MSG1 instruction is one of two SHA256 message scheduling instructions. The instruction performs an intermediate calculation for the next four SHA256 message dwords.
Operation
SHA256MSG1
W4 ← SRC2[31:0];
W3 ← SRC1[127:96];
W2 ← SRC1[95:64];
W1 ← SRC1[63:32];
W0 ← SRC1[31:0];
DEST[127:96] ← W3 + σ0(W4);
DEST[95:64] ← W2 + σ0(W3);
DEST[63:32] ← W1 + σ0(W2);
DEST[31:0] ← W0 + σ0(W1);
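The SHA256MSG1 operation relies on σ0, which the listing does not spell out. The sketch below assumes the standard SHA-256 definition from FIPS 180-4, σ0(x) = ROTR7(x) XOR ROTR18(x) XOR SHR3(x). The names `sha256msg1_model`, `rotr32`, and `sigma0`, and the array layout (lane i holds bits [32i+31:32i]), are choices made for this example.

```c
#include <stdint.h>

/* Rotate right; n is always in 1..31 here. */
static uint32_t rotr32(uint32_t x, unsigned n)
{
    return (x >> n) | (x << (32u - n));
}

/* SHA-256 small sigma 0, assumed to follow FIPS 180-4. */
static uint32_t sigma0(uint32_t x)
{
    return rotr32(x, 7) ^ rotr32(x, 18) ^ (x >> 3);
}

/* Portable model of SHA256MSG1 (hypothetical helper, not the intrinsic). */
void sha256msg1_model(uint32_t dest[4], const uint32_t src1[4], const uint32_t src2[4])
{
    uint32_t w4 = src2[0];
    uint32_t w3 = src1[3], w2 = src1[2], w1 = src1[1], w0 = src1[0];
    dest[3] = w3 + sigma0(w4);   /* additions wrap modulo 2^32 */
    dest[2] = w2 + sigma0(w3);
    dest[1] = w1 + sigma0(w2);
    dest[0] = w0 + sigma0(w1);
}
```

Only the lowest dword of the second source is consumed, which matches the single SRC2 read in the pseudocode.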
Intel C/C++ Compiler Intrinsic Equivalent
SHA256MSG1: __m128i _mm_sha256msg1_epu32(__m128i, __m128i);
Flags Affected
None
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4.
-------------------------------------------------------------------------------- /html/SHA256MSG2.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
0F 38 CD /r SHA256MSG2 xmm1, xmm2/m128 | RM | V/V | SHA | Performs the final calculation for the next four SHA256 message dwords using previous message dwords from xmm1 and xmm2/m128, storing the result in xmm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 |
---|---|---|---|
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA |
Description
The SHA256MSG2 instruction is one of two SHA2 message scheduling instructions. The instruction performs the final calculation for the next four SHA256 message dwords.
Operation
SHA256MSG2
W14 ← SRC2[95:64];
W15 ← SRC2[127:96];
W16 ← SRC1[31:0] + σ1(W14);
W17 ← SRC1[63:32] + σ1(W15);
W18 ← SRC1[95:64] + σ1(W16);
W19 ← SRC1[127:96] + σ1(W17);
DEST[127:96] ← W19;
DEST[95:64] ← W18;
DEST[63:32] ← W17;
DEST[31:0] ← W16;
Intel C/C++ Compiler Intrinsic Equivalent
SHA256MSG2: __m128i _mm_sha256msg2_epu32(__m128i, __m128i);
Flags Affected
None
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4.
-------------------------------------------------------------------------------- /html/SHA256RNDS2.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
0F 38 CB /r SHA256RNDS2 xmm1, xmm2/m128, <XMM0> | RM0 | V/V | SHA | Perform 2 rounds of SHA256 operation using an initial SHA256 state (C,D,G,H) from xmm1, an initial SHA256 state (A,B,E,F) from xmm2/m128, and a pre-computed sum of the next 2 round message dwords and the corresponding round constants from the implicit operand XMM0, storing the updated SHA256 state (A,B,E,F) result in xmm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 |
---|---|---|---|
RMI | ModRM:reg (r, w) | ModRM:r/m (r) | Implicit XMM0 (r) |
Description
The SHA256RNDS2 instruction performs 2 rounds of SHA256 operation using an initial SHA256 state (C,D,G,H) from the first operand, an initial SHA256 state (A,B,E,F) from the second operand, and a pre-computed sum of the next 2 round message dwords and the corresponding round constants from the implicit operand xmm0. Note that only the two lower dwords of XMM0 are used by the instruction.
The updated SHA256 state (A,B,E,F) is written to the first operand, and the second operand can be used as the updated state (C,D,G,H) in later rounds.
Operation
SHA256RNDS2
A_0 ← SRC2[127:96];
B_0 ← SRC2[95:64];
C_0 ← SRC1[127:96];
D_0 ← SRC1[95:64];
E_0 ← SRC2[63:32];
F_0 ← SRC2[31:0];
G_0 ← SRC1[63:32];
H_0 ← SRC1[31:0];
WK_0 ← XMM0[31:0];
WK_1 ← XMM0[63:32];
FOR i = 0 to 1
    A_(i+1) ← Ch(E_i, F_i, G_i) + Σ1(E_i) + WK_i + H_i + Maj(A_i, B_i, C_i) + Σ0(A_i);
    B_(i+1) ← A_i;
    C_(i+1) ← B_i;
    D_(i+1) ← C_i;
    E_(i+1) ← Ch(E_i, F_i, G_i) + Σ1(E_i) + WK_i + H_i + D_i;
    F_(i+1) ← E_i;
    G_(i+1) ← F_i;
    H_(i+1) ← G_i;
ENDFOR
DEST[127:96] ← A_2;
DEST[95:64] ← B_2;
DEST[63:32] ← E_2;
DEST[31:0] ← F_2;
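The two-round loop of the SHA256RNDS2 operation can be modeled in portable C as follows. The listing does not define Ch, Maj, Σ0, or Σ1, so this sketch assumes the standard SHA-256 definitions from FIPS 180-4. The function name `sha256rnds2_model`, the helper names, and the array layout (lane i holds bits [32i+31:32i]) are illustrative choices, and `wk` stands in for the two low dwords of XMM0.

```c
#include <stdint.h>

static uint32_t rotr32(uint32_t x, unsigned n) { return (x >> n) | (x << (32u - n)); }

/* Standard SHA-256 round functions, assumed to follow FIPS 180-4. */
static uint32_t ch(uint32_t e, uint32_t f, uint32_t g)  { return (e & f) ^ (~e & g); }
static uint32_t maj(uint32_t a, uint32_t b, uint32_t c) { return (a & b) ^ (a & c) ^ (b & c); }
static uint32_t big_sigma0(uint32_t a) { return rotr32(a, 2) ^ rotr32(a, 13) ^ rotr32(a, 22); }
static uint32_t big_sigma1(uint32_t e) { return rotr32(e, 6) ^ rotr32(e, 11) ^ rotr32(e, 25); }

/* Portable model of SHA256RNDS2 (hypothetical helper, not the intrinsic).
 * src1 holds (C,D,G,H), src2 holds (A,B,E,F), both from lane 3 down to lane 0. */
void sha256rnds2_model(uint32_t dest[4], const uint32_t src1[4],
                       const uint32_t src2[4], const uint32_t wk[2])
{
    uint32_t a = src2[3], b = src2[2], e = src2[1], f = src2[0];
    uint32_t c = src1[3], d = src1[2], g = src1[1], h = src1[0];
    for (int i = 0; i < 2; i++) {
        uint32_t t = ch(e, f, g) + big_sigma1(e) + wk[i] + h; /* shared term */
        uint32_t new_a = t + maj(a, b, c) + big_sigma0(a);
        uint32_t new_e = t + d;
        h = g; g = f; f = e; e = new_e;  /* shift the E..H half of the state */
        d = c; c = b; b = a; a = new_a;  /* shift the A..D half of the state */
    }
    dest[3] = a; dest[2] = b; dest[1] = e; dest[0] = f; /* (A,B,E,F) after 2 rounds */
}
```

The term shared by the A and E updates is computed once per round in `t`, which mirrors the repeated subexpression in the pseudocode.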
Intel C/C++ Compiler Intrinsic Equivalent
SHA256RNDS2: __m128i _mm_sha256rnds2_epu32(__m128i, __m128i, __m128i);
Flags Affected
None
SIMD Floating-Point Exceptions
None
Other Exceptions
See Exceptions Type 4.
-------------------------------------------------------------------------------- /html/STAC.html: --------------------------------------------------------------------------------
Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
0F 01 CB | STAC | NP | Valid | Valid | Set the AC flag in the EFLAGS register. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Sets the AC flag bit in the EFLAGS register. This may enable alignment checking of user-mode data accesses. It also allows explicit supervisor-mode data accesses to user-mode pages even if the SMAP bit is set in the CR4 register.
This instruction's operation is the same in non-64-bit modes and 64-bit mode. Attempts to execute STAC when CPL > 0 cause #UD.
EFLAGS.AC ← 1;
AC set. Other flags are unaffected.
#UD | If the LOCK prefix is used. If the CPL > 0. If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. |
#UD | If the LOCK prefix is used. If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. |
#UD | The STAC instruction is not recognized in virtual-8086 mode. |
#UD | If the LOCK prefix is used. If the CPL > 0. If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. |
#UD | If the LOCK prefix is used. If the CPL > 0. If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. |
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
F9 | STC | NP | Valid | Valid | Set CF flag. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Sets the CF flag in the EFLAGS register. Operation is the same in all modes.
CF ← 1;
The CF flag is set. The OF, ZF, SF, AF, and PF flags are unaffected.
#UD | If the LOCK prefix is used. |
-------------------------------------------------------------------------------- /html/STD.html: --------------------------------------------------------------------------------
Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
FD | STD | NP | Valid | Valid | Set DF flag. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Sets the DF flag in the EFLAGS register. When the DF flag is set to 1, string operations decrement the index registers (ESI and/or EDI). Operation is the same in all modes.
DF ← 1;
The DF flag is set. The CF, OF, ZF, SF, AF, and PF flags are unaffected.
#UD | If the LOCK prefix is used. |
-------------------------------------------------------------------------------- /html/STMXCSR.html: --------------------------------------------------------------------------------
Opcode*/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
0F AE /3 STMXCSR m32 | M | V/V | SSE | Store contents of MXCSR register to m32. |
VEX.LZ.0F.WIG AE /3 VSTMXCSR m32 | M | V/V | AVX | Store contents of MXCSR register to m32. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
M | ModRM:r/m (w) | NA | NA | NA |
Stores the contents of the MXCSR control and status register to the destination operand. The destination operand is a 32-bit memory location. The reserved bits in the MXCSR register are stored as 0s.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
VEX.L must be 0, otherwise instructions will #UD.
Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.
m32 ← MXCSR;
unsigned int _mm_getcsr(void)
None.
See Exceptions Type 5; additionally
#UD | If VEX.L = 1. If VEX.vvvv ≠ 1111B. |
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
0F 0B | UD2 | NP | Valid | Valid | Raise invalid opcode exception. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Generates an invalid opcode exception. This instruction is provided for software testing to explicitly generate an invalid opcode exception. The opcode for this instruction is reserved for this purpose.
Other than raising the invalid opcode exception, this instruction has no effect on processor state or memory.
Even though it is the execution of the UD2 instruction that causes the invalid opcode exception, the instruction pointer saved by delivery of the exception references the UD2 instruction (and not the following instruction).
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
#UD (* Generates invalid opcode exception *);
None.
#UD | Raises an invalid opcode exception in all operating modes. |
-------------------------------------------------------------------------------- /html/VPERM2F128.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
VEX.NDS.256.66.0F3A.W0 06 /r ib VPERM2F128 ymm1, ymm2, ymm3/m256, imm8 | RVMI | V/V | AVX | Permute 128-bit floating-point fields in ymm2 and ymm3/mem using controls from imm8 and store result in ymm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RVMI | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | imm8 |
Permute 128 bit floating-point-containing fields from the first source operand (second operand) and second source operand (third operand) using bits in the 8-bit immediate and store results in the destination operand (first operand). The first source operand is a YMM register, the second source operand is a YMM register or a 256-bit memory location, and the destination operand is a YMM register.
[Figure: VPERM2F128 operation. SRC2 is shown as the 128-bit fields Y1, Y0 and SRC1 as the fields X1, X0; each 128-bit half of DEST receives X0, X1, Y0, or Y1.]
Imm8[1:0] select the source for the first destination 128-bit field, imm8[5:4] select the source for the second destination field. If imm8[3] is set, the low 128-bit field is zeroed. If imm8[7] is set, the high 128-bit field is zeroed.
VEX.L must be 1, otherwise the instruction will #UD.
VPERM2F128
CASE IMM8[1:0] of
    0: DEST[127:0] ← SRC1[127:0]
    1: DEST[127:0] ← SRC1[255:128]
    2: DEST[127:0] ← SRC2[127:0]
    3: DEST[127:0] ← SRC2[255:128]
ESAC
CASE IMM8[5:4] of
    0: DEST[255:128] ← SRC1[127:0]
    1: DEST[255:128] ← SRC1[255:128]
    2: DEST[255:128] ← SRC2[127:0]
    3: DEST[255:128] ← SRC2[255:128]
ESAC
IF (imm8[3])
    DEST[127:0] ← 0
FI
IF (imm8[7])
    DEST[VLMAX-1:128] ← 0
FI
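The selection logic above is easy to check with a small software model. The sketch below is illustrative only: `vperm2f128` is a hypothetical Python function, and each 256-bit operand is represented as a `(low, high)` tuple of opaque 128-bit lane values.

```python
def vperm2f128(src1, src2, imm8):
    """Model the VPERM2F128 operation on (low, high) lane tuples.

    Returns the destination as a (low, high) tuple.
    """
    # Selector values 0..3 index these four candidate lanes.
    lanes = (src1[0], src1[1], src2[0], src2[1])
    low = lanes[imm8 & 0x3]            # imm8[1:0] picks DEST[127:0]
    high = lanes[(imm8 >> 4) & 0x3]    # imm8[5:4] picks DEST[255:128]
    if imm8 & 0x08:                    # imm8[3] zeroes the low half
        low = 0
    if imm8 & 0x80:                    # imm8[7] zeroes the high half
        high = 0
    return (low, high)
```

For example, `vperm2f128(("X0", "X1"), ("Y0", "Y1"), 0x20)` combines the two low lanes, and an immediate of `0x01` swaps the halves of the first source.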
VPERM2F128: __m256 _mm256_permute2f128_ps (__m256 a, __m256 b, int control)
VPERM2F128: __m256d _mm256_permute2f128_pd (__m256d a, __m256d b, int control)
VPERM2F128: __m256i _mm256_permute2f128_si256 (__m256i a, __m256i b, int control)
None.
See Exceptions Type 6; additionally
#UD | If VEX.L = 0. If VEX.W = 1. |
Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description |
---|---|---|---|---|
VEX.NDS.256.66.0F3A.W0 46 /r ib VPERM2I128 ymm1, ymm2, ymm3/m256, imm8 | RVMI | V/V | AVX2 | Permute 128-bit integer data in ymm2 and ymm3/mem using controls from imm8 and store result in ymm1. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
RVMI | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | Imm8 |
Permute 128 bit integer data from the first source operand (second operand) and second source operand (third operand) using bits in the 8-bit immediate and store results in the destination operand (first operand). The first source operand is a YMM register, the second source operand is a YMM register or a 256-bit memory location, and the destination operand is a YMM register.
[Figure: VPERM2I128 operation. SRC2 is shown as the 128-bit fields Y1, Y0 and SRC1 as the fields X1, X0; each 128-bit half of DEST receives X0, X1, Y0, or Y1.]
Imm8[1:0] select the source for the first destination 128-bit field, imm8[5:4] select the source for the second destination field. If imm8[3] is set, the low 128-bit field is zeroed. If imm8[7] is set, the high 128-bit field is zeroed.
VEX.L must be 1, otherwise the instruction will #UD.
VPERM2I128
CASE IMM8[1:0] of
    0: DEST[127:0] ← SRC1[127:0]
    1: DEST[127:0] ← SRC1[255:128]
    2: DEST[127:0] ← SRC2[127:0]
    3: DEST[127:0] ← SRC2[255:128]
ESAC
CASE IMM8[5:4] of
    0: DEST[255:128] ← SRC1[127:0]
    1: DEST[255:128] ← SRC1[255:128]
    2: DEST[255:128] ← SRC2[127:0]
    3: DEST[255:128] ← SRC2[255:128]
ESAC
IF (imm8[3])
    DEST[127:0] ← 0
FI
IF (imm8[7])
    DEST[255:128] ← 0
FI
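Going the other way, from an intended permutation to the control byte, is where mistakes tend to creep in. The helper below is a hedged sketch: `make_imm8` and the selector names in `SELECTORS` are invented for illustration and only encode the bit assignments listed in the operation above.

```python
# Hypothetical names for the four selectable 128-bit source fields.
SELECTORS = {"SRC1.low": 0, "SRC1.high": 1, "SRC2.low": 2, "SRC2.high": 3}


def make_imm8(low, high):
    """Build the imm8 control byte for a 128-bit lane permute.

    `low` and `high` name the source of each destination half;
    passing None zeroes that half instead.
    """
    imm8 = 0
    if low is None:
        imm8 |= 0x08                      # imm8[3]: zero DEST[127:0]
    else:
        imm8 |= SELECTORS[low]            # imm8[1:0]
    if high is None:
        imm8 |= 0x80                      # imm8[7]: zero DEST[255:128]
    else:
        imm8 |= SELECTORS[high] << 4      # imm8[5:4]
    return imm8
```

The resulting value is what would be passed as the `control` argument of the intrinsic, which must be a compile-time constant in C.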
VPERM2I128: __m256i _mm256_permute2x128_si256 (__m256i a, __m256i b, int control)
None.
See Exceptions Type 6; additionally
#UD | If VEX.L = 0. If VEX.W = 1. |
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
VEX.256.0F.WIG 77 VZEROALL | NP | V/V | AVX | Zero all YMM registers. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
The instruction zeros contents of all XMM or YMM registers.
Note: VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD. In Compatibility and legacy 32-bit mode only the lower 8 registers are modified.
VZEROALL (VEX.256 encoded version)
IF (64-bit mode)
    YMM0[VLMAX-1:0] ← 0
    YMM1[VLMAX-1:0] ← 0
    YMM2[VLMAX-1:0] ← 0
    YMM3[VLMAX-1:0] ← 0
    YMM4[VLMAX-1:0] ← 0
    YMM5[VLMAX-1:0] ← 0
    YMM6[VLMAX-1:0] ← 0
    YMM7[VLMAX-1:0] ← 0
    YMM8[VLMAX-1:0] ← 0
    YMM9[VLMAX-1:0] ← 0
    YMM10[VLMAX-1:0] ← 0
    YMM11[VLMAX-1:0] ← 0
    YMM12[VLMAX-1:0] ← 0
    YMM13[VLMAX-1:0] ← 0
    YMM14[VLMAX-1:0] ← 0
    YMM15[VLMAX-1:0] ← 0
ELSE
    YMM0[VLMAX-1:0] ← 0
    YMM1[VLMAX-1:0] ← 0
    YMM2[VLMAX-1:0] ← 0
    YMM3[VLMAX-1:0] ← 0
    YMM4[VLMAX-1:0] ← 0
    YMM5[VLMAX-1:0] ← 0
    YMM6[VLMAX-1:0] ← 0
    YMM7[VLMAX-1:0] ← 0
    YMM8-15: Unmodified
FI
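The mode dependence in the operation above reduces to how many registers are touched. A minimal Python sketch, assuming the register file is a list of 16 integers and using the hypothetical names `vzeroall` and `long_mode`:

```python
def vzeroall(ymm, long_mode):
    """Model VZEROALL on a list of 16 register values.

    In 64-bit mode all 16 registers are cleared; otherwise only
    YMM0-YMM7 are cleared and YMM8-YMM15 are left unmodified.
    """
    count = 16 if long_mode else 8
    return [0] * count + list(ymm[count:])
```

The function returns a new list rather than mutating its argument, which keeps the before and after states easy to compare in a test.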
VZEROALL: _mm256_zeroall()
None.
See Exceptions Type 8.
-------------------------------------------------------------------------------- /html/VZEROUPPER.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
VEX.128.0F.WIG 77 VZEROUPPER | NP | V/V | AVX | Zero upper 128 bits of all YMM registers. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
The instruction zeros the bits in position 128 and higher of all YMM registers. The lower 128 bits of the registers (the corresponding XMM registers) are unmodified.
This instruction is recommended when transitioning between AVX and legacy SSE code; it eliminates performance penalties caused by false dependencies.
Note: VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD. In Compatibility and legacy 32-bit mode only the lower 8 registers are modified.
VZEROUPPER
IF (64-bit mode)
    YMM0[VLMAX-1:128] ← 0
    YMM1[VLMAX-1:128] ← 0
    YMM2[VLMAX-1:128] ← 0
    YMM3[VLMAX-1:128] ← 0
    YMM4[VLMAX-1:128] ← 0
    YMM5[VLMAX-1:128] ← 0
    YMM6[VLMAX-1:128] ← 0
    YMM7[VLMAX-1:128] ← 0
    YMM8[VLMAX-1:128] ← 0
    YMM9[VLMAX-1:128] ← 0
    YMM10[VLMAX-1:128] ← 0
    YMM11[VLMAX-1:128] ← 0
    YMM12[VLMAX-1:128] ← 0
    YMM13[VLMAX-1:128] ← 0
    YMM14[VLMAX-1:128] ← 0
    YMM15[VLMAX-1:128] ← 0
ELSE
    YMM0[VLMAX-1:128] ← 0
    YMM1[VLMAX-1:128] ← 0
    YMM2[VLMAX-1:128] ← 0
    YMM3[VLMAX-1:128] ← 0
    YMM4[VLMAX-1:128] ← 0
    YMM5[VLMAX-1:128] ← 0
    YMM6[VLMAX-1:128] ← 0
    YMM7[VLMAX-1:128] ← 0
    YMM8-15: unmodified
FI
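Unlike a full clear, this operation preserves the XMM portion of each register. The sketch below models that with a bit mask; `vzeroupper`, `LOW_128`, and `long_mode` are illustrative names, and registers are plain Python integers.

```python
LOW_128 = (1 << 128) - 1   # mask that keeps bits 127:0 (the XMM part)


def vzeroupper(ymm, long_mode):
    """Model VZEROUPPER on a list of 16 register values.

    Bits 128 and above are cleared in YMM0-YMM15 in 64-bit mode,
    and in YMM0-YMM7 only in the other modes.
    """
    count = 16 if long_mode else 8
    return [r & LOW_128 if i < count else r for i, r in enumerate(ymm)]
```

In C code the equivalent call is `_mm256_zeroupper()`, typically placed just before calling into code compiled for legacy SSE.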
VZEROUPPER: _mm256_zeroupper()
None.
See Exceptions Type 8.
-------------------------------------------------------------------------------- /html/WAIT_FWAIT.html: --------------------------------------------------------------------------------
Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description |
---|---|---|---|---|---|
9B | WAIT | NP | Valid | Valid | Check pending unmasked floating-point exceptions. |
9B | FWAIT | NP | Valid | Valid | Check pending unmasked floating-point exceptions. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Causes the processor to check for and handle pending, unmasked, floating-point exceptions before proceeding. (FWAIT is an alternate mnemonic for WAIT.)
This instruction is useful for synchronizing exceptions in critical sections of code. Coding a WAIT instruction after a floating-point instruction ensures that any unmasked floating-point exceptions the instruction may raise are handled before the processor can modify the instruction’s results. See the section titled “Floating-Point Exception Synchronization” in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual, Volume 1, for more information on using the WAIT/FWAIT instruction.
This instruction’s operation is the same in non-64-bit modes and 64-bit mode.
CheckForPendingUnmaskedFloatingPointExceptions;
The C0, C1, C2, and C3 flags are undefined.
None.
#NM | If CR0.MP[bit 1] = 1 and CR0.TS[bit 3] = 1. |
#UD | If the LOCK prefix is used. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/WRPKRU.html: --------------------------------------------------------------------------------
Opcode* | Instruction | Op/En | 64/32bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|---|
0F 01 EF | WRPKRU | NP | V/V | OSPKE | Writes EAX into PKRU. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
NP | NA | NA | NA | NA |
Writes the value of EAX into PKRU. ECX and EDX must be 0 when WRPKRU is executed; otherwise, a general-protection exception (#GP) occurs.
WRPKRU can be executed only if CR4.PKE = 1; otherwise, an invalid-opcode exception (#UD) occurs. Software can discover the value of CR4.PKE by examining CPUID.(EAX=07H,ECX=0H):ECX.OSPKE [bit 4].
On processors that support the Intel 64 Architecture, the high-order 32-bits of RCX, RDX and RAX are ignored.
IF (ECX = 0 AND EDX = 0)
    THEN PKRU ← EAX;
    ELSE #GP(0);
FI;
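The register checks above, including the rule that the upper halves of the 64-bit registers are ignored, can be sketched as follows. This is a simplified model under stated assumptions: `wrpkru` and `GeneralProtectionFault` are hypothetical Python names, and the CR4.PKE check that produces #UD is deliberately left out.

```python
class GeneralProtectionFault(Exception):
    """Stands in for #GP(0) in this model."""


def wrpkru(rax, rcx, rdx):
    """Return the new PKRU value, or raise to model #GP(0).

    Only the low 32 bits of each register participate, matching
    the statement that the high-order 32 bits are ignored.
    """
    eax, ecx, edx = (r & 0xFFFFFFFF for r in (rax, rcx, rdx))
    if ecx != 0 or edx != 0:
        raise GeneralProtectionFault("#GP(0): ECX and EDX must be 0")
    return eax
```

A caller using the `_wrpkru` intrinsic does not set ECX and EDX by hand; the compiler is expected to zero them as part of the intrinsic's expansion.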
None.
WRPKRU: void _wrpkru(uint32_t);
#GP(0) | If ECX ≠ 0. If EDX ≠ 0. |
#UD | If the LOCK prefix is used. If CR4.PKE = 0. |
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
Same exceptions as in protected mode.
-------------------------------------------------------------------------------- /html/XABORT.html: --------------------------------------------------------------------------------
Opcode/Instruction | Op/En | 64/32bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
C6 F8 ib XABORT imm8 | A | V/V | RTM | Causes an RTM abort if in RTM execution. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
A | imm8 | NA | NA | NA |
XABORT forces an RTM abort. Following an RTM abort, the logical processor resumes execution at the fallback address computed through the outermost XBEGIN instruction. The EAX register is updated to reflect that an XABORT instruction caused the abort, and the imm8 argument is provided in bits 31:24 of EAX.
XABORT
IF RTM_ACTIVE = 0
    THEN
        Treat as NOP;
    ELSE
        GOTO RTM_ABORT_PROCESSING;
FI;
(* For any RTM abort condition encountered during RTM execution *)
RTM_ABORT_PROCESSING:
    Restore architectural register state;
    Discard memory updates performed in transaction;
    Update EAX with status and XABORT argument;
    RTM_NEST_COUNT ← 0;
    RTM_ACTIVE ← 0;
    IF 64-bit Mode
        THEN
            RIP ← fallbackRIP;
        ELSE
            EIP ← fallbackEIP;
    FI;
END
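Fallback code usually needs to recover the imm8 argument from EAX. The sketch below shows one way to pack and unpack it. The function names are hypothetical, the use of EAX bit 0 as the "caused by XABORT" indicator comes from the general RTM abort status definition rather than from this page, and `eax_after_xabort` models only a simple, non-nested abort in which no other status bits are set.

```python
XABORT_EXPLICIT = 1 << 0   # EAX bit 0: the abort was caused by XABORT


def eax_after_xabort(imm8):
    """Model the EAX status for a non-nested abort raised by XABORT imm8."""
    return ((imm8 & 0xFF) << 24) | XABORT_EXPLICIT


def xabort_code(eax):
    """Return the XABORT argument from EAX[31:24], or None if the
    abort was not caused by an XABORT instruction."""
    if not eax & XABORT_EXPLICIT:
        return None
    return (eax >> 24) & 0xFF
```

Checking the indicator bit before reading bits 31:24 matters because aborts from other causes do not carry a meaningful argument there.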
None
XABORT: void _xabort( unsigned int);
None
#UD | If CPUID.(EAX=7, ECX=0):EBX.RTM[bit 11] = 0. If the LOCK prefix is used. |
Opcode/Instruction | Op/En | 64/32bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
0F 01 D5 XEND | A | V/V | RTM | Specifies the end of an RTM code region. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
A | NA | NA | NA | NA |
The instruction marks the end of an RTM code region. If this corresponds to the outermost scope (that is, including this XEND instruction, the number of XBEGIN instructions is the same as the number of XEND instructions), the logical processor will attempt to commit the logical processor state atomically. If the commit fails, the logical processor will roll back all architectural register and memory updates performed during the RTM execution and resume execution at the fallback address computed from the outermost XBEGIN instruction. In that case the EAX register is updated to reflect RTM abort information.
XEND executed outside a transactional region will cause a #GP (General Protection Fault).
XEND
IF (RTM_ACTIVE = 0) THEN
    SIGNAL #GP
ELSE
    RTM_NEST_COUNT--
    IF (RTM_NEST_COUNT = 0) THEN
        Try to commit transaction
        IF fail to commit transactional execution
            THEN
                GOTO RTM_ABORT_PROCESSING;
            ELSE (* commit success *)
                RTM_ACTIVE ← 0
        FI;
    FI;
FI;
(* For any RTM abort condition encountered during RTM execution *)
RTM_ABORT_PROCESSING:
    Restore architectural register state
    Discard memory updates performed in transaction
    Update EAX with status
    RTM_NEST_COUNT ← 0
    RTM_ACTIVE ← 0
    IF 64-bit Mode
        THEN
            RIP ← fallbackRIP
        ELSE
            EIP ← fallbackEIP
    FI;
END
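The nesting behavior above can be followed with a small state machine. This is a sketch under assumptions, not a description of hardware: `RtmState` is a hypothetical class, its `xbegin` method is a simplified stand-in that only increments the nest count, and the `commit_ok` flag replaces the processor's real commit decision.

```python
class GeneralProtectionFault(Exception):
    """Stands in for #GP(0) in this model."""


class RtmState:
    def __init__(self):
        self.active = False      # models RTM_ACTIVE
        self.nest_count = 0      # models RTM_NEST_COUNT

    def xbegin(self):
        # Simplified: real XBEGIN also records the fallback address.
        self.nest_count += 1
        self.active = True

    def xend(self, commit_ok=True):
        """Return 'nested', 'committed', or 'aborted'."""
        if not self.active:
            raise GeneralProtectionFault("#GP(0): XEND outside RTM region")
        self.nest_count -= 1
        if self.nest_count > 0:
            return "nested"      # inner XEND: nothing is committed yet
        if not commit_ok:
            # RTM_ABORT_PROCESSING: state is rolled back, region ends.
            self.nest_count = 0
            self.active = False
            return "aborted"
        self.active = False
        return "committed"
```

Only the XEND that brings the nest count to zero attempts a commit, which is why an inner XEND in this model reports `nested` and leaves the region active.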
None
XEND: void _xend( void );
None
#UD | If CPUID.(EAX=7, ECX=0):EBX.RTM[bit 11] = 0. If LOCK or 66H or F2H or F3H prefix is used. |
#GP(0) | If RTM_ACTIVE = 0. |
Opcode/Instruction | Op/En | 64/32bit Mode Support | CPUID Feature Flag | Description |
---|---|---|---|---|
0F 01 D6 XTEST | A | V/V | HLE or RTM | Test if executing in a transactional region. |

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4 |
---|---|---|---|---|
A | NA | NA | NA | NA |
The XTEST instruction queries the transactional execution status. If the instruction executes inside a transactionally executing RTM region or a transactionally executing HLE region, then the ZF flag is cleared; otherwise it is set.
XTEST
IF (RTM_ACTIVE = 1 OR HLE_ACTIVE = 1)
    THEN
        ZF ← 0
    ELSE
        ZF ← 1
FI;
The ZF flag is cleared if the instruction is executed transactionally; otherwise it is set to 1. The CF, OF, SF, PF, and AF flags are cleared.
XTEST: int _xtest( void );
None
#UD | If CPUID.(EAX=7, ECX=0):EBX.HLE[bit 4] = 0 and CPUID.(EAX=7, ECX=0):EBX.RTM[bit 11] = 0. If LOCK or 66H or F2H or F3H prefix is used. |