My question concerns EVEX-encoded reg-reg instructions without rounding semantics that allow SAE (suppress all exceptions) control, such as VMIN*, VCVTT*, VGETEXP*, VREDUCE*, VRANGE*, etc. Intel documents SAE support only for the full 512-bit vector length, for example:
VMINPD xmm1 {k1}{z}, xmm2, xmm3
VMINPD ymm1 {k1}{z}, ymm2, ymm3
VMINPD zmm1 {k1}{z}, zmm2, zmm3{sae}
but I see no reason SAE cannot be applied to instructions that use the xmm or ymm registers.
In Chapter 4.6.4 of Intel's instruction set programming reference, Table 4-7 says that in instructions without rounding semantics, the EVEX.b bit specifies that SAE is applied, and the EVEX.L'L bits specify the explicit vector length:
00b: 128-bit (XMM)
01b: 256-bit (YMM)
10b: 512-bit (ZMM)
11b: reserved
so their combination should be legal.
However, NASM assembles vminpd zmm1,zmm2,zmm3,{sae} as 62F1ED185DCB, that is, EVEX.L'L = 00b, EVEX.b = 1, which NDISASM 2.12 disassembles as vminpd xmm1,xmm2,xmm3
NASM refuses to assemble vminpd ymm1,ymm2,ymm3,{sae}, and NDISASM disassembles 62F1ED385DCB (EVEX.L'L = 01b, EVEX.b = 1) as vminpd xmm1,xmm2,xmm3
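To make the bit fields in those two byte strings easy to check, here is a minimal Python sketch that pulls z, L'L, b, V' and aaa out of the last EVEX payload byte (the fourth byte of the instruction, counting the 62 escape byte). The helper name decode_evex_p2 is made up for illustration; it only splits the bits and says nothing about how a CPU or NASM interprets them.

```python
def decode_evex_p2(code_hex):
    """Split the last EVEX payload byte (P2) into its bit fields.

    Hypothetical helper for illustration only: it extracts raw bits
    and does not validate the rest of the instruction.
    """
    raw = bytes.fromhex(code_hex)
    if len(raw) < 4 or raw[0] != 0x62:
        raise ValueError("not an EVEX-encoded instruction")
    p2 = raw[3]  # bytes: 62, P0, P1, P2, opcode, ModRM, ...
    return {
        "z": (p2 >> 7) & 1,        # zeroing-masking
        "LL": (p2 >> 5) & 0b11,    # EVEX.L'L, vector length / rounding control
        "b": (p2 >> 4) & 1,        # EVEX.b, broadcast / rounding / SAE context
        "V_prime": (p2 >> 3) & 1,  # high bit of vvvv, stored inverted
        "aaa": p2 & 0b111,         # opmask register k0..k7
    }


for code in ("62F1ED185DCB", "62F1ED385DCB"):
    fields = decode_evex_p2(code)
    print(code, "L'L=" + format(fields["LL"], "02b") + "b", "b=" + str(fields["b"]))
```

Both encodings have EVEX.b = 1 and differ only in EVEX.L'L (00b versus 01b), which is exactly the combination the question is about.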
I wonder how a Knights Landing CPU executes VMINPD ymm1, ymm2, ymm3{sae} (assembled as 62F1ED385DCB, EVEX.L'L = 01b, EVEX.b = 1):
- The CPU throws an exception. Table 4-7 in the Intel doc is misleading.
- SAE is in effect, and the CPU operates on xmm only, just as in a scalar operation. NASM and NDISASM are right; the Intel documentation is wrong.
- SAE is ignored, and the CPU operates on 256 bits according to the VMINPD specification in the Intel doc. NASM and NDISASM are wrong.
- SAE is in effect, and the CPU operates on 256 bits, as encoded in the instruction. NASM and NDISASM are wrong; the Intel doc needs to add {sae} decorators to the xmm/ymm instruction forms.
- SAE is in effect, and the CPU operates on an implied full vector size of 512 bits regardless of EVEX.L'L, just as if static rounding {er} were allowed. NDISASM and Table 4-7 in the Intel doc are wrong.