AVX512 Vector Length and SAE Control

My question concerns register-to-register instructions encoded with EVEX that have no rounding semantics but do allow SAE (suppress all exceptions) control, such as VMIN*, VCVTT*, VGETEXP*, VREDUCE*, VRANGE*, etc. Intel declares SAE support only for the full 512-bit vector length, for example:

    VMINPD xmm1 {k1}{z}, xmm2, xmm3
    VMINPD ymm1 {k1}{z}, ymm2, ymm3
    VMINPD zmm1 {k1}{z}, zmm2, zmm3{sae}

but I see no reason SAE cannot be applied to instructions that use the xmm or ymm registers.

In Section 4.6.4 of Intel's Instruction Set Extensions Programming Reference, Table 4-7 says that in instructions without rounding semantics, bit EVEX.b specifies that SAE is applied, and bits EVEX.L'L specify the explicit vector length:

    00b: 128-bit (XMM)
    01b: 256-bit (YMM)
    10b: 512-bit (ZMM)
    11b: reserved

therefore, their combination must be legal.

However, NASM assembles vminpd zmm1,zmm2,zmm3,{sae} as 62F1ED185DCB, that is, EVEX.L'L = 00b, EVEX.b = 1, which NDISASM 2.12 disassembles as vminpd xmm1,xmm2,xmm3.

NASM refuses to assemble vminpd ymm1,ymm2,ymm3,{sae}, and NDISASM disassembles 62F1ED385DCB (EVEX.L'L = 01b, EVEX.b = 1) as vminpd xmm1,xmm2,xmm3.
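To make the bit fields in those two byte strings easier to check, here is a minimal Python sketch that extracts them. It assumes the usual EVEX layout (a 0x62 byte followed by three payload bytes, with z, L'L, b, V' and aaa in the last of them), a one-byte opcode, and a register-form ModRM byte. The function name decode_evex_fields is made up for this illustration.

```python
def decode_evex_fields(code_hex):
    """Extract a few EVEX fields from a 6-byte reg-reg instruction.

    Assumed layout: 62 P0 P1 P2 opcode ModRM (hypothetical helper,
    not part of NASM or NDISASM).
    """
    raw = bytes.fromhex(code_hex)
    if len(raw) < 6 or raw[0] != 0x62:
        raise ValueError("expected a 6-byte EVEX-encoded instruction")
    p1, p2, modrm = raw[2], raw[3], raw[5]
    return {
        "W": p1 >> 7,                    # operand-size bit
        "vvvv": ~(p1 >> 3) & 0xF,        # second source register, stored inverted
        "z": p2 >> 7,                    # zeroing-masking bit
        "LL": (p2 >> 5) & 0b11,          # EVEX.L'L, the vector length field
        "b": (p2 >> 4) & 1,              # EVEX.b, read as SAE for reg-reg forms
        "aaa": p2 & 0b111,               # opmask register
        "mod": modrm >> 6,               # 3 means register operand
        "reg": (modrm >> 3) & 7,         # destination register
        "rm": modrm & 7,                 # last source register
    }


for code in ("62F1ED185DCB", "62F1ED385DCB"):
    fields = decode_evex_fields(code)
    print(code, "L'L = {:02b}b".format(fields["LL"]), "b =", fields["b"])
```

Both strings decode to registers 1, 2 and 3 with EVEX.b set; they differ only in the L'L field.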

I wonder which of the following a Knights Landing CPU does with VMINPD ymm1, ymm2, ymm3{sae} (assembled as 62F1ED385DCB, EVEX.L'L = 01b, EVEX.b = 1):

The CPU throws an exception. Table 4-7 in the Intel doc is misleading.
SAE is in effect and the CPU operates on xmm only, as in scalar operations. NASM and NDISASM are right; the Intel doc is wrong.
SAE is ignored and the CPU operates on 256 bits, per the VMINPD specification in the Intel doc. NASM and NDISASM are wrong.
SAE is in effect and the CPU operates on 256 bits, as encoded in the instruction. NASM and NDISASM are wrong; the Intel doc needs to add the {sae} decorator to the xmm/ymm forms.
SAE is in effect and the CPU operates on an implied full vector size of 512 bits regardless of EVEX.L'L, as if static rounding {er} were allowed. NDISASM and Table 4-7 of the Intel doc are wrong.

vitsoft Apr 23 '16 at 17:58

1 answer

Ross Ridge · Answer 1 · 2016-08-15T18:20:55+0000

Your instruction VMINPD ymm1, ymm2, ymm3{sae} is invalid. According to the instruction set reference page for MINPD in the Intel Architecture Instruction Set Extensions Programming Reference (February 2016), only the following encodings are allowed:

    66 0F 5D /r                    MINPD xmm1, xmm2/m128
    VEX.NDS.128.66.0F.WIG 5D /r    VMINPD xmm1, xmm2, xmm3/m128
    VEX.NDS.256.66.0F.WIG 5D /r    VMINPD ymm1, ymm2, ymm3/m256
    EVEX.NDS.128.66.0F.W1 5D /r    VMINPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst
    EVEX.NDS.256.66.0F.W1 5D /r    VMINPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst
    EVEX.NDS.512.66.0F.W1 5D /r    VMINPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{sae}

Notice that only the last form is shown with the {sae} suffix, meaning it is the only form of the instruction you are allowed to use it with. Just because the bits exist to encode a particular instruction does not mean it is valid.
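As a rough illustration of that reading, the sketch below treats the three EVEX rows of the summary as the complete set of permitted register-to-register forms. It models only the documentation as quoted here, not what any CPU actually does, and the names VECTOR_BITS and listed_in_vminpd_summary are hypothetical.

```python
# EVEX.L'L values and the vector length each one selects; 11b is reserved.
VECTOR_BITS = {0b00: 128, 0b01: 256, 0b10: 512}


def listed_in_vminpd_summary(ll, b):
    """Return True if a reg-reg EVEX VMINPD with these bits matches a listed form.

    Assumption: the instruction summary is exhaustive, so {sae} (EVEX.b = 1)
    is acceptable only together with the 512-bit row.
    """
    if ll not in VECTOR_BITS:
        return False                  # reserved length encoding
    if b == 0:
        return True                   # xmm, ymm and zmm are all listed without {sae}
    return VECTOR_BITS[ll] == 512     # {sae} appears only on the zmm row


print(listed_in_vminpd_summary(0b01, 1))  # the hand-crafted ymm{sae} case
print(listed_in_vminpd_summary(0b10, 1))  # the documented zmm{sae} case
```

Under this literal reading, the bytes NASM emits for the zmm{sae} form (L'L = 00b, b = 1) would also count as unlisted, which is the discrepancy the question points out.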

Also note that Section 4.6.3, SAE Support in EVEX, makes it clear that SAE does not apply to 128-bit or 256-bit vectors:

The EVEX encoding system allows arithmetic floating-point instructions without rounding semantics to be encoded with the SAE attribute. This capability applies to scalar and 512-bit vector lengths, register-to-register only, by setting EVEX.b. When EVEX.b is set, "suppress all exceptions" is implied. [...]

I am not sure, however, whether your hand-crafted instruction will raise an invalid opcode exception, whether the EVEX.b bit will simply be ignored, or whether the EVEX.L'L bits will be ignored. EVEX-encoded VMINPD instructions belong to exception class E2, and according to Table 4-17, Type E2 Class Exception Conditions, the instruction can raise a #UD exception in any of the following cases:

State requirement, Table 4-8 not met.
Opcode independent #UD condition in Table 4-9.
Operand encoding #UD conditions in Table 4-10.
Opmask encoding #UD condition of Table 4-11.
If EVEX.L'L != 10b (VL = 512).

Only the last of these seems applicable here, but it would mean that your instruction raises a #UD exception with or without the {sae} modifier. Since that appears to directly contradict the encodings permitted in the instruction summary, I am not sure what will happen.
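The contradiction can be spelled out in a few lines of Python. This is only a sketch of the literal reading of that last condition, taken on its own with no qualification on EVEX.b; the names FORMS, ud_by_last_condition and contradicted_forms are invented for the example, and nothing here describes measured hardware behaviour.

```python
# EVEX.L'L value for each register form in the instruction summary.
FORMS = {"xmm": 0b00, "ymm": 0b01, "zmm": 0b10}


def ud_by_last_condition(ll):
    """Literal reading of 'If EVEX.L'L != 10b (VL = 512)': #UD unless 512-bit."""
    return ll != 0b10


def contradicted_forms():
    """Forms the summary lists as valid (without {sae}) that this reading rejects."""
    return [name for name, ll in FORMS.items() if ud_by_last_condition(ll)]


print(contradicted_forms())
```

Because the condition never looks at EVEX.b, it rejects the plain xmm and ymm forms as well, which is why it cannot be taken at face value.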
