AVX512 log2 or pow instructions

I need an AVX512 version of double pow(double, int n) (I need it to compute the binomial distribution, which must be accurate). In particular, I would like to do this for Knights Landing, which has AVX512ER. One way to get this is

 x^n = exp2(log2(x)*n) 

Knights Corner has the vlog2ps instruction (_mm512_log2_ps intrinsic) and the vexp223ps instruction (_mm512_exp223_ps intrinsic), so at least I could implement float pow(float, float) with these two instructions.

However, for Knights Landing I cannot find a log2 instruction. I do find the vexp2pd instruction (_mm512_exp2a23_pd intrinsic) in AVX512ER. It seems strange to me that Knights Corner has a log2 instruction, but Knights Landing, which is newer and better, does not.

I have currently implemented pow(double, n) using repeated squaring, but I think it would be more efficient if I had a log2 instruction.

    #include <immintrin.h>

    // AVX2, but easy to convert to AVX512 with mask registers
    static __m256d pown_AVX2(__m256d base, __m256i exp) {
        __m256d result = _mm256_set1_pd(1.0);
        int mask = _mm256_testz_si256(exp, exp);  // 1 when every exponent is zero
        __m256i onei = _mm256_set1_epi64x(1);
        __m256d onef = _mm256_set1_pd(1.0);
        while (!mask) {
            __m256i t1 = _mm256_and_si256(exp, onei);                     // low bit of each exponent
            __m256i t2 = _mm256_cmpeq_epi64(t1, _mm256_setzero_si256());  // all-ones where the bit is 0
            __m256d t3 = _mm256_blendv_pd(base, onef, _mm256_castsi256_pd(t2));  // base or 1.0
            result = _mm256_mul_pd(result, t3);
            exp = _mm256_srli_epi64(exp, 1);
            base = _mm256_mul_pd(base, base);
            mask = _mm256_testz_si256(exp, exp);
        }
        return result;
    }
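For comparison, the same repeated-squaring loop for a single value can be sketched in scalar C. The name pown_scalar is hypothetical, and the sketch assumes a non-negative exponent, matching the logical right shift used in the vector code.

```c
/* Hypothetical scalar reference: base^exp for exp >= 0 by repeated squaring.
   Each iteration consumes one bit of the exponent. */
static double pown_scalar(double base, unsigned long long exp)
{
    double result = 1.0;
    while (exp != 0) {
        if (exp & 1ULL)      /* current bit set: fold this power into the result */
            result *= base;
        exp >>= 1;
        base *= base;        /* base, base^2, base^4, ... */
    }
    return result;
}
```

For example, pown_scalar(2.0, 10) returns 1024.0 after four iterations, one per bit of the exponent.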

Is there a more efficient algorithm than repeated squaring to get double pow(double, int n) with AVX512 and AVX512ER? Is there an easy way (e.g. with a few instructions) to get log2?


Here is an AVX512F version using repeated squaring

    #include <immintrin.h>

    // Exponents are 32-bit integers in the low eight lanes of pexp
    static __m512d pown_AVX512(__m512d base, __m512i pexp) {
        __m512d result = _mm512_set1_pd(1.0);
        __m512i onei = _mm512_set1_epi32(1);
        __mmask8 mask;
        do {
            __m512i t1 = _mm512_and_epi32(pexp, onei);
            // 16-bit compare mask truncated to the low 8 lanes
            __mmask8 mask2 = (__mmask8)_mm512_cmp_epi32_mask(onei, t1, _MM_CMPINT_EQ);
            result = _mm512_mask_mul_pd(result, mask2, result, base);
            pexp = _mm512_srli_epi32(pexp, 1);
            base = _mm512_mul_pd(base, base);
            mask = (__mmask8)_mm512_test_epi32_mask(pexp, pexp);
        } while (mask);
        return result;
    }

The exponent is int32, not int64. Ideally, I would use __m256i for the eight integers. However, this requires AVX512VL, which extends the 512b operations to 256b and 128b, and KNL does not have AVX512VL. Instead, I use the 512b operations on 32-bit integers and truncate the 16b mask to 8b.
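To show what the mask truncation does, here is a lane-by-lane scalar model of the loop in plain C. It is only a sketch: the name pown_model and the array-based interface are hypothetical, and it assumes the eight exponents sit in the low eight of the sixteen 32-bit lanes. The upper eight lanes are still shifted and tested, but the cast to 8 bits discards their mask bits, so they affect neither the result nor the loop exit.

```c
#include <stdint.h>

/* Hypothetical scalar model of the masked loop: 8 double lanes,
   exponents in lanes 0..7 of a 16 x int32 vector. */
static void pown_model(const double base_in[8], const uint32_t pexp_in[16],
                       double result[8])
{
    double base[8];
    uint32_t pexp[16];
    for (int i = 0; i < 8; ++i) { base[i] = base_in[i]; result[i] = 1.0; }
    for (int i = 0; i < 16; ++i) pexp[i] = pexp_in[i];

    uint8_t mask;
    do {
        uint16_t odd16 = 0, nonzero16 = 0;
        for (int i = 0; i < 16; ++i)          /* compare: low bit == 1 */
            if (pexp[i] & 1u) odd16 |= (uint16_t)(1u << i);
        uint8_t odd = (uint8_t)odd16;         /* keep lanes 0..7 only */
        for (int i = 0; i < 8; ++i) {
            if (odd & (1u << i)) result[i] *= base[i];  /* masked multiply */
            base[i] *= base[i];
        }
        for (int i = 0; i < 16; ++i) {        /* shift, then test for nonzero */
            pexp[i] >>= 1;
            if (pexp[i]) nonzero16 |= (uint16_t)(1u << i);
        }
        mask = (uint8_t)nonzero16;            /* upper lanes cannot keep the loop alive */
    } while (mask);
}
```

With this model, garbage in lanes 8..15 does not change the eight results, which is the property the 8-bit mask relies on.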


Source: https://habr.com/ru/post/1014760/

