The INC instruction is not a SIMD instruction; it operates on integer scalars. As you and Paul have already suggested, the easiest way is to add 1 to each vector element, which you can do by adding a vector of 1s.
If you want something that looks like an intrinsic, you can implement your own function:
static inline __m256i _mm256_inc_epi16(__m256i a) { return _mm256_add_epi16(a, _mm256_set1_epi16(1)); }
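As a rough usage sketch, the same idea can be applied to a plain array of sixteen 16-bit integers. The names inc_epi16 and inc16_array are hypothetical helpers made up for this example, not Intel intrinsics, and the target("avx2") attribute assumes GCC or Clang (with other compilers, drop it and enable AVX2 through the compiler options instead):

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper (not an Intel intrinsic): adds 1 to each of the
   16 packed 16-bit lanes. The addition wraps around on overflow. */
__attribute__((target("avx2")))
static inline __m256i inc_epi16(__m256i a)
{
    return _mm256_add_epi16(a, _mm256_set1_epi16(1));
}

/* Hypothetical wrapper: increments 16 int16_t values in place.
   Unaligned load/store is used so v needs no special alignment. */
__attribute__((target("avx2")))
void inc16_array(int16_t v[16])
{
    __m256i x = _mm256_loadu_si256((const __m256i *)v);
    x = inc_epi16(x);
    _mm256_storeu_si256((__m256i *)v, x);
}
```

Note that _mm256_add_epi16 wraps on overflow, so a lane holding INT16_MAX becomes INT16_MIN; if saturation is wanted, _mm256_adds_epi16 is the saturating variant. The helper is deliberately not given a _mm256_ prefix here, since that naming style belongs to the compiler's own intrinsics.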
For similar x86 questions in the future, the Intel Intrinsics Guide lists the intrinsics available for the Intel ISA extensions. See also the extensive resources documented under the x86 and sse tags.