I need to compare two buffers chunk by chunk for equality. I do not need to know how the two buffers relate to each other (which one is greater or smaller), only whether each pair of chunks is equal or not. My Intel machine supports up to SSE4.2.
Naive approach:
    const size_t CHUNK_SIZE = 16; // 128 bits for SSE2 integer registers
    const int ARRAY_SIZE = 200000000;

    char* array_1 = (char*)_aligned_malloc(ARRAY_SIZE, 16);
    char* array_2 = (char*)_aligned_malloc(ARRAY_SIZE, 16);

    for (size_t i = 0; i < ARRAY_SIZE; )
    {
        volatile bool result = memcmp(array_1+i, array_2+i, CHUNK_SIZE);
        i += CHUNK_SIZE;
    }
Compared to my first attempt to use SSE:
    union U
    {
        __m128i m;
        volatile int i[4];
    } res;

    for (size_t i = 0; i < ARRAY_SIZE; )
    {
        __m128i* pa1 = (__m128i*)(array_1+i);
        __m128i* pa2 = (__m128i*)(array_2+i);
        res.m = _mm_cmpeq_epi32(*pa1, *pa2);
        volatile bool result = ( (res.i[0]==0) || (res.i[1]==0) || (res.i[2]==0) || (res.i[3]==0) );
        i += CHUNK_SIZE;
    }
The SSE version is about 33% faster than the memcmp version. Can I do better?