++...
, std:: minmax .
. , , , . . , . , , .
, , .
2 , , - . : 2 min/max 1 a b. .
2 , 32- malloc, . , - .
F.ex, , AVX2. (. , , CPU!). : https://software.intel.com/sites/landingpage/IntrinsicsGuide/ .
, , :
- _mm256_min_epi32
- _mm256_max_epi32
- _mm256_stream_load_si256
, , , __mm256 . : min/max 256- , , 32- min/max .
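A minimal sketch of how the intrinsics listed above could fit together: keep eight running minima and eight running maxima in 256-bit registers, then reduce them to a single 32-bit min/max at the end. The function name `minmax_avx2` is hypothetical. The sketch assumes GCC or Clang (for the `target` attribute), a CPU with AVX2, and `n >= 1`. It uses `_mm256_loadu_si256` rather than `_mm256_stream_load_si256`, because the streaming load requires 32-byte aligned addresses; with suitably aligned data the load could be swapped.

```cpp
#include <immintrin.h>
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical helper, not from the original answer.
// Assumptions: n >= 1, GCC/Clang, and the CPU supports AVX2.
__attribute__((target("avx2")))
std::pair<int, int> minmax_avx2(const int* data, std::size_t n) {
    int lo = data[0];
    int hi = data[0];
    std::size_t i = 0;

    if (n >= 8) {
        // Seed both accumulators with the first eight values.
        __m256i vmin = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data));
        __m256i vmax = vmin;

        // Main loop: eight 32-bit lanes per iteration.
        for (i = 8; i + 8 <= n; i += 8) {
            __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data + i));
            vmin = _mm256_min_epi32(vmin, v);
            vmax = _mm256_max_epi32(vmax, v);
        }

        // Horizontal reduction: spill the lanes and finish with scalar min/max.
        alignas(32) int mins[8];
        alignas(32) int maxs[8];
        _mm256_store_si256(reinterpret_cast<__m256i*>(mins), vmin);
        _mm256_store_si256(reinterpret_cast<__m256i*>(maxs), vmax);
        for (int k = 0; k < 8; ++k) {
            lo = std::min(lo, mins[k]);
            hi = std::max(hi, maxs[k]);
        }
    }

    // Scalar tail for the remaining 0..7 elements (or the whole array if n < 8).
    for (; i < n; ++i) {
        lo = std::min(lo, data[i]);
        hi = std::max(hi, data[i]);
    }
    return {lo, hi};
}
```

Calling `minmax_avx2(values, count)` returns the pair (minimum, maximum). On a machine without AVX2 the function must not be called; a runtime check such as `__builtin_cpu_supports("avx2")` (GCC/Clang) can guard it.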
: ... . , .
2 , , , , . , limit, , , , .
, , , ... . ; , , . , inline, . aligned, .
, int* . , , const.
. , SSE, AVX2 ( ). , - .
Run in release mode, compile with optimization set to "fast", and look at what happens under the hood. If you do all this, you should see vpmax... instructions appearing in the inner loops, which means the compiler has vectorized the loop on its own and is making excellent use of the SIMD instructions.
I do not know what else you want to do in the loop, but if you end up with all these instructions, you should be limited by memory bandwidth on large arrays.
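For illustration, here is a plain scalar loop of the kind a compiler can typically vectorize without any intrinsics. The function name `minmax_scalar` is hypothetical, and the flags are an assumption: with GCC or Clang something like `-O3 -mavx2` is usually enough. Inspect the generated assembly to confirm what your compiler actually emits.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>

// Hypothetical helper, not from the original answer. Assumes n >= 1.
// The const pointer and the simple, branch-free body give the
// optimizer a straightforward reduction to vectorize.
std::pair<int, int> minmax_scalar(const int* data, std::size_t n) {
    int lo = data[0];
    int hi = data[0];
    for (std::size_t i = 1; i < n; ++i) {
        lo = std::min(lo, data[i]);
        hi = std::max(hi, data[i]);
    }
    return {lo, hi};
}
```

Keeping the two accumulators in local variables, rather than writing through pointers or references inside the loop, avoids aliasing concerns that could otherwise block vectorization.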