The fastest way to determine a non-zero minimum

Given an array of four integers, what is the fastest way to determine its non-zero minimum?

+6
5 answers

There is a parallel solution to this problem, but it is probably not worth the effort.

First, we define the operation xchg(m, n) on the array a:

 xchg(m, n) => if ((a[m] > a[n] && a[n] != 0) || a[m] == 0) swap(a[m], a[n])

This operation puts the two elements "m" and "n" in ascending order if both contain non-zero values, or swaps them if the value in element "m" is zero.

Next, we perform a set of five such operations as follows:

 xchg(0,2) xchg(1,3) xchg(0,1) xchg(2,3) xchg(1,2) 

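The paired xchg operations, xchg(0,2) with xchg(1,3) and xchg(0,1) with xchg(2,3), touch disjoint elements and can be performed in parallel. That cuts the five operations to three steps, a 40% reduction in time compared with strictly sequential execution. When we are done, any non-zero elements of the array will be sorted in ascending order, followed by the zeros. The smallest non-zero element will be at [0]. If this value is zero, there are no non-zero values in the array.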
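The five steps above can be written out as a minimal sketch. The function name `nonzero_min_network` and the use of `std::array<int, 4>` are assumptions made for illustration. The code runs the exchanges sequentially and only marks which ones are independent.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Conditional swap from the text: zero is treated as "larger than anything",
// so it drifts toward the end of the array.
inline void xchg(std::array<int, 4>& a, std::size_t m, std::size_t n) {
    if ((a[m] > a[n] && a[n] != 0) || a[m] == 0) {
        std::swap(a[m], a[n]);
    }
}

// Hypothetical wrapper: returns the smallest non-zero element,
// or 0 if every element is zero. Works on a copy of the input.
inline int nonzero_min_network(std::array<int, 4> a) {
    xchg(a, 0, 2); xchg(a, 1, 3);  // step 1: the two exchanges are independent
    xchg(a, 0, 1); xchg(a, 2, 3);  // step 2: the two exchanges are independent
    xchg(a, 1, 2);                 // step 3
    return a[0];
}
```

For example, the input {4, 0, 2, 9} ends up as {2, 4, 9, 0}, so the function returns 2.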

This solution takes advantage of the inherent parallelism offered by sorting networks ( http://en.wikipedia.org/wiki/Sorting_network ), but a sequential scan of 4 elements also uses no more than three comparison steps and, crucially, needs half as many writes to storage:

sequential scan

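 int v = a[0];                     // current candidate; 0 means "none yet"
 for (int n = 1; n < 4; n++) {
     // take a[n] if it is a smaller non-zero value, or if we have no candidate
     if ((a[n] < v && a[n] != 0) || v == 0)
         v = a[n];
 }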
+6

Unless you keep the minimum value saved as elements are added to the array, or you keep the array in sorted order, I see no other solution than iterating through each member to determine the minimum value.

There is no "quick" way to test each member.
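As a rough sketch of the first option, the minimum can be maintained while values are added, so no scan is needed later. The class name `NonZeroMinTracker` is hypothetical, and the sketch only handles insertion. Removing or overwriting elements would still require a rescan.

```cpp
// Hypothetical helper that remembers the smallest non-zero value seen so far.
class NonZeroMinTracker {
public:
    // Call once for every value stored in the array.
    void add(int value) {
        if (value != 0 && (min_ == 0 || value < min_)) {
            min_ = value;
        }
    }

    // Returns the smallest non-zero value added so far,
    // or 0 if no non-zero value has been added.
    int min() const { return min_; }

private:
    int min_ = 0;  // 0 doubles as "nothing recorded yet"
};
```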

As a rule, I suggest not optimizing anything unless it is actually slow. The old rule that your program spends 90% of its time in 10% of the code generally holds true. So does the rule that programmers are 99.99% likely to optimize code that is not in that 10%.

Profile your code - profile your code - profile your code

+8

If we are thinking about micro-optimization, then it is probably faster to compute min(min(a,b),min(c,d)) instead of min(min(min(a,b),c),d) on a modern out-of-order processor, because of the shorter chain of sequential dependencies: in the first form the processor can compute min(a,b) and min(c,d) independently of each other if enough execution units are available. This assumes the processor has a conditional move instruction, so that min does not require branching.
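A minimal sketch of the tree-shaped form, adapted to the non-zero requirement. It assumes the inputs are unsigned 32-bit values, which the question does not state, and the helper names `nz_min` and `nz_min4_tree` are made up for illustration. Subtracting 1 in unsigned arithmetic wraps 0 around to the largest value, so a plain min skips zeros without an explicit test. Whether the compiler emits conditional moves here depends on the compiler and target.

```cpp
#include <algorithm>
#include <cstdint>

// Minimum of two values where 0 means "absent".
// 0 - 1 wraps to the maximum unsigned value, so zero loses every comparison;
// adding 1 afterwards restores the original value (or 0 if both were 0).
inline std::uint32_t nz_min(std::uint32_t x, std::uint32_t y) {
    return std::min(x - 1u, y - 1u) + 1u;
}

// Tree form: the two inner calls do not depend on each other,
// so an out-of-order core is free to evaluate them at the same time.
inline std::uint32_t nz_min4_tree(std::uint32_t a, std::uint32_t b,
                                  std::uint32_t c, std::uint32_t d) {
    return nz_min(nz_min(a, b), nz_min(c, d));
}
```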

+5

Depends on the input. If the array is not sorted, you will have to iterate over the full array. If the array is sorted, you just need to loop until you find something that is non-zero - that is much shorter.
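The sorted case might look like the sketch below. It assumes the array is sorted in ascending order and holds no negative values, so that any zeros sit at the front. The function name `first_nonzero_sorted` is hypothetical.

```cpp
#include <array>

// Assumes ascending order and non-negative values: zeros come first,
// so the first non-zero element is also the smallest non-zero element.
inline int first_nonzero_sorted(const std::array<int, 4>& a) {
    for (int v : a) {
        if (v != 0) {
            return v;  // stop at the first non-zero value
        }
    }
    return 0;  // every element was zero
}
```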

+3

Well, the fastest way to code it is std::min({a,b,c,d}) .
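Taken literally, std::min({a,b,c,d}) treats zero as an ordinary value. One possible way to keep the one-liner spirit while skipping zeros is to map each zero to the largest int before taking the minimum, as in this sketch. The names `lift_zero` and `nonzero_min_std` are assumptions, not part of the standard library.

```cpp
#include <algorithm>
#include <limits>

// Replace 0 with the largest int so that it never wins the comparison.
inline int lift_zero(int x) {
    return x == 0 ? std::numeric_limits<int>::max() : x;
}

// Smallest non-zero value among the four arguments, or 0 if all are zero.
inline int nonzero_min_std(int a, int b, int c, int d) {
    if ((a | b | c | d) == 0) {
        return 0;  // no non-zero value present
    }
    return std::min({lift_zero(a), lift_zero(b), lift_zero(c), lift_zero(d)});
}
```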

On a more serious note: if your program is bottlenecked on something like taking the minimum of a number of values, a better solution would be to find a way to break the minimum-finding task into pieces and send them to the GPU (or to many threads), which can then perform many minimum computations at once.

Parallelism will probably help more than trying to write a min function in assembly.

+1

Source: https://habr.com/ru/post/906006/

