Using SSE to speed up lower_bound

Question

In the project I'm currently working on, I often need to find the smallest index in a sorted array at which an element can be inserted (what std::lower_bound does in C++). Using SSE to speed this up seems attractive, since I work with uint32 arrays whose size is usually that of a processor cache line. I have never used SSE instructions before, so I can't figure out what an SSE implementation of this function would look like. Any tips on how to write it optimally with SSE?

c assembly x86 x86-64 sse

fokenrute Jan 22 '11 at 20:12

1 answer

Billy ONeal · Answer 1 · 2011-01-22T20:44:40+0000

Nothing like std::lower_bound will scale well using SSE. SSE makes things faster because it lets you do multiple calculations at once; for example, a single SSE instruction can perform 4 multiplications simultaneously. However, std::lower_bound cannot be parallelized this way, since each step of the algorithm depends on the result of the comparison made in the previous step. In addition, it is already O(log n), so it is unlikely to be your bottleneck.
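To make that dependency concrete, here is a minimal hand-written binary search that should return the same index as std::lower_bound on a sorted uint32_t array (the name lower_bound_scalar is hypothetical, used only for illustration). Note how `first` and `count` in each iteration depend on the comparison from the previous one, which is why the steps cannot simply be spread across SIMD lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the smallest index i with a[i] >= key in the
// sorted array a[0..n), or n if every element is less than key.
std::size_t lower_bound_scalar(const std::uint32_t* a, std::size_t n,
                               std::uint32_t key) {
    assert(n == 0 || a != nullptr);
    std::size_t first = 0;  // start of the remaining search range
    std::size_t count = n;  // length of the remaining search range
    while (count > 0) {
        std::size_t half = count / 2;
        std::size_t mid = first + half;
        // The next range depends on this comparison, so the iterations form a
        // serial chain: step k+1 cannot start before step k's result is known.
        if (a[mid] < key) {
            first = mid + 1;   // answer lies strictly to the right of mid
            count -= half + 1;
        } else {
            count = half;      // answer is mid or somewhere to its left
        }
    }
    return first;
}
```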

Also, before moving on to inline assembly, be aware that whenever you use inline assembly you lose most of the compiler optimizations that could otherwise apply to that section of your program, and your program will often end up slower: compilers usually write better assembly than we humans do.

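If you do want to use SSE, compiler intrinsics ("built-in" functions that map directly onto SSE instructions) are a better choice than inline assembly, because the compiler can still optimize the code around them. Both Microsoft Visual C++ and GNU (GCC) support them.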
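As an illustration of the intrinsics route, here is a sketch of a different technique from binary search: a linear compare-and-count over a small sorted array, sometimes used for arrays about the size of a cache line like the one in the question (for example 16 uint32_t values fill a 64-byte line). It is a swapped-in technique, not a vectorized std::lower_bound. The name lower_bound_sse2 is hypothetical, and the sketch assumes SSE2 (always present on x86-64), sorted input, and a length that is a multiple of 4. Whether it actually beats a scalar binary search would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (__m128i, _mm_* functions)

// Hypothetical helper: counts the elements of the sorted array a[0..n) that
// are strictly less than key. For a sorted array that count equals the
// lower_bound index. Assumes n is a multiple of 4 and SSE2 is available.
std::size_t lower_bound_sse2(const std::uint32_t* a, std::size_t n,
                             std::uint32_t key) {
    assert(n % 4 == 0);
    // SSE2 only has a signed 32-bit compare, so flip the sign bit of both
    // operands; this maps unsigned ordering onto signed ordering.
    const __m128i bias = _mm_set1_epi32(INT32_MIN);
    const __m128i k =
        _mm_xor_si128(_mm_set1_epi32(static_cast<std::int32_t>(key)), bias);
    __m128i acc = _mm_setzero_si128();  // four per-lane counters
    for (std::size_t i = 0; i < n; i += 4) {
        // Unaligned load of 4 elements, so a need not be 16-byte aligned.
        __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a + i));
        v = _mm_xor_si128(v, bias);
        // Lane j becomes -1 (all bits set) where a[i + j] < key, else 0.
        __m128i lt = _mm_cmpgt_epi32(k, v);
        // Subtracting -1 adds 1 to that lane's counter.
        acc = _mm_sub_epi32(acc, lt);
    }
    // Horizontal sum of the four lane counters.
    alignas(16) std::int32_t lanes[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(lanes), acc);
    return static_cast<std::size_t>(lanes[0] + lanes[1] + lanes[2] + lanes[3]);
}
```

All four comparisons in each step are independent of each other, which is what lets SSE help here; the price is that every element is examined, so this only makes sense for very small arrays.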

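Finally, if you find yourself needing std::lower_bound to go faster with SSE, you are probably approaching the problem from the wrong side. If you call lower_bound to find where each element goes and then insert it there, you are effectively doing insertion sort, which is O(n²) overall. If you can add all the elements first and sort once afterwards, that is O(n lg n). If the sequence has to stay sorted at all times, use something like std::set, which has O(lg n) insertion, instead of the O(n + lg n) it costs to insert into a sorted array (O(lg n) to find the position plus O(n) to shift the elements after it).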
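A minimal sketch of the three approaches contrasted above, with one hypothetical helper per strategy (the names are illustrative only). For distinct inputs all three produce the same sorted sequence; they differ in cost.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

using Values = std::vector<std::uint32_t>;

// (a) lower_bound + insert into a sorted vector: O(lg n) search plus O(n)
// shifting per element, i.e. insertion sort, O(n^2) for n elements.
Values insert_each_sorted(const Values& input) {
    Values out;
    for (std::uint32_t x : input) {
        out.insert(std::lower_bound(out.begin(), out.end(), x), x);
    }
    assert(std::is_sorted(out.begin(), out.end()));  // invariant: stays sorted
    return out;
}

// (b) Append everything first, then sort once: O(n lg n) overall.
Values append_then_sort(Values input) {
    std::sort(input.begin(), input.end());
    return input;
}

// (c) std::set keeps its elements ordered with O(lg n) insertion.
// Note: std::set drops duplicates; std::multiset would keep them.
Values via_set(const Values& input) {
    std::set<std::uint32_t> s;
    for (std::uint32_t x : input) {
        s.insert(x);
    }
    return Values(s.begin(), s.end());
}
```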
