I get a performance penalty when mixing SIMD instructions and multithreading

Question

I was interested in making a face recognition project (to use the SIMD instruction set). But during the first semester of this year I learned something about threads, and I was wondering if I could combine the two.

When should you avoid combining multithreading with SIMD instructions? When is it worth it?

performance multithreading intel simd

Aj Nov 08 '11 at 4:03

3 answers

Why do you think a problem would arise? SIMD registers are swapped out and back in like any other CPU registers when the thread switches.

O'rooney Nov 08 '11 at 4:15

There are no new issues to worry about when combining multithreading and SIMD. As long as you use SIMD correctly and efficiently, you have nothing extra to worry about.

In other words, SIMD has its own implementation pitfalls, and so does multithreading, but combining them does not make either one harder.

Kyle Nov 08 '11 at 4:16

chill · Accepted Answer · 2011-11-08T07:16:30+0000

Saving the x87/MMX/XMM/YMM registers can take quite some time and cause significant cache thrashing. Typically, saving and restoring the FP state is done lazily: on a context switch, the kernel only records which thread "owns" the FP state currently held in the registers and sets the TS flag in CR0. With TS set, the CPU traps into the kernel (a device-not-available exception, #NM) as soon as a thread tries to execute an FP instruction. Only at that point does the kernel save the owner's FP state and restore the FP state of the thread that is now running.

Now, if for long periods of time (several or many context switches) no thread other than yours executes FP instructions, the lazy policy means the FP state is never saved or restored at all, and you take no performance hit.
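The lazy policy described above can be sketched as a toy C model of a single CPU. This is not kernel code: the type `lazy_fpu_t` and the functions `fpu_init`, `context_switch` and `fp_instruction` are hypothetical names, and the model only counts traps, saves and restores instead of moving real register contents. It follows the answer's description: every switch sets TS, and the first FP instruction afterwards traps and swaps state only if the running thread is not the current owner.

```c
#include <stdbool.h>

/* Toy model of lazy FP/SIMD state switching on ONE CPU.
   Hypothetical names; counts events instead of moving real register data. */
typedef struct {
    int  current;   /* thread running on this CPU */
    int  fp_owner;  /* thread whose FP/SIMD state is live in the registers; -1 = none */
    bool ts;        /* models CR0.TS: when set, the next FP instruction traps */
    int  traps;     /* device-not-available (#NM) traps taken */
    int  saves;     /* previous owner's FP state written to memory */
    int  restores;  /* current thread's FP state loaded (or initialized) */
} lazy_fpu_t;

void fpu_init(lazy_fpu_t *cpu, int first_thread) {
    cpu->current  = first_thread;
    cpu->fp_owner = -1;
    cpu->ts       = true;
    cpu->traps = cpu->saves = cpu->restores = 0;
}

/* Context switch: general-purpose registers are switched as usual (not modelled);
   the FP/SIMD registers are left untouched and TS is set. */
void context_switch(lazy_fpu_t *cpu, int next) {
    cpu->current = next;
    cpu->ts = true;
}

/* The running thread executes an FP/SIMD instruction. */
void fp_instruction(lazy_fpu_t *cpu) {
    if (!cpu->ts)
        return;                       /* state already live: no trap */
    cpu->traps++;
    if (cpu->fp_owner != cpu->current) {
        if (cpu->fp_owner != -1)
            cpu->saves++;             /* spill the previous owner's state */
        cpu->restores++;              /* load this thread's state */
        cpu->fp_owner = cpu->current;
    }
    cpu->ts = false;                  /* clear TS and retry the instruction */
}
```

For example, if thread 1 runs SIMD code and thread 2 runs only integer code, switching 1 → 2 → 1 ten times (with thread 1 issuing an FP instruction each time it runs) ends with `saves == 0` and `restores == 1`: the model still takes a trap on each switch back, but no state is ever moved. If both threads issue FP instructions on every turn, the same ten rounds cost 20 saves and 21 restores.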

Since we are obviously talking about a multiprocessor system, threads that execute your algorithm in parallel will not conflict with each other, because each should run on its own CPU/core/hyper-thread and have its own set of registers.
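A minimal sketch of that situation, assuming x86-64 (where SSE2 is always available) and POSIX threads: each thread sums its own slice of an array with 4-wide SSE adds, keeping its accumulator in the SIMD registers of whichever core runs it. The names `parallel_simd_sum`, `simd_sum_chunk`, `chunk_t` and `MAX_THREADS` are hypothetical, chosen for illustration only.

```c
#include <immintrin.h>  /* SSE intrinsics: _mm_setzero_ps, _mm_loadu_ps, _mm_add_ps */
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16  /* hypothetical upper bound for this sketch */

/* Work description for one thread (hypothetical helper type). */
typedef struct {
    const float *data;
    size_t len;
    float result;
} chunk_t;

/* Sum one chunk with 4-wide SSE adds. Each thread uses its own core's
   XMM registers, so threads never observe each other's SIMD state. */
static void *simd_sum_chunk(void *arg) {
    chunk_t *c = (chunk_t *)arg;
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= c->len; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(c->data + i));
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = lanes[0] + lanes[1] + lanes[2] + lanes[3];
    for (; i < c->len; i++)          /* scalar tail for leftover elements */
        sum += c->data[i];
    c->result = sum;                 /* written once, read after join */
    return NULL;
}

/* Split data across nthreads threads. Returns 0 and stores the total
   in *out on success, or -1 on bad arguments / thread creation failure. */
int parallel_simd_sum(const float *data, size_t len, int nthreads, float *out) {
    pthread_t tids[MAX_THREADS];
    chunk_t chunks[MAX_THREADS];
    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;
    size_t per = len / (size_t)nthreads;
    for (int t = 0; t < nthreads; t++) {
        size_t start = (size_t)t * per;
        chunks[t].data = data + start;
        chunks[t].len = (t == nthreads - 1) ? len - start : per;  /* last takes remainder */
        chunks[t].result = 0.0f;
        if (pthread_create(&tids[t], NULL, simd_sum_chunk, &chunks[t]) != 0) {
            for (int j = 0; j < t; j++)
                pthread_join(tids[j], NULL);
            return -1;
        }
    }
    float total = 0.0f;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tids[t], NULL);
        total += chunks[t].result;
    }
    *out = total;
    return 0;
}
```

Compile with something like `cc -O2 -pthread`. The design keeps all SIMD work thread-local: each thread writes only its own `chunk_t`, and the partial results are combined after `pthread_join`, so there is no shared state beyond the read-only input array.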

TL;DR

You do not have to worry about the overhead of saving and restoring FP registers.
