Saving the x87 / MMX / XMM / YMM registers can take quite some time and cause significant cache thrashing. Typically, saving and restoring the FP state is done lazily: on a context switch, the kernel remembers the outgoing thread as the "owner" of the FP state and sets the TS flag in CR0, which causes a trap into the kernel as soon as a thread tries to execute an FP instruction. Only at that point is the FP state of the old owner saved and the FP state of the currently running thread restored.
Now, if for long periods of time (several or many context switches) no thread other than yours uses FP instructions, the lazy policy means the FP state is not saved / restored at all, and you take no performance hit.
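To make the lazy policy concrete, here is a minimal toy model of a single core. It is only a sketch of the idea described above, not how any particular kernel implements it: the class `LazyFpuCpu`, the helper `run`, and the counters are all hypothetical names made up for illustration, and a thread's first FP use is counted as a "restore" for simplicity.

```python
class LazyFpuCpu:
    """Toy model of one core with lazy FP state switching (illustrative only)."""

    def __init__(self):
        self.current = None     # thread currently scheduled on this core
        self.fpu_owner = None   # thread whose FP state is live in the registers
        self.ts_flag = False    # stands in for the TS flag in CR0
        self.saves = 0          # number of FP state saves performed
        self.restores = 0       # number of FP state loads performed

    def context_switch(self, thread):
        # The FP registers are left untouched; only the trap is armed.
        self.current = thread
        self.ts_flag = True

    def fp_instruction(self):
        if not self.ts_flag:
            return  # no trap: the FP registers already belong to this thread
        # Trap handler (on x86 this would be the #NM exception).
        if self.fpu_owner != self.current:
            if self.fpu_owner is not None:
                self.saves += 1      # save the old owner's FP state
            self.restores += 1       # load the current thread's FP state
            self.fpu_owner = self.current
        self.ts_flag = False


def run(schedule):
    """schedule: list of (thread_name, uses_fp) pairs, one per time slice."""
    cpu = LazyFpuCpu()
    for thread, uses_fp in schedule:
        cpu.context_switch(thread)
        if uses_fp:
            cpu.fp_instruction()
    return cpu.saves, cpu.restores


if __name__ == "__main__":
    # Only thread A uses FP; B and C run in between without touching it.
    print(run([("A", True), ("B", False), ("C", False), ("A", True)]))
    # A and B both use FP, so the state has to be swapped on each trap.
    print(run([("A", True), ("B", True), ("A", True)]))
```

In the first schedule the model performs no saves at all and a single initial load, even though three context switches happen in between. In the second schedule, where two threads compete for the FP registers, every trap costs a save and a restore.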
Since we are obviously talking about a multiprocessor system, the threads that execute your algorithm in parallel will not conflict with each other, because each of them must run on its own CPU / core / HT, which has its own set of registers.
TL;DR
You do not have to worry about the overhead of saving and restoring FP registers.