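SSE and MMX registers are part of each thread's context (MMX reuses the x87 FPU registers, while SSE has its own XMM registers), and the OS saves and restores them on a context switch. So using either one from several threads is not a problem in itself.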
The more relevant question is how SSE is implemented on your target CPU. Does each core have its own SSE unit? (Probably.) If so, each thread can execute SSE instructions independently.
If the cores share a single SSE unit, the threads will compete for it, so you won't gain much by running SSE code in several threads. (I don't know whether any processor actually shares its SSE unit between cores, so treat this as a hypothetical case.)
jalf