From Agner Fog's instruction tables:
On Core2 65nm, FSQRT takes 9 to 69 clock cycles (with almost equal reciprocal throughput), depending on the value and the precision bits. For comparison, FDIV takes 9 to 38 clock cycles (with almost equal reciprocal throughput), FMUL takes 5 (reciprocal throughput = 2), and FADD takes 3 (reciprocal throughput = 1). SSE performance is about the same, but it looks faster because it cannot do 80-bit math. SSE does have a very fast approximate reciprocal and approximate reciprocal square root, though.
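To make the approximate SSE instructions concrete, here is a minimal sketch in C using the standard `<xmmintrin.h>` intrinsics `_mm_rsqrt_ss` (RSQRTSS) and `_mm_rcp_ss` (RCPSS). It assumes an x86 target with SSE enabled. The helper names (`rsqrt_approx`, `rsqrt_refined`, `rcp_approx`, `rsqrt_residual`) are made up for illustration and are not from the instruction tables. The Newton-Raphson refinement step is a common companion technique, not something the tables prescribe.

```c
#include <assert.h>
#include <xmmintrin.h>  /* SSE: _mm_set_ss, _mm_rsqrt_ss, _mm_rcp_ss, _mm_cvtss_f32 */

/* Approximate 1/sqrt(x) with RSQRTSS.
   Intel documents a maximum relative error of 1.5 * 2^-12. */
static float rsqrt_approx(float x)
{
    assert(x > 0.0f);
    return _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x)));
}

/* Refine the estimate with one Newton-Raphson step:
   y' = y * (1.5 - 0.5 * x * y * y).
   This roughly doubles the number of correct bits. */
static float rsqrt_refined(float x)
{
    float y = rsqrt_approx(x);
    return y * (1.5f - 0.5f * x * y * y);
}

/* Approximate 1/x with RCPSS (same documented error bound). */
static float rcp_approx(float x)
{
    assert(x != 0.0f);
    return _mm_cvtss_f32(_mm_rcp_ss(_mm_set_ss(x)));
}

/* Hypothetical checking helper: |r*r*x - 1|, which is zero
   when r is exactly 1/sqrt(x). Computed in double to avoid
   adding float rounding noise to the measurement. */
static double rsqrt_residual(float x, float r)
{
    double d = (double)r * (double)r * (double)x - 1.0;
    return d < 0.0 ? -d : d;
}
```

If about 12 bits are enough, `rsqrt_approx(x)` can stand in for `1.0f / sqrtf(x)`, and `x * rsqrt_approx(x)` gives an approximate square root for positive `x`. Whether the refined version still beats a real square root plus division depends on the CPU, so measure before relying on it.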
On Core2 45nm, division and square root got faster: FSQRT takes 6 to 20 clock cycles and FDIV takes 6 to 21 clock cycles. FADD and FMUL have not changed. Once again, SSE performance is about the same.
You can get the documents with this information from Agner Fog's site.
harold, Oct 11 '11 at 12:48