First of all, depending on your accuracy requirements, this can be significantly more complex than your previous questions.
Now that you have been warned: first you want to reduce the argument modulo pi/2 (or 2pi, or pi, or pi/4) to get the input into a controlled range. This is the subtle part. For a good discussion of the issues involved, download a copy of K.C. Ng's "ARGUMENT REDUCTION FOR HUGE ARGUMENTS: Good to the Last Bit" (a simple Google search on the title will turn up a PDF). It is very readable and does a great job of explaining why this is tricky.
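To make the reduction step concrete, here is a minimal Python sketch. It is deliberately the naive version: it subtracts a double-precision multiple of pi/2, so it is only reasonable for moderate arguments and loses accuracy as |x| grows, which is exactly the problem the paper addresses. The names `reduce_half_pi` and `sin_by_quadrant` are made up for illustration, and `math.sin` / `math.cos` stand in for whatever small-range kernels you end up writing.

```python
import math

HALF_PI = math.pi / 2

def reduce_half_pi(x):
    """Naive reduction: return (q, r) with x ~= k*(pi/2) + r, q = k mod 4.

    r ends up in roughly [-pi/4, pi/4]. The constant HALF_PI is only a
    double-precision approximation of pi/2, so the error in r grows with |x|.
    """
    k = round(x / HALF_PI)        # nearest multiple of pi/2
    r = x - k * HALF_PI           # remainder in the small range
    return k % 4, r               # quadrant index in 0..3

def sin_by_quadrant(x):
    """Pick the right kernel and sign from the quadrant index."""
    q, r = reduce_half_pi(x)
    if q == 0:
        return math.sin(r)
    if q == 1:
        return math.cos(r)
    if q == 2:
        return -math.sin(r)
    return -math.cos(r)
```

The quadrant index is what lets you get away with approximating sine and cosine only near zero: every other input maps onto one of the two kernels with a possible sign flip.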
After that, you only need to approximate the functions on a small range around zero, which is easily done with a polynomial approximation. A Taylor series will work, although it is inefficient. A truncated Chebyshev series is easy to compute and reasonably efficient; computing the minimax approximation is better still. This is the easy part.
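As a rough illustration of the small-range step, the sketch below evaluates a truncated Taylor series for sine with Horner's rule. It uses Taylor coefficients only because they are easy to write down; a Chebyshev or minimax fit would reach the same accuracy with fewer terms. The name `sin_poly` and the choice of seven terms (up to r^13) are assumptions for the example, not a recommendation.

```python
import math

# Coefficients of sin(r)/r as a polynomial in r*r, highest order first:
# (-1)^n / (2n+1)!  for n = 6 down to 0, i.e. terms up to r^13.
SIN_COEFFS = [(-1) ** n / math.factorial(2 * n + 1) for n in range(6, -1, -1)]

def sin_poly(r):
    """Approximate sin(r) for |r| <= pi/4 with a truncated Taylor series."""
    r2 = r * r
    acc = 0.0
    for c in SIN_COEFFS:          # Horner's rule in the variable r*r
        acc = acc * r2 + c
    return r * acc                # restore the odd power of r
```

Working in powers of r*r halves the number of multiplications, and the same trick applies to cosine with its even powers.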
I have implemented sine and cosine exactly as described, entirely in integer arithmetic, in the past (sorry, no public sources). Using hand-tuned assembly, results in the neighborhood of 100 cycles are entirely reasonable on "typical" processors. I don't know what hardware you are dealing with (performance will mostly depend on how quickly your hardware can produce the high part of an integer multiplication).
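To give a feel for what "entirely in integer arithmetic" can look like, here is a small fixed-point sketch. It is not the implementation mentioned above; the Q30 format, the degree-7 Taylor polynomial, and the names `to_fixed`, `fixed_mul` and `sin_fixed` are all assumptions chosen to keep the example short. Each `fixed_mul` is a full multiply followed by a shift, which is the software equivalent of taking the upper part of the product.

```python
import math

FRAC_BITS = 30                    # assumed Q30 format: value = integer / 2**30
ONE = 1 << FRAC_BITS

def to_fixed(v):
    """Convert a float to the assumed Q30 fixed-point representation."""
    return int(round(v * ONE))

def fixed_mul(a, b):
    """Multiply two Q30 numbers, keeping the upper part of the product."""
    return (a * b) >> FRAC_BITS

# Degree-7 Taylor coefficients of sin(r)/r in powers of r*r, highest first.
COEFFS = [to_fixed(c) for c in (-1 / 5040, 1 / 120, -1 / 6, 1.0)]

def sin_fixed(r_fixed):
    """Integer-only sin for a Q30 input with |r| <= pi/4; returns Q30."""
    r2 = fixed_mul(r_fixed, r_fixed)
    acc = 0
    for c in COEFFS:              # Horner's rule, integers only
        acc = fixed_mul(acc, r2) + c
    return fixed_mul(r_fixed, acc)

if __name__ == "__main__":
    r = 0.5
    print(sin_fixed(to_fixed(r)) / ONE, math.sin(r))
```

With only four coefficients the truncation error dominates the fixed-point rounding error on this range; a real implementation would pick the degree and the coefficients (for example by a minimax fit) to match its accuracy target.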