There are several possible explanations, and which one applies in your case depends on the platform you use, or more precisely on which math library you use. One possible explanation:
First of all, the computation of tanh does not rely on the standard definition of tanh, but instead expresses it in terms of exp(-2*x) or expm1(2*x). This means that only one exponential has to be calculated, which is probably the expensive operation (in addition to that, there is a division and a few additions).
Secondly, what may be the trick is that for large values of x this reduces to (exp(2*x)-1)/(exp(2*x)+1) = 1 - 2/(expm1(2*x)+2). The advantage here is that since the second term is small, it does not have to be calculated to the same relative accuracy in order to obtain the same final accuracy. This means that expm1 does not need to be as accurate as it normally would.
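To make the accuracy argument concrete, here is a minimal Python sketch. The function name tanh_large and the rel_err parameter are made up for illustration; rel_err just simulates a sloppy expm1 and is not something a real math library exposes.

```python
import math

def tanh_large(x: float, rel_err: float = 0.0) -> float:
    """tanh(x) as 1 - 2/(expm1(2x) + 2).

    rel_err is a hypothetical knob that injects a relative error into
    the expm1 result, to imitate a less accurate implementation.
    """
    t = math.expm1(2.0 * x) * (1.0 + rel_err)
    return 1.0 - 2.0 / (t + 2.0)

x = 5.0
exact = math.tanh(x)
sloppy = tanh_large(x, rel_err=1e-6)

# The error injected into expm1 is 1e-6, but the error that reaches
# the final result is several orders of magnitude smaller.
print("relative error in tanh:", abs(sloppy - exact) / exact)
```

The injected error only affects the small term 2/(t + 2), so it is scaled down by roughly that term's size before it reaches the result.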
In addition, for smallish values of x there is a similar trick: rewrite it as (1-exp(-2*x))/(1+exp(-2*x)) = -1/(1 + 2/expm1(-2*x)), which again means we can exploit the fact that the term 2/expm1(-2*x) is large in magnitude and does not need to be calculated to the same accuracy. You do not have to calculate it this way, however; instead you can use the expression -expm1(-2*x)/(2+expm1(-2*x)), with the same accuracy requirement on expm1.
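A small sketch of the small-x form, in Python. Note the leading minus sign, since expm1(-2*x) is negative for positive x. The function names are hypothetical, and the comparison against the plain exp-based quotient is my own illustration rather than anything taken from a particular library.

```python
import math

def tanh_small_x(x: float) -> float:
    """tanh(x) as -expm1(-2x) / (2 + expm1(-2x))."""
    t = math.expm1(-2.0 * x)
    return -t / (2.0 + t)

def tanh_naive(x: float) -> float:
    """Same quotient written with exp; 1 - exp(-2x) cancels badly near 0."""
    e = math.exp(-2.0 * x)
    return (1.0 - e) / (1.0 + e)

x = 1e-10
print("math.tanh   :", math.tanh(x))
print("expm1 form  :", tanh_small_x(x))
print("exp quotient:", tanh_naive(x))
```

For tiny x the exp-based version subtracts two nearly equal numbers, while the expm1-based version keeps the small quantity intact.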
In addition, there are other optimizations available for large values of x that are not possible for exp, for basically the same reason. For large x the term expm1(2*x) becomes so large that 2/(expm1(2*x)+2) can be discarded entirely, whereas for exp we still have to calculate the result (this holds even for large negative x). For these values tanh is immediately decided to be 1, while exp would still have to be calculated.
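The early exit can be sketched like this in Python. The cutoff of 20 is my assumption, chosen because 2*exp(-40) is already below half the spacing of doubles just under 1.0; actual libraries choose their own thresholds, and tanh_with_cutoff is a made-up name.

```python
import math

# Assumed threshold: for |x| > 20, 1 - tanh(|x|) is about 2*exp(-40),
# too small to change a double near 1.0.
CUTOFF = 20.0

def tanh_with_cutoff(x: float) -> float:
    if abs(x) > CUTOFF:
        # Saturated: no exponential is evaluated at all.
        return math.copysign(1.0, x)
    # Otherwise use the expm1-based quotient on |x| and restore the sign.
    t = math.expm1(-2.0 * abs(x))
    return math.copysign(-t / (2.0 + t), x)

for x in (-1000.0, -0.5, 0.5, 25.0, 1000.0):
    print(x, tanh_with_cutoff(x), math.tanh(x))
```

An exp implementation has no comparable shortcut over this range, because its result keeps changing with x until it actually overflows or underflows.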