Why is tanh faster than exp on my machine?

Question


This question arose from a separate issue which, as it turned out, has some quirks that are clearly machine-specific. When I run the C++ code below to time tanh and exp, I get the following result:

    tanh: 5.22203
    exp: 14.9393

tanh runs about 3 times faster than exp. This is somewhat surprising given the mathematical definition of tanh (though I don't know how it is actually implemented).

Also, this happens on my laptop (Ubuntu 16.04, Intel Core i7-3517U CPU @ 1.90 GHz × 4), but not on my desktop (same OS, not sure about the processor specs right now).

I compiled the code below with g++. The times above are without compiler optimization, although the trend persists if I use -On for every n. I also varied a and b to see whether the range of evaluated values has an effect; it does not seem to matter.

What would make tanh faster than exp on some machines but not on others?

    #include <iostream>
    #include <cmath>
    #include <ctime>
    using namespace std;

    int main() {
        double a = -5;
        double b = 5;
        int N = 10001;
        double x[10001];
        double y[10001];
        double h = (b - a) / (N - 1);  // grid spacing on [a, b]
        clock_t begin, end;

        for (int i = 0; i < N; i++)
            x[i] = a + i * h;

        // N*N calls to tanh; the j loop only repeats the same work
        begin = clock();
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                y[i] = tanh(x[i]);
        end = clock();
        cout << "tanh: " << double(end - begin) / CLOCKS_PER_SEC << "\n";

        // N*N calls to exp, timed the same way
        begin = clock();
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                y[i] = exp(x[i]);
        end = clock();
        cout << "exp: " << double(end - begin) / CLOCKS_PER_SEC << "\n";

        return 0;
    }

edit: some assembly

This is the output when I compile the simplified code below with g++ -g -O -Wa,-aslh nothing2.cpp > stuff.txt.

    #include <cmath>

    int main() {
        double x = 0.0;
        double y, z;
        y = tanh(x);
        z = exp(x);
        return 0;
    }

edit: another update

With nothing2.cpp containing the simplified code from the previous edit, I ran:

    g++ -o nothing2.so -shared -fPIC nothing2.cpp
    objdump -d nothing2.so > stuff.txt

Here are the contents of stuff.txt:


c++

Matt hancock Mar 27 '17 at 11:26


1 answer

skyking · Answer 1 · 2017-03-27T12:43:05+0000

There are several possible explanations, and which one applies in your case depends on the platform, or more precisely on which math library you use. Here is one possible explanation.

First, tanh is not computed from its textbook definition, (exp(x)-exp(-x))/(exp(x)+exp(-x)); instead it is expressed in terms of exp(-2*x) or expm1(2*x), so only one exponential has to be computed. That exponential is probably the expensive operation; beyond it there is only a division and a few additions.

Second, and this may be the trick, for larger values of x this reduces to (exp(2*x)-1)/(exp(2*x)+1) = 1 - 2/(expm1(2*x)+2). The advantage here is that, since the second term is small, it does not have to be computed to the same relative accuracy to obtain the same final accuracy. This means expm1 does not need to be as accurate as in the general case.

Similarly, for smallish x there is a related trick: rewrite it as (1-exp(-2*x))/(1+exp(-2*x)) = -1/(1 + 2/expm1(-2*x)), which again lets us exploit the fact that the term 2/expm1(-2*x) is large and need not be computed with the same accuracy. You would not actually evaluate it in that form, though; instead you use the equivalent expression -expm1(-2*x)/(2+expm1(-2*x)), which places the same accuracy requirement on expm1.

There are also further optimizations for large x that are not possible for exp, for essentially the same reason. For large x, expm1(2*x) becomes so large that the term 2/(expm1(2*x)+2) can be discarded completely, whereas exp still has to be computed (even for large negative x). For such values tanh can return 1 immediately (or -1 for large negative x), while exp must still be evaluated.
