I thought f_dot would be slower, since it has to create a temporary array for the denominator, and I assumed f_no_dot skipped that step.
For what it's worth, no temporary array is created there, which is why f_no_dot
is slower (but uses less memory).
Elementwise operations on arrays of the same size are faster, because numpy doesn't need to worry about the strides (shapes, sizes, etc.) of the arrays.
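To make the stride bookkeeping concrete, here is a small illustrative sketch (the shapes are hypothetical, chosen just for the demo): broadcasting a (4, 1) array against a (4, 2) one produces a view whose second-axis stride is 0, so numpy must iterate with non-trivial strides rather than walk two identically laid-out buffers.

```python
import numpy as np

x = np.ones((4, 2))
y = np.ones((4, 1))

# broadcast_to makes a zero-copy view; only the strides change
yb = np.broadcast_to(y, x.shape)

print(yb.strides)  # (8, 0) -- the broadcast axis has stride 0
print(x.strides)   # (16, 8) -- an ordinary contiguous float64 array
```

The stride-0 axis means the same memory is revisited for each column, which is the bookkeeping same-size operations avoid.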
Operations that use broadcasting will generally be slightly slower than operations that don't require it.
If you have memory to spare, creating a temporary copy can buy you speed, at the cost of extra memory.
For example, comparing these three functions:
```python
import numpy as np
import timeit

def f_no_dot(x, y):
    return x / y

def f_dot(x, y):
    denom = np.dot(y, np.ones((1, 2)))
    return x / denom

def f_in_place(x, y):
    x /= y
    return x

num = 3600000
x = np.ones((num, 2))
y = np.ones((num, 1))

for func in ['f_dot', 'f_no_dot', 'f_in_place']:
    t = timeit.timeit('%s(x, y)' % func, number=100,
                      setup='from __main__ import x, y, f_dot, f_no_dot, f_in_place')
    print(func, 'time...')
    print(t / 100.0)
```
This gives timings similar to your results:
```
f_dot time...
0.184361531734
f_no_dot time...
0.619203259945
f_in_place time...
0.585789341927
```
However, if we compare memory usage, things become a bit clearer...
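If you want to check peak memory yourself, here is one rough sketch using the standard-library resource module (Unix-only; note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS, so the unit handling below is an assumption for Linux):

```python
import resource
import numpy as np

def peak_rss_mb():
    # Peak resident set size so far; kilobytes on Linux (a platform assumption)
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

before = peak_rss_mb()
x = np.ones((3600000, 2))
y = np.ones((3600000, 1))
result = x / y  # allocate the output array
after = peak_rss_mb()

print('peak RSS grew by roughly %.0f MB' % (after - before))
```

Dedicated tools such as memory_profiler give per-line numbers, but this is enough to see the peaks discussed below.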
The combined size of the x
and y
arrays is about 27.5 + 55 MB, or ~82 MB (for 64-bit floats). There's an additional ~11 MB of overhead from importing numpy, etc.
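These figures are easy to verify with ndarray.nbytes (the sizes work out in MiB, which matches the 27.5/55 numbers quoted):

```python
import numpy as np

num = 3600000
x = np.ones((num, 2))  # float64 by default: num * 2 * 8 bytes
y = np.ones((num, 1))  # num * 1 * 8 bytes

print(x.nbytes / 2.0**20)  # ~54.9 MiB, the "55 MB" figure
print(y.nbytes / 2.0**20)  # ~27.5 MiB, the "27.5 MB" figure
```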
Returning x / y
as a new array (i.e. not doing x /= y
) requires another 55 MB array.
100 runs of f_dot
: Since we create a temporary array here, we'd expect to see 11 + 82 + 55 + 55 MB, or ~203 MB of memory usage. And that's what we see...
100 runs of f_no_dot
: If no temporary array is created, we'd expect a peak memory usage of 11 + 82 + 55 MB, or 148 MB...
...and that's exactly what we see.
So x / y
does not create an extra num x 2
temporary array for the division.
Thus, the division takes a bit longer than it would if it operated on two arrays of the same size.
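The two paths compute exactly the same values, of course; only the memory/speed trade-off differs. A quick sanity check (with small, arbitrary shapes for the demo):

```python
import numpy as np

x = np.arange(12, dtype=float).reshape(6, 2) + 1.0
y = np.arange(6, dtype=float).reshape(6, 1) + 1.0

broadcast_result = x / y              # broadcasting: no (6, 2) temporary
temp = np.dot(y, np.ones((1, 2)))     # explicit (6, 2) temporary denominator
temp_result = x / temp                # same-shape division

print(np.allclose(broadcast_result, temp_result))  # True
```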
100 runs of f_in_place
: If we can modify x
in place, we can save even more memory, if that's the main concern.
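An equivalent way to write the in-place version (assuming you're free to overwrite x) is numpy's out= argument, which x /= y uses under the hood:

```python
import numpy as np

x = np.full((5, 2), 6.0)
y = np.full((5, 1), 2.0)

# Write the quotient straight back into x's buffer; no output allocation
result = np.divide(x, y, out=x)

print(result is x)  # True: the same array object is returned
print(x[0, 0])      # 3.0
```

The out= form generalizes to writing into any preallocated array of the right shape, not just one of the operands.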
Basically, numpy tries to save memory at the expense of speed in some cases.