The fastest way to find non-finite values

This is inspired by: python: numpy combo masking.

The challenge is to create a boolean array marking all values that are not finite. For instance:

    >>> arr = np.array([0, 2, np.inf, -np.inf, np.nan])
    >>> ~np.isfinite(arr)
    array([False, False,  True,  True,  True], dtype=bool)

I would have expected this to be the fastest way to find non-finite values, but there seems to be a faster one. In particular, np.isnan(arr - arr) should do the same:

    >>> np.isnan(arr - arr)
    array([False, False,  True,  True,  True], dtype=bool)

Timing both shows that it is about twice as fast!

    arr = np.random.rand(100000)

    %timeit ~np.isfinite(arr)
    10000 loops, best of 3: 198 µs per loop

    %timeit np.isnan(arr - arr)
    10000 loops, best of 3: 85.8 µs per loop

So my question is twofold:

  • Why is the np.isnan(arr - arr) trick faster than the "obvious" ~np.isfinite(arr) version? Is there an input for which it does not work?

  • Is there an even faster way to find all non-finite values?

1 answer

This is hard to answer because np.isnan and np.isfinite can use different C functions depending on the build. The timings will differ depending on the performance of these C functions, which may well depend on the compiler, the system, and how NumPy itself was built.


The ufuncs for both refer to a built-in npy_ function (source (1.11.3)):

    /**begin repeat1
     * #kind = isnan, isinf, isfinite, signbit, copysign, nextafter, spacing#
     * #func = npy_isnan, npy_isinf, npy_isfinite, npy_signbit, npy_copysign, nextafter, spacing#
     **/

And these functions are defined based on the presence of compile-time constants (source (1.11.3)):

    /* use builtins to avoid function calls in tight loops
     * only available if npy_config.h is available (= numpys own build)
     */
    #if HAVE___BUILTIN_ISNAN
        #define npy_isnan(x) __builtin_isnan(x)
    #else
        #ifndef NPY_HAVE_DECL_ISNAN
            #define npy_isnan(x) ((x) != (x))
        #else
            #if defined(_MSC_VER) && (_MSC_VER < 1900)
                #define npy_isnan(x) _isnan((x))
            #else
                #define npy_isnan(x) isnan(x)
            #endif
        #endif
    #endif

    /* only available if npy_config.h is available (= numpys own build) */
    #if HAVE___BUILTIN_ISFINITE
        #define npy_isfinite(x) __builtin_isfinite(x)
    #else
        #ifndef NPY_HAVE_DECL_ISFINITE
            #ifdef _MSC_VER
                #define npy_isfinite(x) _finite((x))
            #else
                #define npy_isfinite(x) !npy_isnan((x) + (-x))
            #endif
        #else
            #define npy_isfinite(x) isfinite((x))
        #endif
    #endif
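To make the fallback branches of these macros concrete, here is a minimal Python sketch of the same two tests. The helper names isnan_fallback and isfinite_fallback are hypothetical and exist only for illustration; they are not part of NumPy. The sketch suggests that, for float input, the arr - arr trick is the same test as the last-resort npy_isfinite definition.

```python
import numpy as np

def isnan_fallback(x):
    # Mirrors the macro `(x) != (x)`: NaN is the only value unequal to itself.
    return x != x

def isfinite_fallback(x):
    # Mirrors `!npy_isnan((x) + (-x))`: inf + (-inf) and nan + (-nan) give NaN,
    # while any finite float x gives exactly 0.
    with np.errstate(invalid="ignore"):  # silence the warning raised by inf + (-inf)
        return ~isnan_fallback(x + (-x))

arr = np.array([0.0, 2.0, np.inf, -np.inf, np.nan, np.finfo(float).max])

# Both fallbacks agree with the real ufuncs on this sample.
assert np.array_equal(isnan_fallback(arr), np.isnan(arr))
assert np.array_equal(isfinite_fallback(arr), np.isfinite(arr))
```

Which branch a given NumPy build actually compiles is not visible from Python, so this only illustrates the arithmetic, not what runs on any particular machine.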

So it may simply be that in your case np.isfinite has to do (a lot) more work than np.isnan. But it is equally likely that on another computer or with a different build, np.isfinite will be faster or both will be equally fast.

So there is probably no hard and fast rule for what the "fastest way" is. It depends on too many factors. Personally, I would just go with np.isfinite, because it can be faster (and is not too slow even in your case), and it makes the intention much clearer.
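Since the result depends on the build, the safest approach is probably to measure on the machine in question. The following is a minimal sketch using the standard timeit module instead of the IPython %timeit magic; time_candidates is a hypothetical helper name, and the number and repeat values are arbitrary assumptions.

```python
import timeit
import numpy as np

def time_candidates(arr, number=20, repeat=3):
    """Return the best per-call time in seconds for each candidate expression."""
    candidates = {
        "~np.isfinite(arr)": lambda: ~np.isfinite(arr),
        "np.isnan(arr - arr)": lambda: np.isnan(arr - arr),
    }
    return {
        # timeit.repeat returns one total time per repetition; take the best one.
        name: min(timeit.repeat(func, number=number, repeat=repeat)) / number
        for name, func in candidates.items()
    }

timings = time_candidates(np.random.rand(100000))
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e6:.1f} µs per call")
```

The numbers it prints are only meaningful for the NumPy build and hardware they were measured on.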


In case you really need to optimize performance, you can always do the negation in place. This can reduce time and memory use by avoiding one temporary array:

    import numpy as np

    arr = np.random.rand(1000000)

    def isnotfinite(arr):
        res = np.isfinite(arr)
        np.bitwise_not(res, out=res)  # in-place
        return res

    np.testing.assert_array_equal(~np.isfinite(arr), isnotfinite(arr))
    np.testing.assert_array_equal(~np.isfinite(arr), np.isnan(arr - arr))

    %timeit ~np.isfinite(arr)
    # 3.73 ms ± 4.16 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit isnotfinite(arr)
    # 2.41 ms ± 29.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
    %timeit np.isnan(arr - arr)
    # 12.5 ms ± 772 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Note also that the np.isnan solution is much slower on my computer (Windows 10 64-bit, Python 3.5, NumPy 1.13.1, Anaconda build).
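If the mask is computed repeatedly for arrays of the same shape, the same idea can be taken one step further by also reusing the output buffer. This is only a sketch under that assumption: isnotfinite_into is a hypothetical helper name, and whether it helps in practice would need to be timed.

```python
import numpy as np

def isnotfinite_into(arr, out):
    """Write the non-finite mask of `arr` into the preallocated boolean array `out`."""
    np.isfinite(arr, out=out)      # no new result array is allocated
    np.logical_not(out, out=out)   # negate in place
    return out

arr = np.array([0.0, 2.0, np.inf, -np.inf, np.nan])
buf = np.empty(arr.shape, dtype=bool)  # allocated once, reusable across calls

mask = isnotfinite_into(arr, buf)
np.testing.assert_array_equal(mask, ~np.isfinite(arr))
```

The returned mask is the buffer itself, so a later call with the same buffer overwrites it; copy the result if it has to survive the next call.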


Source: https://habr.com/ru/post/1270995/

