Detecting if a NumPy array contains at least one non-numeric value?

I need to write a function that detects whether the input contains at least one value that is non-numeric. If a non-numeric value is found, I will raise an error (because the calculation should only return a numeric value). The number of dimensions of the input array is not known in advance - the function should give the correct value regardless of ndim. As an extra complication, the input could be a single float or numpy.float64 or even something oddball like a zero-dimensional array.

The obvious way to solve this is to write a recursive function that iterates over every iterable object in the array until it finds a non-iterable. It will apply the numpy.isnan() function to every non-iterable object. If at least one non-numeric value is found, the function will immediately return False. Otherwise, if all the values in the iterable are numeric, it will eventually return True.
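For reference, the recursive approach described above might look roughly like the sketch below. The name `contains_nan_recursive` is hypothetical, and unlike the mockup further down, this version returns True when a NaN is found. It assumes non-numeric leaves such as None or strings count as "not NaN".

```python
from collections.abc import Iterable

import numpy as np


def contains_nan_recursive(obj):
    """Return True if obj, or anything nested inside it, is NaN."""
    # Strings are iterable but never NaN; skipping them avoids endless recursion.
    if isinstance(obj, (str, bytes)):
        return False
    # A zero-dimensional array cannot be iterated, so unwrap it to a scalar.
    if isinstance(obj, np.ndarray) and obj.ndim == 0:
        obj = obj.item()
    # Recurse into lists, tuples, and arrays with at least one dimension.
    if isinstance(obj, Iterable):
        return any(contains_nan_recursive(item) for item in obj)
    # Leaf value: numpy.isnan raises TypeError for non-numeric objects.
    try:
        return bool(np.isnan(obj))
    except TypeError:
        return False
```

Every leaf costs a Python-level function call, which is why this gets slow on large arrays.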

This works fine, but it's pretty slow, and I expect NumPy to have a much better way to do this. What is an alternative that is faster and more numpyish?

Here is a mockup of my function:

    def contains_nan( myarray ):
        """
        @param myarray : An n-dimensional array or a single float
        @type myarray : numpy.ndarray, numpy.array, float
        @returns: bool
        Returns true if myarray is numeric or only contains numeric values.
        Returns false if at least one non-numeric value exists
        Not-A-Number is given by the numpy.isnan() function.
        """
        return True
+79
python numpy
May 26 '09 at 17:43
5 answers

This should be faster than iterating and will work regardless of shape.

 numpy.isnan(myarray).any() 

Edit: 30 times faster:

    import timeit

    s = 'import numpy;a = numpy.arange(10000.).reshape((100,100));a[10,10]=numpy.nan'
    ms = [
        'numpy.isnan(a).any()',
        'any(numpy.isnan(x) for x in a.flatten())']
    for m in ms:
        # Time 1000 runs of each statement and print the total in seconds
        print(" %.2f s" % timeit.Timer(m, s).timeit(1000), m)

Results:

    0.11 s numpy.isnan(a).any()
    3.75 s any(numpy.isnan(x) for x in a.flatten())

Bonus: it works fine for non-array NumPy types:

    >>> a = numpy.float64(42.)
    >>> numpy.isnan(a).any()
    False
    >>> a = numpy.float64(numpy.nan)
    >>> numpy.isnan(a).any()
    True
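To cover plain Python floats, lists, and zero-dimensional arrays with the same one-liner, one option is to pass the input through `numpy.asarray` first. This is a minimal sketch, not part of the original answer. It assumes only float and complex dtypes need checking, so integer, boolean, string, and object arrays are reported as NaN-free.

```python
import numpy as np


def contains_nan(myarray):
    """Return True if myarray holds at least one NaN, for any ndim."""
    # asarray turns scalars, lists, and 0-d arrays into an ndarray
    # without copying data that is already an ndarray.
    arr = np.asarray(myarray)
    # Assumption: only float ('f') and complex ('c') dtypes are inspected.
    if arr.dtype.kind not in "fc":
        return False
    return bool(np.isnan(arr).any())
```

Note that this returns True when a NaN is present, which is the opposite of the return convention in the question's docstring.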
+142
May 27 '09 at a.m.

If infinity is a possible value, I would use numpy.isfinite

 numpy.isfinite(myarray).all() 

If the above evaluates to True, then myarray contains no numpy.nan, numpy.inf or -numpy.inf values.

numpy.isnan does not flag numpy.inf values, for example:

    In [11]: import numpy as np

    In [12]: b = np.array([[4, np.inf],[np.nan, -np.inf]])

    In [13]: np.isnan(b)
    Out[13]:
    array([[False, False],
           [ True, False]], dtype=bool)

    In [14]: np.isfinite(b)
    Out[14]:
    array([[ True, False],
           [False, False]], dtype=bool)
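Since the question wants an error raised on bad input, the `isfinite` check could be wrapped in a small validator. This is a sketch only: the function name `require_finite` and its `name` parameter are made up for illustration, and it assumes the input can be converted to a float array.

```python
import numpy as np


def require_finite(values, name="input"):
    """Return values as a float array, or raise if any entry is NaN or +/-inf."""
    arr = np.asarray(values, dtype=float)
    finite = np.isfinite(arr)
    if not finite.all():
        # Count the offending entries so the error message is actionable.
        bad = int(np.count_nonzero(~finite))
        raise ValueError(f"{name} contains {bad} non-finite value(s)")
    return arr
```

A caller would then write something like `result = require_finite(compute(), name="result")` and let the ValueError propagate.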
+13
Oct 09 '15 at 17:13

With numpy 1.3 or svn you can do this:

    In [1]: a = arange(10000.).reshape(100,100)

    In [3]: isnan(a.max())
    Out[3]: False

    In [4]: a[50,50] = nan

    In [5]: isnan(a.max())
    Out[5]: True

    In [6]: timeit isnan(a.max())
    10000 loops, best of 3: 66.3 µs per loop

The treatment of nans in comparisons was not consistent in earlier versions.
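The trick relies on `max()` propagating NaN, so a single scalar check stands in for a full boolean mask. A hedged sketch follows; the name `has_nan_via_max` is hypothetical, and it assumes a real-valued input and a NumPy version where `max()` propagates NaN. The empty-array guard is needed because `max()` raises ValueError on empty input.

```python
import numpy as np


def has_nan_via_max(a):
    """Return True if a real-valued array contains NaN, using max() propagation."""
    arr = np.asarray(a)
    # An empty array has no NaN, and max() would raise ValueError on it.
    if arr.size == 0:
        return False
    # max() returns NaN if any element is NaN, so one scalar test suffices.
    return bool(np.isnan(arr.max()))
```

This avoids allocating the temporary boolean array that `numpy.isnan(a)` creates, but it still scans every element.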

+3
Aug 25 '09 at 0:04

(np.where(np.isnan(A)))[0].shape[0] will be greater than 0 if A contains at least one nan element. A can be an n x m matrix.

Example:

    import numpy as np

    A = np.array([1,2,4,np.nan])

    # np.where returns the indices of the nan entries; a nonzero count means nan is present
    if (np.where(np.isnan(A)))[0].shape[0]:
        print("A contains nan")
    else:
        print("A does not contain nan")
+2
May 11 '17 at 20:52

Pff! Microseconds! Never solve a problem in microseconds that can be solved in nanoseconds.

Note that the accepted answer:

  • iterates over all data, regardless of whether nan is found
  • creates a temporary array of size N, which is redundant.

A better solution is to return True immediately when a NaN is found:

    import numba
    import numpy as np

    NAN = float("nan")

    @numba.njit(nogil=True)
    def _any_nans(a):
        for x in a:
            if np.isnan(x): return True
        return False

    @numba.jit
    def any_nans(a):
        if not a.dtype.kind=='f': return False
        return _any_nans(a.flat)

    array1M = np.random.rand(1000000)
    assert any_nans(array1M)==False
    %timeit any_nans(array1M)  # 573us

    array1M[0] = NAN
    assert any_nans(array1M)==True
    %timeit any_nans(array1M)  # 774ns  (!nanoseconds)

and it works for n dimensions:

    # Integer division keeps the shape valid on Python 3
    array1M_nd = array1M.reshape((len(array1M)//2, 2))
    assert any_nans(array1M_nd)==True
    %timeit any_nans(array1M_nd)  # 774ns

Compare this with a simple solution:

    def any_nans(a):
        if not a.dtype.kind=='f': return False
        return np.isnan(a).any()

    array1M = np.random.rand(1000000)
    assert any_nans(array1M)==False
    %timeit any_nans(array1M)  # 456us

    array1M[0] = NAN
    assert any_nans(array1M)==True
    %timeit any_nans(array1M)  # 470us

    %timeit np.isnan(array1M).any()  # 532us

The early-exit method is a speedup of 3 orders of magnitude (in some cases). Not too shabby for a simple annotation.
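If Numba is not available, part of the early-exit benefit can be had in plain NumPy by scanning the array block by block. This is a different technique from the Numba version above, sketched under assumptions: the name `any_nans_chunked` and the `chunk_size` default are illustrative and untuned, and no timings are claimed for it.

```python
import numpy as np


def any_nans_chunked(a, chunk_size=65536):
    """Return True at the first block containing NaN, without scanning the rest."""
    arr = np.asarray(a)
    # Only float and complex dtypes can hold NaN.
    if arr.dtype.kind not in "fc":
        return False
    # reshape(-1) is a view for contiguous input; other layouts are copied.
    flat = arr.reshape(-1)
    for start in range(0, flat.size, chunk_size):
        # The temporary boolean mask is at most chunk_size elements long.
        if np.isnan(flat[start:start + chunk_size]).any():
            return True
    return False
```

The temporary mask is bounded by `chunk_size` instead of N, and the loop stops at the first block with a NaN. In the worst case (no NaN at all) it still visits every element.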

0
Jul 17 '19 at 20:36


