Overflow warnings when multiplying numpy masked arrays

I have an application that reads 32-bit floating-point data from a netCDF file, which uses the default netCDF fill value of 9.96920996839e+36. At a certain point, the application performs a basic scaling (multiplication) operation on a float32 masked array created from that input, like so:

 x = marr * scale   # or, equivalently, x = ma.multiply(marr, scale)

This operation generates repeated overflow warnings, apparently because the product of the fill value and the scale factor exceeds the maximum value representable in a 32-bit float. All the other values in the masked array are known to be small. The question, then, is: why does numpy even compute the product for the masked elements of the input array? Surely they should simply be ignored?
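The arithmetic bears this out. A quick sketch using np.finfo to query the float limits shows that the fill value times the scale factor overflows float32 but is nowhere near the float64 limit:

```python
import numpy as np

fill = 9.96920996839e+36   # default netCDF fill value
scale = 128.0

product = fill * scale     # computed here in float64: ~1.28e+39

# float32 max is ~3.40e+38, so the product overflows float32;
# float64 max is ~1.80e+308, so it fits there comfortably.
print(product > np.finfo(np.float32).max)   # True
print(product > np.finfo(np.float64).max)   # False
```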

As it happens, the warning can safely be ignored, since the corresponding values in the output array are still flagged as masked. But it would be good to know whether this is a bug in numpy or "works as designed".

Below is a snippet of code.

 import numpy as np
 import numpy.ma as ma

 arr = [9.96920996839e+36, 1.123, 2.345, 9.96920996839e+36]
 marr = ma.masked_values(np.array(arr, dtype='float32'), 9.96920996839e+36)
 x = marr * 128.0

As expected, the overflow warning does not appear if the masked array is of type float64 (although presumably it would if the scale factor were sufficiently large). Similarly, the warning disappears if a smaller fill value, e.g. -1.0e20, is used in the float32 case.
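To illustrate the smaller-fill-value case, here is a variant of the snippet above that promotes warnings to errors to verify that none is raised (the array contents are just the example values from the question with the fill swapped):

```python
import warnings

import numpy as np
import numpy.ma as ma

arr = np.array([-1.0e20, 1.123, 2.345, -1.0e20], dtype='float32')
marr = ma.masked_values(arr, -1.0e20)

with warnings.catch_warnings():
    warnings.simplefilter('error')   # promote any warning to an exception
    x = marr * 128.0                 # -1.0e20 * 128 still fits in float32: no overflow
```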

At first glance, it looks as if numpy is failing to identify the masked values when a large fill value is used (one very close to the maximum representable 32-bit float).

TIA
Phil

2 answers

The question, then, is: why does numpy even compute the product for the masked elements of the input array? Surely they should simply be ignored?

Unfortunately, no. In the current implementation, the operation is applied to the whole array, and the mask is then reapplied to the result.

I know this seems counterproductive, but it proved more robust and less inefficient than the alternatives. Ideally the operation would be applied only on the valid domain, but computing that domain can get quite tricky (there were problems with pow, for instance). Moreover, the extra tests would wreck performance.

A new mechanism was recently introduced whereby numpy functions accept an optional where argument, which may help here... But there is also talk of supporting missing/ignored values directly at the C level, which is probably the way to go.
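As a sketch of that where-based idea (this is not how numpy.ma itself currently works): the ufunc where= argument restricts the computation to the unmasked slots, so the fill values are never multiplied at all and no overflow occurs:

```python
import numpy as np
import numpy.ma as ma

arr = np.array([9.96920996839e+36, 1.123, 2.345, 9.96920996839e+36],
               dtype='float32')
marr = ma.masked_values(arr, 9.96920996839e+36)

# Multiply only where the mask is False; masked slots are left untouched
# (pre-filled with zeros here) and simply re-masked afterwards.
out = np.zeros_like(marr.data)
np.multiply(marr.data, np.float32(128.0), out=out, where=~marr.mask)
x = ma.masked_array(out, mask=marr.mask)
```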


Probably a bug. In the preceding line of the numpy.ma source there is:

  np.seterr(divide='ignore', invalid='ignore') 

which indicates that it was designed to handle fill values of 0 or NaN, but not very large ones. It should be

  np.seterr(divide='ignore', invalid='ignore', over='ignore') 

to handle large fill values as well.

Note that numpy.ma operations generally operate on all values in an array, including the masked ones; this is apparently due to performance and broadcasting considerations.
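Given that, a practical workaround is to silence the overflow warning locally with np.errstate, rather than changing the global error state (the data below just reproduces the original example):

```python
import numpy as np
import numpy.ma as ma

arr = np.array([9.96920996839e+36, 1.123, 2.345, 9.96920996839e+36],
               dtype='float32')
marr = ma.masked_values(arr, 9.96920996839e+36)

# Suppress only the overflow warning, and only inside this block;
# the offending elements are still masked in the result.
with np.errstate(over='ignore'):
    x = marr * 128.0
```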


Source: https://habr.com/ru/post/1435248/

