NumPy ndarray subclass not working properly

Hello everyone.

I found that there is strange behavior when subclassing ndarray.

    import numpy as np


    class fooarray(np.ndarray):
        def __new__(cls, input_array, *args, **kwargs):
            # View the input data as an instance of the subclass.
            obj = np.asarray(input_array).view(cls)
            return obj

        def __init__(self, *args, **kwargs):
            return

        def __array_finalize__(self, obj):
            return


    a = fooarray(np.random.randn(3, 5))
    b = np.random.randn(3, 5)
    a_sum = np.sum(a, axis=0, keepdims=True)
    b_sum = np.sum(b, axis=0, keepdims=True)
    print(a_sum.ndim)  # 1
    print(b_sum.ndim)  # 2

As you can see, the keepdims argument does not work for my fooarray subclass: the result lost one of its axes. How can I avoid this problem? Or, more generally, how can I properly subclass numpy ndarray?

2 answers

np.sum accepts various objects as input: not only ndarrays, but also lists, generators, and np.matrix instances, for example. The keepdims parameter obviously makes no sense for lists or generators. It is also not appropriate for np.matrix instances, since an np.matrix always has 2 dimensions. If you look at the call signature of np.matrix.sum, you will see that its sum method has no keepdims parameter:

 Definition: np.matrix.sum(self, axis=None, dtype=None, out=None) 

Thus, some ndarray subclasses may have sum methods that lack a keepdims parameter. This is an unfortunate violation of the Liskov substitution principle and the origin of the pitfall you encountered.
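You can check the signature yourself with the standard inspect module. This is only a minimal sketch: it assumes a NumPy version in which np.matrix.sum is still a plain Python method with the signature quoted above, and the variable name matrix_sum_params is made up for illustration.

```python
import inspect

import numpy as np

# Collect the parameter names that np.matrix.sum declares.
# (matrix_sum_params is an illustrative name, not part of NumPy.)
matrix_sum_params = list(inspect.signature(np.matrix.sum).parameters)

print(matrix_sum_params)
# Whether keepdims is among the declared parameters.
print("keepdims" in matrix_sum_params)
```

If the signature matches the definition above, the second print reports that keepdims is not among the parameters.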

Now, if you look at the source code of np.sum, you will see that it is a delegating function that tries to determine what to do based on the type of the first argument.

If the type of the first argument is not ndarray, it drops the keepdims parameter. It does this because passing keepdims to np.matrix.sum would raise an exception.

Since np.sum tries to delegate in the most general way, making no assumptions about which arguments an ndarray subclass accepts, it also drops the keepdims parameter when passed a fooarray.
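Note that "the type is not ndarray" refers to an exact type comparison, not an isinstance test, which is why a subclass instance is treated like any other foreign object. A small sketch of the difference, using a hypothetical FooArray class that mirrors the fooarray from the question:

```python
import numpy as np


class FooArray(np.ndarray):
    """Hypothetical minimal subclass, mirroring fooarray from the question."""

    def __new__(cls, input_array):
        return np.asarray(input_array).view(cls)


a = FooArray(np.zeros((3, 5)))

# Exact type comparison: a subclass instance does not pass.
is_exact_ndarray = type(a) is np.ndarray
# Subclass-aware check: a subclass instance does pass.
is_ndarray_instance = isinstance(a, np.ndarray)

print(is_exact_ndarray)     # False
print(is_ndarray_instance)  # True
```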

The workaround is not to use np.sum, but to call a.sum instead. This is more direct anyway, since np.sum is just a delegating function.

    import numpy as np


    class fooarray(np.ndarray):
        def __new__(cls, input_array, *args, **kwargs):
            obj = np.asarray(input_array, *args, **kwargs).view(cls)
            return obj


    a = fooarray(np.random.randn(3, 5))
    b = np.random.randn(3, 5)
    a_sum = a.sum(axis=0, keepdims=True)
    b_sum = np.sum(b, axis=0, keepdims=True)
    print(a_sum.ndim)  # 2
    print(b_sum.ndim)  # 2

To expand a bit on @mskimm's comment: if you look at the relevant part of the numpy source code, core/fromnumeric.py, it is clear why a.sum(..., keepdims=True) works whereas np.sum(a, ..., keepdims=True) does not:

    def sum(a, axis=None, dtype=None, out=None, keepdims=False):
        ...
        if isinstance(a, _gentype):
            res = _sum_(a)
            if out is not None:
                out[...] = res
                return out
            return res
        elif type(a) is not mu.ndarray:
            try:
                sum = a.sum
            except AttributeError:
                return _methods._sum(a, axis=axis, dtype=dtype,
                                     out=out, keepdims=keepdims)
            # NOTE: Dropping the keepdims parameters here...
            return sum(axis=axis, dtype=dtype, out=out)
        else:
            return _methods._sum(a, axis=axis, dtype=dtype,
                                 out=out, keepdims=keepdims)
        ...

Since you subclassed np.ndarray, type(a) is fooarray, not mu.ndarray, so you end up at this line:

    # NOTE: Dropping the keepdims parameters here...
    return sum(axis=axis, dtype=dtype, out=out)

The keepdims keyword argument is a relatively new ndarray feature and is not yet implemented for some other array-like classes, such as np.matrix or np.ma.masked_array, which also have a .sum() method. That is why this parameter is currently omitted for non-ndarrays.
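To see the effect of that branch in isolation, here is a minimal sketch that imitates it. Both old_style_sum and FooArray are hypothetical names made up for illustration. This is not NumPy's actual code, and newer NumPy releases may dispatch differently, so the sketch reproduces the behaviour described here regardless of the installed version.

```python
import numpy as np


class FooArray(np.ndarray):
    """Hypothetical minimal subclass, mirroring fooarray from the question."""

    def __new__(cls, input_array):
        return np.asarray(input_array).view(cls)


def old_style_sum(a, axis=None, dtype=None, out=None, keepdims=False):
    """Hypothetical stand-in for the quoted branch; not NumPy's actual code."""
    if type(a) is not np.ndarray:
        try:
            method = a.sum
        except AttributeError:
            # No .sum method (e.g. a list): reduce via a plain ndarray.
            return np.asarray(a).sum(axis=axis, dtype=dtype, out=out,
                                     keepdims=keepdims)
        # keepdims is dropped here, as in the quoted source.
        return method(axis=axis, dtype=dtype, out=out)
    return a.sum(axis=axis, dtype=dtype, out=out, keepdims=keepdims)


a = FooArray(np.ones((3, 5)))
b = np.ones((3, 5))

a_sum = old_style_sum(a, axis=0, keepdims=True)
b_sum = old_style_sum(b, axis=0, keepdims=True)

print(a_sum.ndim)  # 1: keepdims never reached FooArray.sum
print(b_sum.ndim)  # 2: a plain ndarray takes the keepdims-aware path
```

Calling a.sum(axis=0, keepdims=True) directly skips the dispatch entirely, which is why the method call keeps both dimensions.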


Source: https://habr.com/ru/post/969513/

