The correct way to test numpy.dtype

I am looking at a third-party library that has the following if-test:

 if isinstance(xx_, numpy.ndarray) and xx_.dtype is numpy.float64 and xx_.flags.contiguous:
     xx_[:] = ctypes.cast(xx_.ctypes._as_parameter_, ctypes.POINTER(ctypes.c_double))

It seems that the test xx_.dtype is numpy.float64 always fails:

 >>> xx_ = numpy.zeros(8, dtype=numpy.float64)
 >>> xx_.dtype is numpy.float64
 False

What is the correct way to verify that the array's dtype is numpy.float64?

2 answers

This is a bug in the library.

dtype objects can be constructed dynamically, and NumPy does so all the time. There is no guarantee they are interned, so constructing a dtype that already exists is not guaranteed to give you the same object back.

Also, np.float64 is not really a dtype; it's... I don't know the official name for these types, but it's one of the types used to construct scalar objects out of the array's bytes, which are usually found in a dtype's type attribute, so I'm going to call it a dtype.type. (Note that np.float64 subclasses both NumPy's numeric-tower types and the ABCs of Python's numeric tower, while np.dtype of course does not.)

You can usually use them interchangeably; when you use a dtype.type (or, for that matter, a native Python numeric type) where a dtype was expected, a dtype is constructed on the fly (which, again, is not guaranteed to be interned), but of course that doesn't mean they are identical:

 >>> np.float64 == np.dtype(np.float64) == np.dtype('float64')
 True
 >>> np.float64 == np.dtype(np.float64).type
 True

dtype.type will usually be identical if you use the built-in types:

 >>> np.float64 is np.dtype(np.float64).type
 True

But two dtype objects often are not identical:

 >>> np.dtype(np.float64) is np.dtype('float64')
 False

But again, none of this is guaranteed. (Also, note that np.float64 and float use the same storage, but are separate types. And of course you can also write dtype('f8'), which is guaranteed to work the same as dtype(np.float64), but that doesn't mean 'f8' is, or even ==, np.float64.)

So it is possible that building an array by explicitly passing np.float64 as the dtype argument means you will get that same instance back when you check the dtype.type attribute, but this is not guaranteed. And if you pass np.dtype('float64'), or let NumPy infer the dtype from the data, or pass a dtype string like 'f8' for it to parse, etc., it is even less likely. More importantly, you definitely will not get np.float64 back as the dtype itself.
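To make the point above concrete, here is a small sketch (assuming a standard NumPy install) showing that every common spelling of the 8-byte float dtype compares equal with ==, even though none of the resulting dtype objects is identical to np.float64:

```python
import numpy as np

# All of these spellings describe the same 8-byte float dtype...
specs = [np.float64, np.dtype(np.float64), np.dtype('float64'), np.dtype('f8'), float]
arrays = [np.zeros(3, dtype=s) for s in specs]

# ...so equality comparison succeeds for every one of them,
for a in arrays:
    assert a.dtype == np.float64

# even though identity with np.float64 never holds: a.dtype is an
# np.dtype instance, not the scalar type itself.
assert all(a.dtype is not np.float64 for a in arrays)
```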


So how should this be fixed?

Well, the docs define what it means for two dtypes to be equal, and that is a useful thing, and I think it is probably the useful thing you are looking for here. So just replace is with ==:

 if isinstance(xx_, numpy.ndarray) and xx_.dtype == numpy.float64 and xx_.flags.contiguous: 

However, to some extent I am only guessing that this is what you are looking for. (The fact that it checks the contiguous flag implies that it is probably going to go poking around in the internal storage... but then why doesn't it check for C vs. Fortran order, byte order, or anything else?)
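A fuller version of the corrected test might look like the sketch below (check_f64_contiguous is a hypothetical helper name, not from the original library):

```python
import numpy as np

def check_f64_contiguous(arr):
    """Return True if arr is a C-contiguous float64 ndarray."""
    return (isinstance(arr, np.ndarray)
            and arr.dtype == np.float64
            and arr.flags['C_CONTIGUOUS'])

print(check_f64_contiguous(np.zeros(8, dtype='f8')))        # True
print(check_f64_contiguous(np.zeros(8, dtype=np.float32)))  # False: wrong dtype
print(check_f64_contiguous(np.zeros((4, 4))[:, 0]))         # False: column slice is not contiguous
```

Checking flags['C_CONTIGUOUS'] explicitly sidesteps the ambiguity the answer mentions about which memory order the library actually assumes.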


Try:

 x = np.zeros(8, dtype=np.float64)
 print(x.dtype is np.dtype(np.float64))

is checks whether two objects are identical, i.e. whether they have the same id(). It is used, for example, for testing is None, but can give surprising results when testing integers or strings. But in this case there is a further problem: x.dtype and np.float64 are not even the same class.

 isinstance(x.dtype, np.dtype)     # True
 isinstance(np.float64, np.dtype)  # False
 x.dtype.__class__                 # numpy.dtype
 np.float64.__class__              # type

np.float64 is actually callable: np.float64() produces 0.0, while x.dtype() throws an error. (Correction: np.float64 is a class, not a function.)
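A quick interactive check of that claim (assuming a standard NumPy install):

```python
import numbers
import numpy as np

# np.float64 is a scalar type (a class); calling it constructs a scalar
val = np.float64()
print(val)        # 0.0
print(type(val))  # <class 'numpy.float64'>

# it also participates in Python's numeric tower, as the first answer notes
print(isinstance(val, numbers.Real))   # True
print(issubclass(np.float64, float))   # True
```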

In my interactive tests:

 x.dtype is np.dtype(np.float64) 

returns True. But I do not know how universal that is, or whether it is simply the result of some local caching. The dtype documentation mentions the num attribute:

dtype.num A unique number for each of the 21 different built-in types.

Both of these dtypes give 12 for num.
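That num check can be reproduced directly (the value 12 matches the observation above for float64 on a standard NumPy build):

```python
import numpy as np

x = np.zeros(8, dtype=np.float64)

# each builtin dtype carries a unique type number
print(x.dtype.num)               # 12
print(np.dtype(np.float64).num)  # 12
assert x.dtype.num == np.dtype('f8').num == 12
```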

 x.dtype == np.float64 

tests True.

Also, using type works:

 x.dtype.type is np.float64 # True 

When I import ctypes and run the cast (with your xx_), I get an error:

ValueError: setting an array element with a sequence.

I do not know enough ctypes to understand what it is trying to do. It looks like it is doing a pointer-type conversion of the data of xx_; xx_.ctypes._as_parameter_ is the same number as xx_.__array_interface__['data'][0].
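A small sketch of that pointer relationship, using the public ndarray.ctypes.data attribute rather than the private _as_parameter_ (this illustrates the cast, not the library's original intent):

```python
import ctypes
import numpy as np

xx_ = np.zeros(8, dtype=np.float64)

# the data pointer exposed via ctypes matches the array interface
ptr = xx_.ctypes.data  # plain integer address
assert ptr == xx_.__array_interface__['data'][0]

# casting that address to a typed pointer lets you read the buffer directly
c_ptr = ctypes.cast(ptr, ctypes.POINTER(ctypes.c_double))
print(c_ptr[0])  # 0.0, the first element of the zeros array
```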


In the numpy test code I find these dtype tests:

 issubclass(arr.dtype.type, (nt.integer, nt.bool_))
 assert_(dat.dtype.type is np.float64)
 assert_equal(A.dtype.type, np.unicode_)
 assert_equal(r['col1'].dtype.kind, 'i')

The numpy documentation also mentions

 np.issubdtype(x.dtype, np.float64)
 np.issubsctype(x, np.float64)

both of which use issubclass .
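Because issubdtype walks the scalar-type hierarchy, it also accepts broader categories than an exact type, which is often what you actually want:

```python
import numpy as np

x = np.zeros(8, dtype=np.float64)

# exact type works...
assert np.issubdtype(x.dtype, np.float64)
# ...but so do abstract parents in the hierarchy
assert np.issubdtype(x.dtype, np.floating)
assert not np.issubdtype(x.dtype, np.integer)
```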


Further tracing of the C code suggests that x.dtype == np.float64 evaluates as:

 x.dtype.num == np.dtype(np.float64).num 

That is, the scalar type is converted to a dtype and the .num values are compared. The code is in scalarapi.c, descriptor.c, and multiarraymodule.c under numpy/core/src/multiarray.
