The correct way to test numpy.dtype

I am looking at a third-party library that has the following if-test:

 if isinstance(xx_, numpy.ndarray) and xx_.dtype is numpy.float64 and xx_.flags.contiguous:
     xx_[:] = ctypes.cast(xx_.ctypes._as_parameter_, ctypes.POINTER(ctypes.c_double))

It seems that the test xx_.dtype is numpy.float64 always fails:

 >>> xx_ = numpy.zeros(8, dtype=numpy.float64)
 >>> xx_.dtype is numpy.float64
 False

What is the correct way to verify that the array's dtype is numpy.float64?

2 answers

This is a bug in the library.

dtype objects can be constructed dynamically, and NumPy does so all the time. There is no guarantee they are interned, so constructing a dtype that already exists is not guaranteed to give you the same object back.

Also, np.float64 is not really a dtype; it's... I don't know the official name for these types, but it's one of the types used to construct scalar objects out of the array's bytes, which are usually found in a dtype's type attribute, so I'm going to call it a dtype.type. (Note that np.float64 subclasses both NumPy's numeric-tower types and the ABCs of Python's numeric tower, while np.dtype of course does not.)

You can usually use them interchangeably; when you use a dtype.type (or, for that matter, a native Python numeric type) where a dtype was expected, a dtype is constructed on the fly (which, again, is not guaranteed to be interned), but of course that doesn't mean they are identical:

 >>> np.float64 == np.dtype(np.float64) == np.dtype('float64')
 True
 >>> np.float64 == np.dtype(np.float64).type
 True

dtype.type will usually be identical if you use the built-in types:

 >>> np.float64 is np.dtype(np.float64).type
 True

But two dtype objects often are not identical:

 >>> np.dtype(np.float64) is np.dtype('float64')
 False

But again, none of this is guaranteed. (Also, note that np.float64 and float use the same storage, but are separate types. And of course you can also write dtype('f8'), which is guaranteed to work the same as dtype(np.float64), but that doesn't mean 'f8' is, or even ==, np.float64.)

So it is possible that building an array by explicitly passing np.float64 as the dtype argument means you will get that same instance back when you check the dtype.type attribute, but this is not guaranteed. And if you pass np.dtype('float64'), or let NumPy infer the dtype from the data, or pass a dtype string like 'f8' for it to parse, etc., it is even less likely. More importantly, you definitely will not get np.float64 back as the dtype itself.
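To make the point above concrete, here is a small sketch (assuming a standard NumPy install) showing that every common spelling of the 8-byte float dtype compares equal with ==, even though none of the resulting dtype objects is identical to np.float64:

```python
import numpy as np

# All of these spellings describe the same 8-byte float dtype...
specs = [np.float64, np.dtype(np.float64), np.dtype('float64'), np.dtype('f8'), float]
arrays = [np.zeros(3, dtype=s) for s in specs]

# ...so equality comparison succeeds for every one of them,
for a in arrays:
    assert a.dtype == np.float64

# even though identity with np.float64 never holds: a.dtype is an
# np.dtype instance, not the scalar type itself.
assert all(a.dtype is not np.float64 for a in arrays)
```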


So how should this be fixed?

Well, the docs define what it means for two dtypes to be equal, and that is a useful thing, and I think it is probably the useful thing you are looking for here. So just replace is with ==:

 if isinstance(xx_, numpy.ndarray) and xx_.dtype == numpy.float64 and xx_.flags.contiguous: 

However, to some extent I am only guessing that this is what you are looking for. (The fact that it checks the contiguous flag implies that it is probably going to go poking around in the internal storage... but then why doesn't it check for C vs. Fortran order, byte order, or anything else?)
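A fuller version of the corrected test might look like the sketch below (check_f64_contiguous is a hypothetical helper name, not from the original library):

```python
import numpy as np

def check_f64_contiguous(arr):
    """Return True if arr is a C-contiguous float64 ndarray."""
    return (isinstance(arr, np.ndarray)
            and arr.dtype == np.float64
            and arr.flags['C_CONTIGUOUS'])

print(check_f64_contiguous(np.zeros(8, dtype='f8')))        # True
print(check_f64_contiguous(np.zeros(8, dtype=np.float32)))  # False: wrong dtype
print(check_f64_contiguous(np.zeros((4, 4))[:, 0]))         # False: column slice is not contiguous
```

Checking flags['C_CONTIGUOUS'] explicitly sidesteps the ambiguity the answer mentions about which memory order the library actually assumes.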


Try:

 x = np.zeros(8, dtype=np.float64)
 print(x.dtype is np.dtype(np.float64))

is checks whether two objects are identical, i.e. whether they have the same id(). It is used, for example, for testing is None, but can give surprising results when testing integers or strings. But in this case there is a further problem: x.dtype and np.float64 are not even the same class.

 isinstance(x.dtype, np.dtype)     # True
 isinstance(np.float64, np.dtype)  # False
 x.dtype.__class__                 # numpy.dtype
 np.float64.__class__              # type

np.float64 is actually callable: np.float64() produces 0.0, while x.dtype() throws an error. (Correction: np.float64 is a class, not a function.)
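A quick interactive check of that claim (assuming a standard NumPy install):

```python
import numbers
import numpy as np

# np.float64 is a scalar type (a class); calling it constructs a scalar
val = np.float64()
print(val)        # 0.0
print(type(val))  # <class 'numpy.float64'>

# it also participates in Python's numeric tower, as the first answer notes
print(isinstance(val, numbers.Real))   # True
print(issubclass(np.float64, float))   # True
```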

In my interactive tests:

 x.dtype is np.dtype(np.float64) 

returns True. But I do not know how universal that is, or whether it is simply the result of some local caching. The dtype documentation mentions the num attribute:

dtype.num A unique number for each of the 21 different built-in types.

Both of these dtypes give 12 for num.
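That num check can be reproduced directly (the value 12 matches the observation above for float64 on a standard NumPy build):

```python
import numpy as np

x = np.zeros(8, dtype=np.float64)

# each builtin dtype carries a unique type number
print(x.dtype.num)               # 12
print(np.dtype(np.float64).num)  # 12
assert x.dtype.num == np.dtype('f8').num == 12
```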

 x.dtype == np.float64 

tests True.

Also, using type works:

 x.dtype.type is np.float64 # True 

When I import ctypes and run the cast (with your xx_), I get an error:

ValueError: setting an array element with a sequence.

I do not know enough ctypes to understand what it is trying to do. It looks like it is doing a pointer-type conversion of the data of xx_; xx_.ctypes._as_parameter_ is the same number as xx_.__array_interface__['data'][0].
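A small sketch of that pointer relationship, using the public ndarray.ctypes.data attribute rather than the private _as_parameter_ (this illustrates the cast, not the library's original intent):

```python
import ctypes
import numpy as np

xx_ = np.zeros(8, dtype=np.float64)

# the data pointer exposed via ctypes matches the array interface
ptr = xx_.ctypes.data  # plain integer address
assert ptr == xx_.__array_interface__['data'][0]

# casting that address to a typed pointer lets you read the buffer directly
c_ptr = ctypes.cast(ptr, ctypes.POINTER(ctypes.c_double))
print(c_ptr[0])  # 0.0, the first element of the zeros array
```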


In the numpy test code I find these dtype tests:

 issubclass(arr.dtype.type, (nt.integer, nt.bool_))
 assert_(dat.dtype.type is np.float64)
 assert_equal(A.dtype.type, np.unicode_)
 assert_equal(r['col1'].dtype.kind, 'i')

The numpy documentation also mentions

 np.issubdtype(x.dtype, np.float64)
 np.issubsctype(x, np.float64)

both of which use issubclass .
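Because issubdtype walks the scalar-type hierarchy, it also accepts broader categories than an exact type, which is often what you actually want:

```python
import numpy as np

x = np.zeros(8, dtype=np.float64)

# exact type works...
assert np.issubdtype(x.dtype, np.float64)
# ...but so do abstract parents in the hierarchy
assert np.issubdtype(x.dtype, np.floating)
assert not np.issubdtype(x.dtype, np.integer)
```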


Further tracing of the C code suggests that x.dtype == np.float64 evaluates as:

 x.dtype.num == np.dtype(np.float64).num 

That is, the scalar type is converted to a dtype and the .num values are compared. The code is in scalarapi.c, descriptor.c, and multiarraymodule.c under numpy/core/src/multiarray.
