Elementwise string comparison with unicode in numpy

Question

I have a question about equality comparison between numpy arrays and strings. Let's say I define the following array:

 import numpy as np
 x = np.array(['yes', 'no', 'maybe'])

I can then test for equality against other strings, and it performs an elementwise comparison against the single string (following, I think, the broadcasting rules here: http://docs.scipy.org/doc/numpy-1.10.1/user/basics.broadcasting.html ?):

 'yes' == x  #op : array([ True, False, False], dtype=bool)
 x == 'yes'  #op : array([ True, False, False], dtype=bool)

However, if I compare against unicode strings, I get different behavior: the comparison is elementwise only when the array is on the left and the string on the right. When the string is on the left and the array on the right, only a single comparison is performed.

 x == u'yes'  #op : array([ True, False, False], dtype=bool)
 u'yes' == x  #op : False

I cannot find details about this behavior in the numpy docs. Could someone explain, or point me to documentation on, why comparison with unicode strings behaves differently?

python arrays numpy unicode python-2.x

jay - bee Jan 29 '16 at 13:39

1 answer

一二三 · Accepted Answer · 2016-01-29T14:46:37+0000

The relevant piece of information is this part of the Python coercion rules:

For objects x and y, first x.__op__(y) is tried. If this is not implemented or returns NotImplemented, y.__rop__(x) is tried.
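The fallback can be sketched with two hypothetical classes (Defers and Answers are illustrative names, not part of numpy or the standard library). This is a minimal sketch for Python 3; note that for == the reflected method is __eq__ itself, so it is the right operand's __eq__ that gets tried.

```python
class Defers(object):
    """Hypothetical left operand that declines to compare with foreign types."""
    def __eq__(self, other):
        if not isinstance(other, Defers):
            return NotImplemented  # signals: let the other operand try
        return True


class Answers(object):
    """Hypothetical right operand that records whether it was consulted."""
    def __init__(self):
        self.asked = False

    def __eq__(self, other):
        self.asked = True
        return "handled by right operand"


right = Answers()
result = (Defers() == right)
print(result)       # the value returned by Answers.__eq__
print(right.asked)  # the right operand was consulted after the left deferred
```

Because Defers.__eq__ returns NotImplemented, the interpreter hands the comparison to the right operand, and whatever that operand returns becomes the result of the == expression.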

Using your numpy array x, when the left-hand side is a str ( 'yes' == x ):

'yes'.__eq__(x) returns NotImplemented, and so the comparison resolves to x.__eq__('yes'), which performs numpy's elementwise comparison.

However, when the left-hand side is a unicode ( u'yes' == x ):

u'yes'.__eq__(x) simply returns False.

The reason for the different __eq__ behaviors is that str.__eq__() simply returns NotImplemented if its argument is not of type str, whereas unicode.__eq__() first tries to convert its argument to unicode, and returns NotImplemented only if that conversion fails. In this case, the numpy array can be converted to unicode, so u'yes' == x is essentially u'yes' == unicode(x).
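The two strategies can be imitated in plain Python 3 to see why operand order matters. This is only a sketch under the assumption that the description above is accurate: ToyArray, StrLike and UnicodeLike are hypothetical stand-ins, not numpy or Python 2 types, and str() plays the role of the unicode() conversion.

```python
class ToyArray(object):
    """Hypothetical stand-in for a numpy string array (not numpy itself)."""
    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        # Elementwise comparison against a single scalar-like operand.
        return [item == str(other) for item in self.items]

    def __str__(self):
        return str(self.items)


class StrLike(object):
    """Imitates the described str.__eq__: defer on foreign types."""
    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    def __eq__(self, other):
        if not isinstance(other, StrLike):
            return NotImplemented  # lets ToyArray.__eq__ take over
        return self.text == other.text


class UnicodeLike(StrLike):
    """Imitates the described unicode.__eq__: convert the argument first."""
    def __eq__(self, other):
        # The conversion succeeds, so a plain bool is returned and
        # the right operand is never consulted.
        return self.text == str(other)


x = ToyArray(['yes', 'no', 'maybe'])
print(StrLike('yes') == x)      # elementwise result from ToyArray.__eq__
print(UnicodeLike('yes') == x)  # single bool from UnicodeLike.__eq__
print(x == UnicodeLike('yes'))  # elementwise: ToyArray.__eq__ is tried first
```

With the array on the left, its own __eq__ always runs first, which matches the observation in the question that x == u'yes' is elementwise while u'yes' == x is not.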
