isin function does not work for dates

    import pandas as pd

    d = {'Dates': [pd.Timestamp('2013-01-02'),
                   pd.Timestamp('2013-01-03'),
                   pd.Timestamp('2013-01-04')],
         'Num1': [1, 2, 3],
         'Num2': [-1, -2, -3]}
    df = pd.DataFrame(data=d)

We have this data frame.

                    Dates  Num1  Num2
    0 2013-01-02 00:00:00     1    -1
    1 2013-01-03 00:00:00     2    -2
    2 2013-01-04 00:00:00     3    -3

    Dates    datetime64[ns]
    Num1              int64
    Num2              int64
    dtype: object

When I call isin on the Dates column, it gives me:

    df['Dates'].isin([pd.Timestamp('2013-01-04')])
    0    False
    1    False
    2    False
    Name: Dates, dtype: bool

I expect True for the date "2013-01-04". What am I missing? I am using the latest version of pandas.

+6
4 answers

I have the same version of pandas and @DSM's answer was helpful. Another workaround would be to use the apply method:

    >>> df.Dates.apply(lambda date: date in [pd.Timestamp('2013-01-04')])
    0    False
    1    False
    2     True
    Name: Dates, dtype: bool
+2

Yes, that looks like a bug. It comes down to this part of lib.ismember:

    for i in range(n):
        val = util.get_value_at(arr, i)
        if val in values:
            result[i] = 1
        else:
            result[i] = 0

val is a numpy.datetime64 object, and values is a set of Timestamp objects. Membership testing should work, but it does not:

    >>> import pandas as pd, numpy as np
    >>> ts = pd.Timestamp('2013-01-04')
    >>> ts
    Timestamp('2013-01-04 00:00:00', tz=None)
    >>> dt64 = np.datetime64(ts)
    >>> dt64
    numpy.datetime64('2013-01-03T19:00:00.000000-0500')
    >>> dt64 == ts
    True
    >>> dt64 in [ts]
    True
    >>> dt64 in {ts}
    False

I think this kind of behavior, where membership works in a list but not in a set, usually points to something wrong with __hash__:

    >>> hash(dt64)
    1357257600000000
    >>> hash(ts)
    -7276108168457487299

A set membership test cannot succeed if the hashes do not match, even when the two objects compare equal. I can think of several ways to fix this, but choosing the best one depends on the design decisions made when Timestamp was implemented, which I cannot comment on.
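To make the list-versus-set difference concrete, here is a minimal pure-Python sketch. The classes `Raw` and `Stamp` are hypothetical stand-ins for numpy.datetime64 and Timestamp (they are not the real types), and `ismember` only imitates the loop quoted above. The two classes compare equal on the same value but deliberately return different hashes:

```python
class Raw:
    """Hypothetical stand-in for numpy.datetime64: hashes as its raw value."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == getattr(other, "value", other)

    def __hash__(self):
        return self.value


class Stamp:
    """Hypothetical stand-in for Timestamp: equal to Raw, but hashed differently."""
    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return self.value == getattr(other, "value", other)

    def __hash__(self):
        # Deliberately inconsistent with Raw.__hash__ to reproduce the problem.
        return self.value + 1000


def ismember(arr, values):
    """Imitation of the quoted loop: one membership test per element."""
    return [val in values for val in arr]


arr = [Raw(2), Raw(3), Raw(4)]
target = Stamp(4)

in_list = ismember(arr, [target])   # a list falls back to == comparisons
in_set = ismember(arr, {target})    # a set compares hashes before ==

# One possible fix: normalize the lookup values to the array's own type.
normalized = {Raw(s.value) for s in [target]}
in_normalized_set = ismember(arr, normalized)

print(in_list, in_set, in_normalized_set)
```

The list lookup finds the match because it only uses `==`, while the set lookup never reaches `==` since the stored hash differs. Converting both sides to one type before building the set is only one of the possible fixes.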

+1

This worked for me:

    df['Dates'].isin(np.array([pd.Timestamp('2013-01-04')]).astype('datetime64[ns]'))

I know this is a bit verbose, but it does the job if you need a workaround. See https://github.com/pydata/pandas/issues/5021 for more details.
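The same idea, giving both sides one datetime type, can probably be written more compactly. This is a sketch that assumes a pandas version where `Series.isin` accepts the result of `pd.to_datetime`; I have not checked it against the exact version in the question. The element-wise `==` comparison is an alternative that avoids set membership altogether:

```python
import pandas as pd

df = pd.DataFrame({
    "Dates": pd.to_datetime(["2013-01-02", "2013-01-03", "2013-01-04"]),
    "Num1": [1, 2, 3],
    "Num2": [-1, -2, -3],
})

# Convert the lookup values to a DatetimeIndex so both sides share a dtype.
wanted = pd.to_datetime(["2013-01-04"])
mask_isin = df["Dates"].isin(wanted)

# Element-wise comparison against a single Timestamp does not rely on hashing.
mask_eq = df["Dates"] == pd.Timestamp("2013-01-04")

print(df[mask_isin])
print(df[mask_eq])
```

The `==` form only covers a single date. For several dates, the `isin` form with converted values is the closer match to the original question.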

+1

Did you try adding 00:00:00 after the date? It would also help if you added an introduction and some tags, so that people can better understand your question and the syntax you are using.

0

Source: https://habr.com/ru/post/954848/