I have a seemingly simple task: a dataframe with two columns, A and B. Wherever a value in B is greater than the corresponding value in A, I want to replace it with the value from A. I used to do this with df.B[df.B > df.A] = df.A, but since a recent pandas update this chained assignment triggers a SettingWithCopyWarning. The official documentation recommends using .loc instead.
OK, I thought, and switched to df.loc[df.B > df.A, 'B'] = df.A. Everything works fine as long as column B is not entirely NaN. When it is, something strange happens:
    In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.NaN, np.NaN, np.NaN]})

    In [2]: df
    Out[2]:
       A   B
    0  1 NaN
    1  2 NaN
    2  3 NaN

    In [3]: df.loc[df.B > df.A, 'B'] = df.A

    In [4]: df
    Out[4]:
       A                    B
    0  1 -9223372036854775808
    1  2 -9223372036854775808
    2  3 -9223372036854775808
Now, if even one element of B satisfies the condition (is greater than the corresponding A), everything works fine:
    In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.NaN, 4, np.NaN]})

    In [2]: df
    Out[2]:
       A   B
    0  1 NaN
    1  2   4
    2  3 NaN

    In [3]: df.loc[df.B > df.A, 'B'] = df.A

    In [4]: df
    Out[4]:
       A   B
    0  1 NaN
    1  2   2
    2  3 NaN
But if none of B's elements satisfies the condition, all the NaN values are replaced with -9223372036854775808:
    In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.NaN, 1, np.NaN]})

    In [2]: df
    Out[2]:
       A   B
    0  1 NaN
    1  2   1
    2  3 NaN

    In [3]: df.loc[df.B > df.A, 'B'] = df.A

    In [4]: df
    Out[4]:
       A                    B
    0  1 -9223372036854775808
    1  2                    1
    2  3 -9223372036854775808
Is this a bug or a feature? How should I do this replacement?
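For now, one workaround that seems to sidestep the masked .loc assignment is Series.mask, which builds a new column instead of writing into the existing one. This is only a minimal sketch, assuming the goal is to cap B at A while leaving NaN entries untouched; the helper name cap_b_at_a is made up for illustration and is not part of pandas:

```python
import numpy as np
import pandas as pd


def cap_b_at_a(df):
    """Return a copy of df where B is replaced by A wherever B > A.

    Hypothetical helper for illustration only. A comparison involving
    NaN evaluates to False, so NaN entries in B are left as NaN.
    """
    out = df.copy()
    # Series.mask replaces values where the condition is True.
    out["B"] = out["B"].mask(out["B"] > out["A"], out["A"])
    return out


# The all-NaN case and the mixed case from the sessions above.
all_nan = cap_b_at_a(pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, np.nan, np.nan]}))
mixed = cap_b_at_a(pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, 4, np.nan]}))

print(all_nan)
print(mixed)
```

This does not explain the behaviour above; it only avoids the code path that appears to trigger it.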
Thanks!