Replace values in a dataframe column based on a condition

I have a seemingly simple task: a dataframe with two columns, A and B. Where the values in B are greater than the values in A, I want to replace them with the values of A. I used to do this with df.B[df.B > df.A] = df.A, but a recent pandas update raises SettingWithCopyWarning for this kind of chained assignment. The official documentation recommends using .loc instead.

OK, I said, and switched to df.loc[df.B > df.A, 'B'] = df.A, and everything works fine as long as column B is not entirely NaN. When it is, something strange happens:

    In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.NaN, np.NaN, np.NaN]})

    In [2]: df
    Out[2]:
       A   B
    0  1 NaN
    1  2 NaN
    2  3 NaN

    In [3]: df.loc[df.B > df.A, 'B'] = df.A

    In [4]: df
    Out[4]:
       A                    B
    0  1 -9223372036854775808
    1  2 -9223372036854775808
    2  3 -9223372036854775808

Now, if at least one element of B satisfies the condition (is greater than A), everything works fine:

    In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.NaN, 4, np.NaN]})

    In [2]: df
    Out[2]:
       A   B
    0  1 NaN
    1  2   4
    2  3 NaN

    In [3]: df.loc[df.B > df.A, 'B'] = df.A

    In [4]: df
    Out[4]:
       A   B
    0  1 NaN
    1  2   2
    2  3 NaN

But if none of the elements of B satisfies the condition, then all the NaN values are replaced with -9223372036854775808:

    In [1]: df = pd.DataFrame({'A': [1, 2, 3], 'B': [np.NaN, 1, np.NaN]})

    In [2]: df
    Out[2]:
       A   B
    0  1 NaN
    1  2   1
    2  3 NaN

    In [3]: df.loc[df.B > df.A, 'B'] = df.A

    In [4]: df
    Out[4]:
       A                    B
    0  1 -9223372036854775808
    1  2                    1
    2  3 -9223372036854775808

Is this a bug or a feature? How should I do this replacement?

Thanks!

1 answer

This is a bug, fixed here.

Since pandas allows basically anything to be set on the right-hand side of an expression in loc, there are probably 10 cases that need to be handled. To give you an idea:

 df.loc[lhs, column] = rhs 

where rhs can be a list, array, or scalar, and lhs can be a slice, tuple, scalar, or array,

and a small subset of cases where the resulting dtype of the column needs to be inferred and set based on the rhs. (This is a bit complicated.) For example, say you do not set all of the elements on the lhs and the column was integer: then you need to coerce it to float. But if you did set all of the elements AND the rhs was an integer, then it needs to be coerced BACK to integer.
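The coercion rule described above can be observed without any in-place assignment. The following is only a minimal sketch, assuming a reasonably recent pandas; the variable names are illustrative and not taken from the pandas source:

```python
import pandas as pd

ints = pd.Series([1, 2, 3], dtype="int64")

# Masking out some elements introduces NaN, which int64 cannot hold,
# so the result is upcast to float64.
partial = ints.where(ints > 1)
print(partial.dtype)  # float64

# Once every element holds a whole number again (no NaN left),
# the values can be converted back to int64 without loss.
restored = partial.fillna(0).astype("int64")
print(restored.tolist())  # [0, 2, 3]
```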

In this particular case the lhs is an array, so we would normally try to coerce the lhs to the type of the rhs, but this degenerates when the conversion is unsafe (int → float).

Suffice it to say that this was a missing edge case.
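To illustrate why the conversion is unsafe: the number shown in the question is the minimum int64 value, which is what a raw cast of NaN to int64 commonly produces on x86-64 hardware (the result of such a cast is not defined and may differ on other platforms). The sketch below is an assumption-labelled illustration, not the code path that contained the bug. It shows that an explicit astype call rejects the same conversion:

```python
import numpy as np
import pandas as pd

# The value seen in the question is the smallest representable int64.
print(np.iinfo(np.int64).min)  # -9223372036854775808

floats = pd.Series([np.nan, 4.0, np.nan])

# An explicit float -> int64 conversion is refused while NaN is present.
try:
    floats.astype("int64")
    cast_failed = False
except ValueError:
    cast_failed = True
print(cast_failed)  # True

# With the NaN values dropped, the same conversion is safe.
print(floats.dropna().astype("int64").tolist())  # [4]
```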
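As for how to do the replacement itself, one option that avoids in-place loc assignment is Series.mask, which returns a new column. This is a sketch of an alternative rather than the fix referred to above; the helper name cap_b_at_a is hypothetical:

```python
import numpy as np
import pandas as pd


def cap_b_at_a(df):
    """Return a copy of df in which B is replaced by A wherever B > A."""
    out = df.copy()
    # Comparisons against NaN evaluate to False, so NaN entries in B
    # are left untouched and the column keeps its float dtype.
    out["B"] = out["B"].mask(out["B"] > out["A"], out["A"])
    return out


all_nan = pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, np.nan, np.nan]})
mixed = pd.DataFrame({"A": [1, 2, 3], "B": [np.nan, 4.0, np.nan]})

print(cap_b_at_a(all_nan))
print(cap_b_at_a(mixed))
```

With the all-NaN frame from the question, B stays NaN; with the mixed frame, only the value 4 is replaced by the corresponding value of A.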


Source: https://habr.com/ru/post/1205740/

