What is the behavior of operations on two pandas Series with non-unique labels?

The main behavior is that pandas matches values that have the same label. If a label has no match on the other side, the result is NaN. If a label is non-unique on the left or on the right (but not on both sides at once), every combination is produced. For instance,

pd.Series((2,3), ("a","b")) * pd.Series((5,7), ("b","b"))

returns:

a     NaN
b    15.0
b    21.0

and

pd.Series((2,3), ("b","b")) * pd.Series((5,7), ("a","b"))

returns

a     NaN
b    14.0
b    21.0

But if the label is non-unique on both the left and the right, for example,

pd.Series((2,3), ("b","b")) * pd.Series((5,7), ("b","b"))

you get

b    10
b    21

I would have expected this to produce every combination as well, i.e. to return

b    10
b    14
b    15
b    21

What determines which subset of values is returned? Is it based on row order? If so, what is the reason for this behavior?

Thank you

1 answer

Here is an interesting observation:

In [146]: a
Out[146]:
b    2
b    3
a    4
dtype: int64

In [147]: b
Out[147]:
a    2
b    5
b    7
dtype: int64

Both indexes contain the same labels, but in a different order:

In [148]: a.index
Out[148]: Index(['b', 'b', 'a'], dtype='object')

In [149]: b.index
Out[149]: Index(['a', 'b', 'b'], dtype='object')

Multiplying them produces every combination for the duplicated label:

In [150]: a * b
Out[150]:
a     8
b    10
b    14
b    15
b    21
dtype: int64

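But if a is sorted first, so that the two indexes are identical, the values are multiplied position by position: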

In [151]: a.sort_index() * b
Out[151]:
a     8
b    10
b    21
dtype: int64

In [155]: (a.sort_index().index == b.index).all()
Out[155]: True
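
The observation above suggests that the result depends on whether the two indexes are identical. The following is a minimal sketch of that check, using `Index.equals()`; the variable names `c`, `d`, `same` and `aligned` are chosen here only for illustration, and the explanation in the comments is an interpretation of the outputs shown above rather than a statement from the original answer.

```python
import pandas as pd

# Identical indexes (same labels, same order): values are combined by position.
a = pd.Series([2, 3], index=["b", "b"])
b = pd.Series([5, 7], index=["b", "b"])
same = a * b
print(a.index.equals(b.index))  # True
print(same.tolist())            # [10, 21]

# Same labels in a different order: the indexes are not equal, so pandas
# aligns the two Series by label, and the duplicated label "b" yields
# every left/right combination.
c = pd.Series([2, 3, 4], index=["b", "b", "a"])
d = pd.Series([2, 5, 7], index=["a", "b", "b"])
aligned = c * d
print(c.index.equals(d.index))  # False
print(len(aligned))             # 5
```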

To get every combination even when the indexes are identical, you can use DataFrame.join():

In [128]: a = pd.Series((2,3), ("b","b"))

In [129]: b = pd.Series((5,7), ("b","b"))

In [130]: a.to_frame('a').join(b.to_frame('b')).eval("a * b")
Out[130]:
b    10
b    14
b    15
b    21
dtype: int64
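
The join-based approach can be wrapped in a small helper. This is only a sketch: the function name `multiply_all_pairs` and the column names `"left"` and `"right"` are hypothetical, and `how="outer"` is an assumption made here so that labels present on only one side still appear in the result (as NaN), similar to ordinary Series arithmetic.

```python
import pandas as pd


def multiply_all_pairs(left: pd.Series, right: pd.Series) -> pd.Series:
    """Multiply every left value with every right value that shares its label."""
    # Joining on the index pairs up all rows with matching labels, so a label
    # duplicated on both sides produces one row per combination.
    joined = left.to_frame("left").join(right.to_frame("right"), how="outer")
    # Both columns now share one index, so this product is row by row.
    return joined["left"] * joined["right"]


a = pd.Series([2, 3], index=["b", "b"])
b = pd.Series([5, 7], index=["b", "b"])
result = multiply_all_pairs(a, b)
print(result)
```

For the two Series above this gives four rows labelled b, one per pair of values, instead of the two rows returned by a * b.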

Source: https://habr.com/ru/post/1672078/

