What is the behavior of operations on two pandas series with non-unique labels?

Question


The main behavior is that pandas matches up values that share a label. If a label has no match on the other side, the result for it is NaN. If a label is non-unique on the left or on the right (but not on both sides at once), every combination is produced. For instance,

pd.Series((2,3), ("a","b")) * pd.Series((5,7), ("b","b"))

returns:

a     NaN
b    15.0
b    21.0

and

pd.Series((2,3), ("b","b")) * pd.Series((5,7), ("a","b"))

returns

a     NaN
b    14.0
b    21.0

But if the label is non-unique on both the left and the right, for example,

pd.Series((2,3), ("b","b")) * pd.Series((5,7), ("b","b"))

You get

b    10
b    21

I would prefer that this exhausts all the possibilities, i.e. returns all four pairwise products (10, 14, 15 and 21).

What determines which subset of values is returned? Is it based on row order? If so, what is the reason for this behavior?

Thank you

python pandas

Denziloe Mar 12 '17 at 12:41

1 answer

MaxU · Answer 1 · 2017-03-12T11:18:58+0000

Here is an interesting observation. Take these two series:

In [146]: a
Out[146]:
b    2
b    3
a    4
dtype: int64

In [147]: b
Out[147]:
a    2
b    5
b    7
dtype: int64

Their indexes hold the same labels in a different order:

In [148]: a.index
Out[148]: Index(['b', 'b', 'a'], dtype='object')

In [149]: b.index
Out[149]: Index(['a', 'b', 'b'], dtype='object')

Multiplying them produces every combination for the duplicated label:

In [150]: a * b
Out[150]:
a     8
b    10
b    14
b    15
b    21
dtype: int64

However, when the two indexes are identical, the values are multiplied position by position:

In [151]: a.sort_index() * b
Out[151]:
a     8
b    10
b    21
dtype: int64

In [155]: (a.sort_index().index == b.index).all()
Out[155]: True
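
The session above can be reproduced as a standalone script. This is a minimal sketch, not a description of pandas internals: the variable names are illustrative, and `kind="mergesort"` is passed only on the assumption that a stable sort is wanted so the two `b` rows keep their relative order.

```python
import pandas as pd

a = pd.Series([2, 3, 4], index=["b", "b", "a"])
b = pd.Series([2, 5, 7], index=["a", "b", "b"])

# The indexes differ, so pandas aligns by label; the duplicated
# label "b" is paired in every combination (2*5, 2*7, 3*5, 3*7).
all_pairs = a * b

# After sorting, a's index is identical to b's index ...
a_sorted = a.sort_index(kind="mergesort")  # stable sort (assumption, see lead-in)
same_index = a_sorted.index.equals(b.index)

# ... and the values are multiplied position by position instead.
positional = a_sorted * b

print(all_pairs)
print(same_index)
print(positional)
```

Checking `Index.equals` before an operation is one way to tell in advance which of the two results to expect.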

With DataFrame.join() you can get every combination even when the indexes are identical:

In [128]: a = pd.Series((2,3), ("b","b"))

In [129]: b = pd.Series((5,7), ("b","b"))

In [130]: a.to_frame('a').join(b.to_frame('b')).eval("a * b")
Out[130]:
b    10
b    14
b    15
b    21
dtype: int64
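
The join approach can be wrapped in a small function. This is a sketch under stated assumptions: `multiply_all_pairs` and the column names `left` and `right` are hypothetical names chosen for illustration, and the `how="outer"` default is a choice made here so that unmatched labels come back as NaN, as they do with plain arithmetic (the session above uses the default left join).

```python
import pandas as pd


def multiply_all_pairs(left, right, how="outer"):
    """Hypothetical helper: multiply every pair of values that share a label."""
    # Joining on the index pairs each left row with each right row
    # carrying the same label, even when both indexes are identical.
    joined = left.to_frame("left").join(right.to_frame("right"), how=how)
    # Both columns now share one index, so this product is positional.
    return joined["left"] * joined["right"]


a = pd.Series([2, 3], index=["b", "b"])
b = pd.Series([5, 7], index=["b", "b"])

plain = a * b                      # identical indexes: two rows
result = multiply_all_pairs(a, b)  # all four combinations
print(plain)
print(result)
```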
