Pandas DataFrame with tuples as index labels

I'm seeing some strange pandas behavior here. I have a DataFrame that looks like this:

    df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'], index=[('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')])

    In [14]: df
    Out[14]:
           Col 1 Col 2 Col 3
    (1, a)   NaN   NaN   NaN
    (2, a)   NaN   NaN   NaN
    (1, b)   NaN   NaN   NaN
    (2, b)   NaN   NaN   NaN

I can set the value of an arbitrary element:

    In [15]: df['Col 2'].loc[('1', 'b')] = 6

    In [16]: df
    Out[16]:
           Col 1 Col 2 Col 3
    (1, a)   NaN   NaN   NaN
    (2, a)   NaN   NaN   NaN
    (1, b)   NaN     6   NaN
    (2, b)   NaN   NaN   NaN

But when I try to read back the element I just set, using the same syntax, I get:

    In [17]: df['Col 2'].loc[('1', 'b')]
    KeyError: 'the label [1] is not in the [index]'

Can someone tell me what I'm doing wrong, or why this is happening? Am I simply not allowed to use multi-element tuples as index labels?

Edit

It seems that wrapping the tuple index in a list works:

    In [38]: df['Col 2'].loc[[('1', 'b')]]
    Out[38]:
    (1, b)    6
    Name: Col 2, dtype: object

However, I'm still seeing some strange behavior in my actual use case, so it would be nice to know whether this approach is recommended.

Answer

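Your tuple in parentheses is not treated as a single label. Inside square brackets the parentheses add nothing: df['Col 2'].loc[('1', 'b')] is exactly the same call as df['Col 2'].loc['1', 'b'], and .loc reads a tuple key as a sequence of separate keys rather than as one label, much as if you had passed '1' and 'b' individually. Hence the KeyError: pandas tries to find the key '1' on its own and, of course, does not find it.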

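That is why it works when you add the extra square brackets: the argument becomes a list with a single element, your tuple, and pandas looks that tuple up as one label. Note that you then get back a one-element Series (as in Out[38]) rather than the scalar value itself.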
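The parentheses point is plain Python behavior, not something specific to pandas: whatever sits between the square brackets is passed to __getitem__ as one key, and a comma-separated key is always a tuple. A minimal sketch with a hypothetical ShowKey class (not part of pandas) that simply returns the key it receives:

```python
class ShowKey:
    """Hypothetical probe: __getitem__ just returns the key it is given."""

    def __getitem__(self, key):
        return key


probe = ShowKey()

with_parens = probe[('1', 'b')]   # parentheses inside [] ...
with_comma = probe['1', 'b']      # ... are identical to a bare comma
wrapped = probe[[('1', 'b')]]     # extra brackets: a list holding one tuple

print(with_parens == with_comma)              # True: both are ('1', 'b')
print(type(wrapped).__name__, len(wrapped))   # list 1
```

How .loc then interprets the key is up to pandas; the sketch only shows that the parentheses cannot protect the tuple, while the extra list does change what pandas receives.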

You should avoid this ambiguity between list and tuple arguments when selecting. The behavior may also differ depending on whether the index is a plain Index or a MultiIndex.
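A minimal sketch of unambiguous access on a plain index of tuples, assuming a reasonably recent pandas. The index is built explicitly with tupleize_cols=False so that pandas keeps the tuples as flat labels instead of turning them into a MultiIndex, and the column values are made up for illustration. DataFrame.at is not used in the original answer; it is suggested here because it takes exactly one row label and one column label, so the tuple is looked up whole, and it avoids the chained df['Col 2'].loc[...] = ... assignment, which may not propagate back to df in newer pandas (for example with Copy-on-Write enabled).

```python
import pandas as pd

labels = [('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]

# tupleize_cols=False keeps each tuple as one plain label (no MultiIndex).
flat = pd.Index(labels, tupleize_cols=False)
df = pd.DataFrame({'Col 2': [1.0, 2.0, 3.0, 4.0]}, index=flat)  # illustrative values
print(isinstance(df.index, pd.MultiIndex))  # False

# One row label plus one column label: the tuple is never split up,
# and the value is written directly into df (no chained indexing).
df.at[('1', 'b'), 'Col 2'] = 6.0
print(df.at[('1', 'b'), 'Col 2'])  # 6.0

# A list of labels is never ambiguous, but it returns a one-row Series.
sub = df['Col 2'].loc[[('1', 'b')]]
print(sub.tolist())  # [6.0]
```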

In any case, since you ask what is recommended: try not to build a plain index out of tuples. Pandas works better, and selection is more powerful, if you create a real MultiIndex:

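    import pandas as pd

    df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'],
                      index=pd.MultiIndex.from_tuples([('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')]))
    # Chained assignment as in the question; newer pandas (Copy-on-Write)
    # may not write this back to df, where df.loc[('1', 'b'), 'Col 2'] = 6 is safer.
    df['Col 2'].loc[('1', 'b')] = 6

    df['Col 2'].loc[('1', 'b')]
    Out[13]: 6

    df
    Out[14]:
         Col 1 Col 2 Col 3
    1 a    NaN   NaN   NaN
    2 a    NaN   NaN   NaN
    1 b    NaN     6   NaN
    2 b    NaN   NaN   NaN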
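With a real MultiIndex, the row tuple and the column can be given in a single .loc[row, col] call, which avoids chained indexing, and you can also select by a single level. A minimal sketch using the same labels as the answer; the partial selection df.loc['1'] and the cross-section df.xs('b', level=1) are standard pandas MultiIndex features added here to illustrate the "more powerful" point, not taken from the original answer:

```python
import pandas as pd

index = pd.MultiIndex.from_tuples([('1', 'a'), ('2', 'a'), ('1', 'b'), ('2', 'b')])
df = pd.DataFrame(columns=['Col 1', 'Col 2', 'Col 3'], index=index)

# Full row tuple and column label in one .loc call: no chained indexing.
df.loc[('1', 'b'), 'Col 2'] = 6
print(df.loc[('1', 'b'), 'Col 2'])  # 6

# Partial indexing on the first level: all rows whose first label is '1'
# (the first level is dropped from the result's index).
print(df.loc['1'].index.tolist())  # ['a', 'b']

# Cross-section on the second level: all rows whose second label is 'b'.
print(df.xs('b', level=1).index.tolist())  # ['1', '2']
```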

Source: https://habr.com/ru/post/1011571/

