In Pandas, why does the groupby "key" column disappear in this scenario?

I have the following code, which for some reason causes the key column to disappear. I have also noticed key columns "accidentally" disappearing in other places. I am trying to isolate the cases, and this is one of them.

I am using pandas version 0.20.1.

    import pandas as pd

    DF = pd.DataFrame([['a', 1], ['b', 2], ['b', 3]], columns=['G', 'N'])
    groupByObj = DF.groupby('G')

    print(groupByObj.get_group('b'))  # first print
    groupByObj.sum()                  # aggregation call between the two prints
    print(groupByObj.get_group('b'))  # second print

The first print of groupByObj.get_group('b') results in:

       G  N
    1  b  2
    2  b  3

The second print of groupByObj.get_group('b') results in:

       N
    1  2
    2  3

Why does the "key" column ("G") disappear after running groupByObj.sum()?

1 answer

This is a bug in Pandas, discussed in:

The latter is still open.

From reading the discussion on GitHub, and as mentioned in the comments, it seems that the second output is the intended behavior. In the case of sum, it was obtained by adding the following line to pandas.core.groupby._GroupBy#_set_group_selection:

 self._reset_cache('_selected_obj') 

Because this reset happens only when sum (or one of several other functions) is called, the G column still appears in the first call to get_group. By the way, the reset also fails when calling mean and several other functions. The bug seems to be more complicated than originally thought, and the simple cache reset did not solve it.
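Until the bug is fixed, one way to sidestep it is to avoid relying on the state of a shared GroupBy object. The following is only a sketch of that idea, not a fix from the pandas developers: it reruns the steps from the question so you can see how your installed version behaves, then shows two workarounds. The names df, grouped, rows_b and totals are illustrative and not taken from the original code.

```python
import pandas as pd

df = pd.DataFrame([['a', 1], ['b', 2], ['b', 3]], columns=['G', 'N'])

# Reproduce the steps from the question and report what this pandas
# version does; the result depends on the installed version.
grouped = df.groupby('G')
cols_before = list(grouped.get_group('b').columns)
grouped.sum()
cols_after = list(grouped.get_group('b').columns)
print(pd.__version__, cols_before, cols_after)

# Workaround 1: select the rows with a boolean mask. No GroupBy object
# is involved, so the key column 'G' is always kept.
rows_b = df[df['G'] == 'b']
print(rows_b)

# Workaround 2: aggregate on a fresh GroupBy object, so that the object
# used for get_group is never touched by the aggregation call.
totals = df.groupby('G').sum()
print(totals)
```

The boolean mask is the more robust of the two, since it does not depend on how GroupBy caches its selection internally.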


Source: https://habr.com/ru/post/1270695/

