Why do groupby first and last not give me the first and last rows?

I am posting this because the topic just came up in another question/answer and the behavior is not very well documented.

Consider the dataframe df:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame(dict(
        A=list('xxxyyy'),
        B=[np.nan, 1, 2, 3, 4, np.nan]
    ))

       A    B
    0  x  NaN
    1  x  1.0
    2  x  2.0
    3  y  3.0
    4  y  4.0
    5  y  NaN

I wanted to get the first and last rows of each group, where the groups are defined by column 'A'.

I tried

    df.groupby('A').B.agg(['first', 'last'])

       first  last
    A
    x    1.0   2.0
    y    3.0   4.0

However, this does not give me the np.nan values that I was expecting.

How do I get the actual first and last values in each group?

2 answers

One option is to use the .nth method:

    >>> gb = df.groupby('A')
    >>> gb.nth(0)
         B
    A
    x  NaN
    y  3.0
    >>> gb.nth(-1)
         B
    A
    x  2.0
    y  NaN

However, I have not found a way to aggregate the two results neatly. Of course, you can always use the pd.DataFrame constructor:

    >>> pd.DataFrame({'first': gb.B.nth(0), 'last': gb.B.nth(-1)})
       first  last
    A
    x    NaN   2.0
    y    3.0   NaN

Note: I explicitly selected the column with gb.B so that nth returns a Series; otherwise you would have to call .squeeze on each one-column DataFrame.
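As far as I know, from pandas 2.0 onward nth behaves as a filter and keeps the original row index instead of indexing by group key, so the output above reflects older versions. A minimal sketch that should not depend on that behavior uses head(1) and tail(1), which keep the NaN rows as well. The variable names first_rows, last_rows and result are only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("xxxyyy"), "B": [np.nan, 1, 2, 3, 4, np.nan]})
gb = df.groupby("A")

# head(1) / tail(1) return the literal first / last row of each group,
# NaN included, still carrying the original row index.
# Re-indexing by the group key "A" lets the two Series align by group.
first_rows = gb.head(1).set_index("A")["B"]
last_rows = gb.tail(1).set_index("A")["B"]

result = pd.DataFrame({"first": first_rows, "last": last_rows})
print(result)
```

Because both Series are indexed by the group key before they are combined, the constructor aligns them per group rather than per original row.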


As @unutbu noted here:

groupby.first and groupby.last return the first and last non-null values, respectively.

To get the actual first and last values, do:

    def h(x):
        return x.values[0]

    def t(x):
        return x.values[-1]

    df.groupby('A').B.agg([h, t])

         h    t
    A
    x  NaN  2.0
    y  3.0  NaN
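To see the difference side by side, here is a small sketch that puts the built-in 'first' and 'last' next to positional picks in a single agg call. It assumes a pandas version with named aggregation (0.25 or later), and the output column names are made up for this example:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": list("xxxyyy"), "B": [np.nan, 1, 2, 3, 4, np.nan]})

result = df.groupby("A")["B"].agg(
    # Built-in reducers: skip null values
    first_non_null="first",
    last_non_null="last",
    # Positional picks: take whatever is in the first / last row
    first_row=lambda s: s.iloc[0],
    last_row=lambda s: s.iloc[-1],
)
print(result)
```

The first two columns skip the NaN entries, while the last two return whatever sits in the first and last position of each group.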

Source: https://habr.com/ru/post/1270978/

