While answering the question "Sort a series of panda data by month name?", I ran into some strange groupby behavior.
    import pandas as pd

    df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                       ["aug", 11], ["jan", 11], ["jan", 1]],
                      columns=["Month", "Price"])
    df["Month_dig"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
    df.sort_values(by="Month_dig", inplace=True)

    # Now df looks like
    #   Month  Price  Month_dig
    # 1   jan     40          1
    # 5   jan     11          1
    # 6   jan      1          1
    # 2   mar     11          3
    # 3   aug     21          8
    # 4   aug     11          8
    # 0   dec     12         12

    total = (df.groupby(df['Month'])['Price'].mean())
    print(total)

    # output
    # Month
    # aug    16.000000
    # dec    12.000000
    # jan    17.333333
    # mar    11.000000
    # Name: Price, dtype: float64
It appears that the data in total is sorted alphabetically, whereas FP and I were expecting:
    Month
    jan    17.333333
    mar    11.000000
    aug    16.000000
    dec    12.000000
    Name: Price, dtype: float64
What mechanism is behind groupby? I know from the documentation that it keeps the order of rows within each group, but is there a rule for the order among groups? It seems to me that a fairly simple group order would be ["jan", "mar", "aug", "dec"], since the data in df is sorted this way.
P.S. From ["aug", "dec", "jan", "mar"] it seems that the group names are sorted in alphabetical order.
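For what it's worth, here is a minimal sketch of how the ordering among groups can be checked. It assumes that groupby's `sort` parameter (which defaults to `True` and is not set anywhere in my code above) is what decides the order of the group keys. The names `sorted_keys` and `first_seen` are just illustrative.

```python
import pandas as pd

# Same data as in the question, already sorted by month number.
df = pd.DataFrame(
    [["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
     ["aug", 11], ["jan", 11], ["jan", 1]],
    columns=["Month", "Price"],
)
df["Month_dig"] = pd.to_datetime(df["Month"], format="%b", errors="coerce").dt.month
df = df.sort_values(by="Month_dig")

# Default (sort=True): the group keys come back sorted, here alphabetically.
sorted_keys = df.groupby("Month")["Price"].mean()

# sort=False: the group keys come back in order of first appearance in df.
first_seen = df.groupby("Month", sort=False)["Price"].mean()

print(list(sorted_keys.index))
print(list(first_seen.index))
```

If that assumption holds, the second result follows the row order of df, which is the order I was expecting in the first place.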
I am using Python 3.6 and pandas '0.20.3'