Sort a pandas Series by month name?

I have a Series object that has:

    date  price
    dec      12
    may      15
    apr      13
    ..

Problem: I want to group the data by month, calculate the average price for each month, and present the result sorted by month.

Output Required:

    month  mean_price
    Jan    XXX
    Feb    XXX
    Mar    XXX

I thought of creating a list and passing it to the sort:

 months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"] 

but sort_values does not support this for rows.

I have one big problem: even though

    df.sort_values(by='date', ascending=True, inplace=True)

sorts the initial df (note that with inplace=True the call returns None, so its result must not be assigned back to df), the output of groupby does not keep the order of the sorted df.

In conclusion, I need these two columns from the source data. The datetime column is sorted, but grouping by month name (dt.strftime('%B')) messes up the sorting. Now I need to sort the result by the name of the month.


My code is:

    # df has 5 columns, though I only need the columns 'date' and 'price'
    df.sort_values(by='date', inplace=True)
    # at this point it is sorted by date, great
    total = df.groupby(df['date'].dt.strftime('%B'))['price'].mean()
    # but now the order is lost: the months appear alphabetically instead
6 answers

Thanks to @Brad Solomon for suggesting a faster way to capitalize the strings!

Note 1 @Brad Solomon's answer using pd.Categorical should use fewer resources than my answer. He shows how to assign an order to categorical data. Don't miss it :P

Alternatively, you can use:

    import pandas as pd

    df = pd.DataFrame([["dec", 12], ["jan", 40], ["mar", 11], ["aug", 21],
                       ["aug", 11], ["jan", 11], ["jan", 1]],
                      columns=["Month", "Price"])

    # Preprocessing: capitalize `jan`, `dec` to `Jan` and `Dec`
    df["Month"] = df["Month"].str.capitalize()

    # Now the dataset should look like
    #   Month Price
    #   -----------
    #   Dec   XX
    #   Jan   XX
    #   Apr   XX

    # Convert to a datetime so that we can sort it;
    # use %b because the data uses the abbreviated month name
    df["Month"] = pd.to_datetime(df.Month, format='%b', errors='coerce').dt.month
    df = df.sort_values(by="Month")

    total = df.groupby(df["Month"])["Price"].mean()

    # total
    # Month
    # 1     17.333333
    # 3     11.000000
    # 8     16.000000
    # 12    12.000000

Note 2 groupby sorts the group keys by default. Make sure to use the same key for sorting and grouping, as in df = df.sort_values(by=SAME_KEY) and total = (df.groupby(df[SAME_KEY])['Price'].mean()); otherwise you may get unintended behavior. See the question on whether groupby preserves order among groups, and in what way, for more information.
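To illustrate Note 2, here is a minimal sketch with made-up data (the month numbers and prices are assumptions for the example). It compares the default sort=True with sort=False, which keeps groups in order of first appearance:

```python
import pandas as pd

# Hypothetical sample data: month numbers in arbitrary row order.
df = pd.DataFrame(
    {"Month": [12, 1, 3, 8, 8, 1, 1], "Price": [12, 40, 11, 21, 11, 11, 1]}
)

# Default (sort=True): group keys come back sorted, whatever the row order was.
sorted_keys = df.groupby("Month")["Price"].mean()

# sort=False: groups appear in the order in which each key is first seen.
first_seen = df.groupby("Month", sort=False)["Price"].mean()

print(list(sorted_keys.index))  # [1, 3, 8, 12]
print(list(first_seen.index))   # [12, 1, 3, 8]
```

So sorting df beforehand only affects the grouped result if you also pass sort=False.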

Note 3 A more efficient approach is to compute the means first and then sort the result by month. That way you only need to sort at most 12 elements instead of the whole df. This reduces the computational cost if you do not need df itself to be sorted.
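One possible way to do what Note 3 describes, sketched with the sample data from above (the variable names are only illustrative): aggregate first, then reorder the few resulting rows by their position in a month list.

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

df = pd.DataFrame(
    {"Month": ["Dec", "Jan", "Mar", "Aug", "Aug", "Jan", "Jan"],
     "Price": [12, 40, 11, 21, 11, 11, 1]}
)

# Aggregate first: the result has at most 12 rows (keys sorted alphabetically).
total = df.groupby("Month")["Price"].mean()

# Then order only those few rows by calendar position.
total = total.loc[sorted(total.index, key=months.index)]

print(list(total.index))  # ['Jan', 'Mar', 'Aug', 'Dec']
```

The large df is never sorted here; only the small grouped result is reordered.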

Note 4 If you already have the month as an index and wonder how to make it categorical, see pandas.CategoricalIndex. @jezrael has a working example of ordering a categorical index in "Pandas series sorting by month index".
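For the case in Note 4, a rough sketch could look like the following. It is not @jezrael's code, and the Series contents are made up for illustration:

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical result whose index already holds month abbreviations.
total = pd.Series([12.0, 15.0, 13.0], index=["Dec", "May", "Apr"], name="price")

# Replace the plain index with an ordered categorical one ...
total.index = pd.CategoricalIndex(total.index, categories=months, ordered=True)

# ... so that sort_index follows the category order, not the alphabet.
total = total.sort_index()

print(list(total.index))  # ['Apr', 'May', 'Dec']
```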


You can use categorical data for proper sorting:

    months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
              "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
    df['months'] = pd.Categorical(df['months'], categories=months, ordered=True)
    df.sort_values(...)  # same as you have now; can use inplace=True

When you specify the categories, pandas remembers the order in which you listed them as the default sort order.

Docs: pandas Categorical data > Sorting and order.
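Applied to the question, this could look roughly like the sketch below. The column names date and price come from the question; the sample rows and the use of observed=True are assumptions made for the example:

```python
import pandas as pd

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Made-up rows in the shape shown in the question.
df = pd.DataFrame({"date": ["dec", "may", "apr", "may"],
                   "price": [12, 15, 13, 17]})

# Capitalize so the values match the category labels, then make them ordered.
df["date"] = pd.Categorical(df["date"].str.capitalize(),
                            categories=months, ordered=True)

# groupby sorts its keys by category order; observed=True keeps only
# the months that actually occur in the data.
total = df.groupby("date", observed=True)["price"].mean()

print(list(total.index))  # ['Apr', 'May', 'Dec']
```

Values that are not in the category list become NaN, which is why the capitalization step matters here.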


I would use the calendar module and reindex:

series.str.capitalize capitalizes the values in the series; then we create a dictionary with the calendar module and map it over the series to get the month numbers.

Once we have the month numbers, we can call sort_values() and take the resulting index, then reindex with it.

    import calendar

    df.date = df.date.str.capitalize()  # capitalizes the series
    d = {i: e for e, i in enumerate(calendar.month_abbr)}  # creates a dictionary: abbreviation -> month number
    # d = {i[:3]: e for e, i in enumerate(calendar.month_name)}
    df.reindex(df.date.map(d).sort_values().index)  # map + sort_values + reindex with index

      date  price
    2  Apr     13
    1  May     15
    0  Dec     12

Use the Sort_Dataframeby_Month function to sort month names in chronological order.

You need to install these packages:

    $ pip install sorted-months-weekdays
    $ pip install sort-dataframeby-monthorweek

Example:

    import pandas as pd
    from sorted_months_weekdays import *
    from sort_dataframeby_monthorweek import *

    df = pd.DataFrame([['Jan',23],['Jan',16],['Dec',35],['Apr',79],['Mar',53],['Mar',12],['Feb',3]],
                      columns=['Month','Sum'])
    df
    Out[11]:
      Month  Sum
    0   Jan   23
    1   Jan   16
    2   Dec   35
    3   Apr   79
    4   Mar   53
    5   Mar   12
    6   Feb    3

To sort the DataFrame by month, use the function:

    Sort_Dataframeby_Month(df=df, monthcolumnname='Month')
    Out[14]:
      Month  Sum
    0   Jan   23
    1   Jan   16
    2   Feb    3
    3   Mar   53
    4   Mar   12
    5   Apr   79
    6   Dec   35

You can put the numeric value of the month in front of the name in the index (i.e., "01 January"), sort, and then remove the number:

 total=(df.groupby(df['date'].dt.strftime('%m %B'))['price'].mean()).sort_index() 

The sorted result might look like this:

    01 January     xxx
    02 February    yyy
    03 March       zzz
    04 April       ttt

Then strip the numeric prefix from the index:

    total.index = [x.split()[1] for x in total.index]

    January     xxx
    February    yyy
    March       zzz
    April       ttt
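For reference, here is a self-contained version of the same idea with made-up dates and prices (the sample values are assumptions). Note that %B depends on the active locale; the month names below assume the default English one.

```python
import pandas as pd

# Hypothetical data with a real datetime column, as in the question.
df = pd.DataFrame(
    {"date": pd.to_datetime(["2017-12-05", "2017-05-10",
                             "2017-04-20", "2017-05-30"]),
     "price": [12, 15, 13, 17]}
)

# Keys like "04 April" sort correctly as plain strings.
total = df.groupby(df["date"].dt.strftime("%m %B"))["price"].mean().sort_index()

# Drop the numeric prefix once the order is fixed.
total.index = [label.split(" ", 1)[1] for label in total.index]

print(list(total.index))  # ['April', 'May', 'December']
```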

You should consider reindexing along axis 0 (the index):

    new_order = ['January', 'February', 'March', 'April', 'May', 'June',
                 'July', 'August', 'September', 'October', 'November', 'December']
    df1 = df.reindex(new_order, axis=0)
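A small sketch of how this might behave on a grouped result; the Series below is invented for illustration. reindex inserts NaN for every label in new_order that is missing from the index, so you may want to drop those rows afterwards:

```python
import pandas as pd

new_order = ['January', 'February', 'March', 'April', 'May', 'June',
             'July', 'August', 'September', 'October', 'November', 'December']

# Hypothetical groupby output: month names in alphabetical order.
total = pd.Series([13.0, 12.0, 16.0],
                  index=["April", "December", "May"], name="price")

# All 12 labels appear in calendar order; absent months are NaN.
reordered = total.reindex(new_order)

# Keep only the months that had data.
present_only = reordered.dropna()

print(list(present_only.index))  # ['April', 'May', 'December']
```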

Source: https://habr.com/ru/post/1274411/

