Pandas Pivot Table: Order and Intermediate Columns

I am using Pandas 0.19.

Given the following data frame:

FID admin0 admin1 admin2 windspeed population 0 cntry1 state1 city1 60km/h 700 1 cntry1 state1 city1 90km/h 210 2 cntry1 state1 city2 60km/h 100 3 cntry1 state2 city3 60km/h 70 4 cntry1 state2 city4 60km/h 180 5 cntry1 state2 city4 90km/h 370 6 cntry2 state3 city5 60km/h 890 7 cntry2 state3 city6 60km/h 120 8 cntry2 state3 city6 90km/h 420 9 cntry2 state3 city6 120km/h 360 10 cntry2 state4 city7 60km/h 740 

How to create a table like this?

  population 60km/h 90km/h 120km/h admin0 admin1 admin2 cntry1 state1 city1 700 210 0 cntry1 state1 city2 100 0 0 cntry1 state2 city3 70 0 0 cntry1 state2 city4 180 370 0 cntry2 state3 city5 890 0 0 cntry2 state3 city6 120 420 360 cntry2 state4 city7 740 0 0 

I tried with the following pivot table:

 table = pd.pivot_table(df,index=["admin0","admin1","admin2"], columns=["windspeed"], values=["population"],fill_value=0) 

In general, it works fine, but, unfortunately, I cannot sort the new columns in the correct order: a column of 120 km / h appears in front of those that go at 60 km / h and 90 km / h. How to specify the order of new columns?

In addition, as a second step, I need to add subtotals for both admin0 and admin1. Ideally, the table I need should be like this:

  population 60km/h 90km/h 120km/h admin0 admin1 admin2 cntry1 state1 city1 700 210 0 cntry1 state1 city2 100 0 0 SUM state1 800 210 0 cntry1 state2 city3 70 0 0 cntry1 state2 city4 180 370 0 SUM state2 250 370 0 SUM cntry1 1050 580 0 cntry2 state3 city5 890 0 0 cntry2 state3 city6 120 420 360 SUM state3 1010 420 360 cntry2 state4 city7 740 0 0 SUM state4 740 0 0 SUM cntry2 1750 420 360 SUM ALL 2800 1000 360 
+5
source share
2 answers

Solution with subtotals and MultiIndex.from_arrays . Last concat and all Dataframes , sort_index and add all sum :

 #replace km/h and convert to int df.windspeed = df.windspeed.str.replace('km/h','').astype(int) print (df) FID admin0 admin1 admin2 windspeed population 0 0 cntry1 state1 city1 60 700 1 1 cntry1 state1 city1 90 210 2 2 cntry1 state1 city2 60 100 3 3 cntry1 state2 city3 60 70 4 4 cntry1 state2 city4 60 180 5 5 cntry1 state2 city4 90 370 6 6 cntry2 state3 city5 60 890 7 7 cntry2 state3 city6 60 120 8 8 cntry2 state3 city6 90 420 9 9 cntry2 state3 city6 120 360 10 10 cntry2 state4 city7 60 740 #pivoting table = pd.pivot_table(df, index=["admin0","admin1","admin2"], columns=["windspeed"], values=["population"], fill_value=0) print (table) population windspeed 60 90 120 admin0 admin1 admin2 cntry1 state1 city1 700 210 0 city2 100 0 0 state2 city3 70 0 0 city4 180 370 0 cntry2 state3 city5 890 0 0 city6 120 420 360 state4 city7 740 0 0 
 #groupby and create sum dataframe by levels 0,1 df1 = table.groupby(level=[0,1]).sum() df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0), df1.index.get_level_values(1)+ '_sum', len(df1.index) * ['']]) print (df1) population windspeed 60 90 120 admin0 cntry1 state1_sum 800 210 0 state2_sum 250 370 0 cntry2 state3_sum 1010 420 360 state4_sum 740 0 0 df2 = table.groupby(level=0).sum() df2.index = pd.MultiIndex.from_arrays([df2.index.values + '_sum', len(df2.index) * [''], len(df2.index) * ['']]) print (df2) population windspeed 60 90 120 cntry1_sum 1050 580 0 cntry2_sum 1750 420 360 #concat all dataframes together, sort index df = pd.concat([table, df1, df2]).sort_index(level=[0]) 
 #add km/h to second level in columns df.columns = pd.MultiIndex.from_arrays([df.columns.get_level_values(0), df.columns.get_level_values(1).astype(str) + 'km/h']) #add all sum df.loc[('All_sum','','')] = table.sum().values print (df) population 60km/h 90km/h 120km/h admin0 admin1 admin2 cntry1 state1 city1 700 210 0 city2 100 0 0 state1_sum 800 210 0 state2 city3 70 0 0 city4 180 370 0 state2_sum 250 370 0 cntry1_sum 1050 580 0 cntry2 state3 city5 890 0 0 city6 120 420 360 state3_sum 1010 420 360 state4 city7 740 0 0 state4_sum 740 0 0 cntry2_sum 1750 420 360 All_sum 2800 1000 360 

EDIT by comments:

 def f(x): print (x) if (len(x) > 1): return x.sum() df1 = table.groupby(level=[0,1]).apply(f).dropna(how='all') df1.index = pd.MultiIndex.from_arrays([df1.index.get_level_values(0), df1.index.get_level_values(1)+ '_sum', len(df1.index) * ['']]) print (df1) population windspeed 60 90 120 admin0 cntry1 state1_sum 800.0 210.0 0.0 state2_sum 250.0 370.0 0.0 cntry2 state3_sum 1010.0 420.0 360.0 
+4
source

you can do this using reindex () method and custom sorting:

 In [26]: table Out[26]: population windspeed 120km/h 60km/h 90km/h admin0 admin1 admin2 cntry1 state1 city1 0 700 210 city2 0 100 0 state2 city3 0 70 0 city4 0 180 370 cntry2 state3 city5 0 890 0 city6 360 120 420 state4 city7 0 740 0 In [27]: cols = sorted(table.columns.tolist(), key=lambda x: int(x[1].replace('km/h',''))) In [28]: cols Out[28]: [('population', '60km/h'), ('population', '90km/h'), ('population', '120km/h')] In [29]: table = table.reindex(columns=cols) In [30]: table Out[30]: population windspeed 60km/h 90km/h 120km/h admin0 admin1 admin2 cntry1 state1 city1 700 210 0 city2 100 0 0 state2 city3 70 0 0 city4 180 370 0 cntry2 state3 city5 890 0 0 city6 120 420 360 state4 city7 740 0 0 
+4
source

Source: https://habr.com/ru/post/1257959/


All Articles