Pandas: formatting the output of .describe()

Question


I am trying to get the output of the .describe() function in a different layout. Here is the CSV data (testProp.csv):

    name,prop
    A,1
    A,2
    B, 4
    A, 3
    B, 5
    B, 2

When I run the following:

    import pandas as pd

    data = pd.read_csv('testProp.csv')
    temp = data.groupby('name')['prop'].describe()
    temp.to_csv('out.csv')

Output:

    name
    A     count    3.000000
          mean     2.000000
          std      1.000000
          min      1.000000
          25%      1.500000
          50%      2.000000
          75%      2.500000
          max      3.000000
    B     count    3.000000
          mean     3.666667
          std      1.527525
          min      2.000000
          25%      3.000000
          50%      4.000000
          75%      4.500000
          max      5.000000
    dtype: float64

However, I need the data in the format below. I tried transpose(), and I would like to use describe() and manipulate its result rather than writing out .agg([np.mean, np.max, etc.]):

          count  mean         std          min  25%  50%  75%  max
    A     3      2            1            1    1.5  2    2.5  3
    B     3      3.666666667  1.527525232  2    3    4    4.5  5


python pandas formatting output describe

Mike Sep 29 '15 at 4:04


2 answers

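In pandas v0.22 you can use the describe function directly, because it already returns the statistics as columns. Building on @Kumar's accepted answer, you can also use the pandas stack/unstack functions to move the statistics between the index and the columns and experiment with the result.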

    from io import StringIO
    import pandas as pd

    df = pd.read_csv(StringIO("""name,prop
    A,1
    A,2
    B, 4
    A, 3
    B, 5
    B, 2"""))
    df.shape  # (6, 2)
    df
    # In pandas 0.20+ this is already a DataFrame: one row per name, one column per statistic
    temp = df.groupby(['name'])['prop'].describe()
    temp
    temp.stack()  # also try unstack() or unstack(level=-1); level can be -1 or 0

See the pandas unstack documentation for more information.
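A minimal sketch of the stack/unstack idea, assuming pandas 0.20 or later (where describe() on a grouped column already returns one row per group). stack() turns that table into the long (name, statistic) Series shown in the question, and Series.unstack() moves the innermost index level, the statistic names, back into the columns. The variable names are illustrative, and skipinitialspace=True is an addition that strips the space after the comma in rows such as "B, 4".

```python
from io import StringIO

import pandas as pd

# Same data as testProp.csv in the question
csv_text = """name,prop
A,1
A,2
B, 4
A, 3
B, 5
B, 2"""
df = pd.read_csv(StringIO(csv_text), skipinitialspace=True)

# pandas >= 0.20: one row per name, one column per statistic
wide = df.groupby("name")["prop"].describe()

# stack() produces the long (name, statistic) Series from the question
long_form = wide.stack()

# unstack() moves the statistic level back into the columns;
# reindex keeps describe()'s column order explicitly
reshaped = long_form.unstack().reindex(columns=wide.columns)

print(reshaped)
```

With the older pandas from the question, where describe() itself returned the long Series, calling .unstack() directly on that result should give the same wide table (possibly with a different column order), and reshaped.to_csv('out.csv') then writes it in the requested layout.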


Vitalis Mar 08 '18 at 13:01


Anand s kumar · Accepted Answer · 2015-09-29T04:23:34+0000

One way to do this is to first call .reset_index() on temp (here a Series with a two-level index), which turns it into a DataFrame, and then use DataFrame.pivot to reshape it as you like. Example:

    In [24]: df = pd.read_csv(io.StringIO("""name,prop
       ....: A,1
       ....: A,2
       ....: B, 4
       ....: A, 3
       ....: B, 5
       ....: B, 2"""))

    In [25]: temp = df.groupby('name')['prop'].describe().reset_index()

    In [26]: newdf = temp.pivot(index='name',columns='level_1',values=0)

    In [27]: newdf.columns.name = ''   #This is needed so that the name of the columns is not `'level_1'`.

    In [28]: newdf
    Out[28]:
          25%  50%  75%  count  max      mean  min       std
    name
    A     1.5    2  2.5      3    3  2.000000    1  1.000000
    B     3.0    4  4.5      3    5  3.666667    2  1.527525

Then you can save newdf to CSV, for example with newdf.to_csv('out.csv').
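The pivot approach can also be sketched end to end on a current pandas, with one caveat: since pandas 0.20, describe() on a grouped column already returns a DataFrame, so reset_index() alone no longer yields the name/level_1/0 layout that the answer pivots on. This sketch first recreates that long layout with stack().reset_index(), then applies the same pivot call, and finally restores describe()'s column order, since pivot sorts the new column labels (as Out[28] shows). Setting columns.name to None instead of '' is a small choice of this sketch, and skipinitialspace=True is an added read_csv option.

```python
from io import StringIO

import pandas as pd

csv_text = """name,prop
A,1
A,2
B, 4
A, 3
B, 5
B, 2"""
# skipinitialspace=True drops the space after the comma in rows like "B, 4"
df = pd.read_csv(StringIO(csv_text), skipinitialspace=True)

stats = df.groupby("name")["prop"].describe()

# Recreate the older long layout with columns "name", "level_1" and 0
long_df = stats.stack().reset_index()

# Same pivot call as in the answer
newdf = long_df.pivot(index="name", columns="level_1", values=0)
newdf.columns.name = None  # drop the leftover "level_1" label

# pivot sorts the column labels; put them back in describe() order
newdf = newdf[stats.columns]

print(newdf)
```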

