In pandas, how can I get a DataFrame as output when I sum a DataFrame?

Question

When I sum a DataFrame, it returns a Series:

 In [1]: import pandas as pd

 In [2]: df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])

 In [3]: df
 Out[3]:
    a  b  c
 0  1  2  3
 1  2  3  3

 In [4]: s = df.sum()

 In [5]: type(s)
 Out[5]: pandas.core.series.Series

I know that I can build a new DataFrame from this Series. But is there a more “pandasic” way?

+4

python pandas dataframe

waitingkuo May 09 '13 at 10:05


3 answers

Andy Hayden · Answer 1 · 2013-05-10T23:38:30+0000

I'm going to go ahead and say "No": I don't think there is a direct way to do this. The pandastic way (and the Pythonic one too) is to be explicit:

 pd.DataFrame(df.sum(), columns=['sum'])

or, more elegantly, using a dictionary (be aware that this copies the summed array):

 pd.DataFrame({'sum': df.sum()})

As @root notes, it is faster to use NumPy directly (this assumes import numpy as np):

 pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])

(The Zen of Python says "practicality beats purity", so if you care about the time, use this.)
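One caveat with the NumPy variant, sketched below as a quick check rather than a definitive recipe: df.values is a plain array, so the column labels are lost and the result gets a default integer index. Passing index=df.columns restores them; that argument is an addition here and is not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])

# Series-based construction keeps the column labels as the index.
via_series = pd.DataFrame({'sum': df.sum()})

# NumPy-based construction drops the labels: the index becomes 0, 1, 2.
via_numpy = pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])

# Assumption (not in the original answer): pass the labels back explicitly.
via_numpy_labeled = pd.DataFrame(
    np.sum(df.values, axis=0), index=df.columns, columns=['sum']
)

print(list(via_series.index))         # ['a', 'b', 'c']
print(list(via_numpy.index))          # [0, 1, 2]
print(list(via_numpy_labeled.index))  # ['a', 'b', 'c']
```

If the labels matter downstream, the small cost of passing the index explicitly is usually worth it.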

However, perhaps the most pandastic way is to simply use the Series! :)

A bit of %timeit for your tiny example:

 In [11]: %timeit pd.DataFrame(df.sum(), columns=['sum'])
 1000 loops, best of 3: 356 us per loop

 In [12]: %timeit pd.DataFrame({'sum': df.sum()})
 1000 loops, best of 3: 462 us per loop

 In [13]: %timeit pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])
 1000 loops, best of 3: 205 us per loop

and for a somewhat larger one:

 In [21]: df = pd.DataFrame(np.random.randn(100000, 3), columns=list('abc'))

 In [22]: %timeit pd.DataFrame(df.sum(), columns=['sum'])
 100 loops, best of 3: 7.99 ms per loop

 In [23]: %timeit pd.DataFrame({'sum': df.sum()})
 100 loops, best of 3: 8.3 ms per loop

 In [24]: %timeit pd.DataFrame(np.sum(df.values, axis=0), columns=['sum'])
 100 loops, best of 3: 2.47 ms per loop
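To rerun this comparison outside IPython, a minimal sketch using the standard-library timeit module might look like the following. The labels, the candidates and timings dictionaries, and the repeat counts are illustrative choices, and the numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(100000, 3), columns=list('abc'))

# The three constructions from the answer, wrapped so they can be timed.
candidates = {
    'Series + columns': lambda: pd.DataFrame(df.sum(), columns=['sum']),
    'dict of Series': lambda: pd.DataFrame({'sum': df.sum()}),
    'np.sum on values': lambda: pd.DataFrame(
        np.sum(df.values, axis=0), columns=['sum']
    ),
}

timings = {}
for label, func in candidates.items():
    # Best of 3 repeats of 20 calls each, converted to milliseconds per call.
    best = min(timeit.repeat(func, number=20, repeat=3)) / 20
    timings[label] = best * 1e3
    print(f'{label}: {timings[label]:.3f} ms per call')
```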

ns63sr · Answer 2 · 2017-09-08T10:38:38+0000

I'm not sure about earlier versions, but with pandas 0.18.1 you can use the pandas.Series.to_frame method.

 import pandas as pd
 df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])
 s = df.sum().to_frame(name='sum')
 type(s)  # pandas.core.frame.DataFrame

The name argument is optional and specifies the column name.
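As a small sketch of what the name argument changes, the example below compares the column labels with and without it. The transpose at the end is an extra illustration, not part of the answer, for the case where a single row with the original columns is wanted.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])

# Without a name, the single column is labeled 0.
unnamed = df.sum().to_frame()

# With a name, the column takes that label.
named = df.sum().to_frame(name='sum')

# Transposing gives one row labeled 'sum' with the original columns a, b, c.
as_row = named.T

print(list(unnamed.columns))  # [0]
print(list(named.columns))    # ['sum']
print(as_row.shape)           # (1, 3)
```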

Jack straw · Answer 3 · 2018-02-13T02:27:35+0000

df.sum().to_frame() should do what you want.

See https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.to_frame.html .
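A related option that none of the answers mention, offered here only as a sketch: in more recent pandas versions, passing a list of function names to DataFrame.agg returns a DataFrame directly, with one row per function. Whether it is available depends on the installed pandas version.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [2, 3, 3]], columns=['a', 'b', 'c'])

# A list of aggregations yields a DataFrame: one row per function name.
summed = df.agg(['sum'])

print(type(summed).__name__)  # DataFrame
print(list(summed.index))     # ['sum']
print(list(summed.columns))   # ['a', 'b', 'c']
```

Note that the orientation differs from df.sum().to_frame(): here the sums form a row, not a column.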
