Finding the mean and standard deviation of a timedelta object in a pandas df

Question


I would like to calculate the mean and standard deviation of the timedelta column for each bank in the two-column data frame shown below. When I run the code (also shown below), I get the following error:

 pandas.core.base.DataError: No numeric types to aggregate

My data frame:

    bank                          diff
    Bank of Japan                 0 days 00:00:57.416000
    Reserve Bank of Australia     0 days 00:00:21.452000
    Reserve Bank of New Zealand   55 days 12:39:32.269000
    US Federal Reserve            8 days 13:27:11.387000

My code is:

    means = dropped.groupby('bank').mean()
    std = dropped.groupby('bank').std()

+12

python pandas datetime timedelta mean

Graham streich Jun 18 '17 at 15:24


4 answers

There is no need to convert the timedelta back and forth. NumPy and pandas can do this for you directly, with faster run times. Using the dropped DataFrame:

    import numpy as np

    grouped = dropped.groupby('bank')['diff']
    mean = grouped.apply(lambda x: np.mean(x))
    std = grouped.apply(lambda x: np.std(x))
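As a self-contained illustration of the same idea, here is a minimal sketch that uses the Series methods mean() and std() instead of the NumPy functions. The sample data is hypothetical (two readings per bank, so that a standard deviation is defined), and note that pandas' Series.std() defaults to the sample standard deviation (ddof=1).

```python
import pandas as pd

# Hypothetical sample data, not the asker's: two timedeltas per bank.
df = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta([10, 20, 60, 180], unit="s"),
})

# Select the timedelta column, then aggregate each group as a Series.
grouped = df.groupby("bank")["diff"]
mean = grouped.apply(lambda s: s.mean())  # Timedelta per bank
std = grouped.apply(lambda s: s.std())    # sample std (ddof=1) per bank

print(mean)
print(std)
```

Selecting the 'diff' column before aggregating is what avoids the "No numeric types to aggregate" error here, since each group is handled as a timedelta Series.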

+4

Wesam Aug 10 '18 at 21:12


The pandas mean() method and other aggregation methods accept the parameter numeric_only=False:

 dropped.groupby('bank').mean(numeric_only=False)

Found here: Aggregations for Timedelta values in Python DataFrame
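A minimal sketch of this approach on hypothetical sample data (the bank names and values below are made up for illustration) might look like this. It only covers mean(); whether other aggregations accept timedelta columns this way can depend on the pandas version.

```python
import pandas as pd

# Hypothetical sample data, not the asker's: two timedeltas per bank.
df = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta([10, 20, 60, 180], unit="s"),
})

# numeric_only=False asks pandas to aggregate non-numeric columns too,
# so the timedelta column is averaged instead of being rejected.
means = df.groupby("bank").mean(numeric_only=False)
print(means)
```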

+4

Alexander Usikov Mar 10 '19 at 15:33


I would suggest passing the argument numeric_only=False to mean(), as mentioned by Alexander Usikov. This works for pandas version 0.20+.

If you have an older version, the following works:

    import pandas as pd

    df = pd.DataFrame({
        'td': pd.Series([pd.Timedelta(days=i) for i in range(5)]),
        'group': ['a', 'a', 'a', 'b', 'b']
    })

    (
        df
        .astype({'td': int})  # convert timedelta to integer (nanoseconds)
        .groupby('group')
        .mean()
        .astype({'td': 'timedelta64[ns]'})
    )

0

Cor Jul 11 '19 at 2:33 pm


jezrael · Accepted Answer · 2017-06-18T15:29:32+0000

You need to convert the timedelta to a numeric value, for example int64 via .values. This is the most accurate option, because nanoseconds are the underlying numeric representation of a timedelta:

    import numpy as np
    import pandas as pd

    # timedelta -> integer nanoseconds
    dropped['new'] = dropped['diff'].values.astype(np.int64)

    means = dropped.groupby('bank').mean()
    means['new'] = pd.to_timedelta(means['new'])

    std = dropped.groupby('bank').std()
    std['new'] = pd.to_timedelta(std['new'])

Another solution is to convert the values to seconds with total_seconds(), but this is less accurate:

    dropped['new'] = dropped['diff'].dt.total_seconds()
    means = dropped.groupby('bank').mean()
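To make the nanosecond round trip concrete, here is a minimal self-contained sketch. The sample data and the column name 'ns' are assumptions for illustration; the explicit cast to timedelta64[ns] is there only so that the integers are nanoseconds regardless of the resolution the column was created with.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, not the asker's: two timedeltas per bank.
df = pd.DataFrame({
    "bank": ["A", "A", "B", "B"],
    "diff": pd.to_timedelta([10, 20, 60, 180], unit="s"),
})

# timedelta -> integer nanoseconds (cast to ns first so the unit is explicit)
df["ns"] = df["diff"].values.astype("timedelta64[ns]").astype(np.int64)

# Aggregate the plain integer column, which groupby handles like any number.
stats = df.groupby("bank")["ns"].agg(["mean", "std"])

# float nanoseconds -> timedelta (to_timedelta treats bare numbers as ns)
mean_td = pd.to_timedelta(stats["mean"])
std_td = pd.to_timedelta(stats["std"])

print(mean_td)
print(std_td)
```

A group with a single row has no sample standard deviation, so its std comes back as NaN and converts to NaT.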
