Finding the mean and standard deviation of a timedelta object in a pandas df

I would like to calculate the mean and standard deviation of the timedelta for each bank from a dataframe with the two columns shown below. When I run the code (also shown below), I get the following error:

 pandas.core.base.DataError: No numeric types to aggregate 

My data frame:

    bank                         diff
    Bank of Japan                0 days 00:00:57.416000
    Reserve Bank of Australia    0 days 00:00:21.452000
    Reserve Bank of New Zealand  55 days 12:39:32.269000
    US Federal Reserve           8 days 13:27:11.387000

My code is:

    means = dropped.groupby('bank').mean()
    std = dropped.groupby('bank').std()
4 answers

You need to convert the timedelta to a numeric value, for example int64 via values. This is the most accurate option, because nanoseconds are the underlying numeric representation of a timedelta:

    import numpy as np
    import pandas as pd

    # view the timedeltas as int64 nanoseconds
    dropped['new'] = dropped['diff'].values.astype(np.int64)

    means = dropped.groupby('bank').mean()
    means['new'] = pd.to_timedelta(means['new'])  # back to timedelta

    std = dropped.groupby('bank').std()
    std['new'] = pd.to_timedelta(std['new'])

Another solution is to convert the values to seconds with total_seconds, but this is less accurate because the result is a float:

    dropped['new'] = dropped['diff'].dt.total_seconds()
    means = dropped.groupby('bank').mean()
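For anyone who wants to try the nanosecond round trip end to end, here is a minimal, self-contained sketch. The sample frame is made up (two rows per bank, not the asker's data), and the names nanos, mean_td and std_td are only illustrative. It aggregates just the converted values, so the original timedelta column never reaches the aggregation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two observations per bank (not the asker's real data).
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "US Federal Reserve", "US Federal Reserve"],
    "diff": pd.to_timedelta(["0 days 00:00:10", "0 days 00:00:30",
                             "1 days 00:00:00", "3 days 00:00:00"]),
})

# Force nanosecond resolution, then view the values as int64 nanoseconds.
nanos = pd.Series(
    dropped["diff"].to_numpy().astype("timedelta64[ns]").astype(np.int64),
    index=dropped.index,
)

# Group the integer values by bank and convert the results back to timedeltas.
grouped = nanos.groupby(dropped["bank"])
mean_td = pd.to_timedelta(grouped.mean())  # float nanoseconds -> timedelta
std_td = pd.to_timedelta(grouped.std())    # sample std (ddof=1), the pandas default

print(mean_td)
print(std_td)
```

Note that a group with a single row has no sample standard deviation, so its std comes back as NaT.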

There is no need to convert the timedelta back and forth. NumPy and pandas can do this for you directly, with a faster run time. Using the dropped DataFrame:

    import numpy as np

    grouped = dropped.groupby('bank')['diff']
    mean = grouped.apply(lambda x: np.mean(x))
    std = grouped.apply(lambda x: np.std(x))
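A variation on the same idea, as a sketch only: calling the Series methods inside apply lets you state ddof explicitly, which matters because the sample (ddof=1) and population (ddof=0) standard deviations differ. The sample frame below is hypothetical, and the names std_sample and std_population are just for illustration.

```python
import pandas as pd

# Hypothetical sample data: two observations per bank (not the asker's real data).
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "US Federal Reserve", "US Federal Reserve"],
    "diff": pd.to_timedelta(["0 days 00:00:10", "0 days 00:00:30",
                             "1 days 00:00:00", "3 days 00:00:00"]),
})

grouped = dropped.groupby("bank")["diff"]

# Series.mean and Series.std both accept timedelta data and return Timedeltas.
mean = grouped.apply(lambda s: s.mean())
std_sample = grouped.apply(lambda s: s.std(ddof=1))      # divides by n - 1
std_population = grouped.apply(lambda s: s.std(ddof=0))  # divides by n

print(mean)
print(std_sample)
print(std_population)
```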

Pandas mean() and other aggregation methods support the parameter numeric_only=False.

 dropped.groupby('bank').mean(numeric_only=False) 

Found here: Aggregations for Timedelta values in Python DataFrame
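A small runnable sketch of this, using a made-up frame in place of the asker's data. Only mean is shown; whether std accepts timedelta columns in the same way may depend on the pandas version, so it is left out here.

```python
import pandas as pd

# Hypothetical sample data: two observations per bank (not the asker's real data).
dropped = pd.DataFrame({
    "bank": ["Bank of Japan", "Bank of Japan",
             "US Federal Reserve", "US Federal Reserve"],
    "diff": pd.to_timedelta(["0 days 00:00:10", "0 days 00:00:30",
                             "1 days 00:00:00", "3 days 00:00:00"]),
})

# numeric_only=False keeps the timedelta column in the aggregation.
means = dropped.groupby("bank").mean(numeric_only=False)

print(means)
```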


I would suggest passing the argument numeric_only=False to mean, as mentioned by Alexander Usikov. This works for pandas version 0.20+.

If you have an older version, the following works:

    import pandas as pd

    df = pd.DataFrame({
        'td': pd.Series([pd.Timedelta(days=i) for i in range(5)]),
        'group': ['a', 'a', 'a', 'b', 'b']
    })

    (
        df
        .astype({'td': 'int64'})  # convert timedelta to integer (nanoseconds)
        .groupby('group')
        .mean()
        .astype({'td': 'timedelta64[ns]'})  # convert the mean back to timedelta
    )

Source: https://habr.com/ru/post/1268955/

