Pandas AVG() function between two date columns

I have a DataFrame df:

Client  Status  Dat_Start   Dat_End
1       A       2015-01-01  2015-01-19
1       B       2016-01-01  2016-02-02
1       A       2015-02-12  2015-02-20
1       B       2016-01-30  2016-03-01

I would like to get the average difference between the two dates (Dat_End - Dat_Start) for rows with Status = 'A', grouped by the Client column, using pandas syntax.

In SQL it would be something like:

SELECT Client, AVG(Dat_End - Dat_Start) AS Date_Diff
FROM Table
WHERE Status = 'A'
GROUP BY Client

Thanks!

2 answers

Calculate timedeltas:

df['duration'] = df.Dat_End-df.Dat_Start

df
Out[92]: 
   Client Status  Dat_Start    Dat_End  duration
0       1      A 2015-01-01 2015-01-19   18 days
1       1      B 2016-01-01 2016-02-02   32 days
2       1      A 2015-02-12 2015-02-20    8 days
3       1      B 2016-01-30 2016-03-01   31 days
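
The subtraction above only yields timedeltas if both columns are already datetime64. If they were read in as strings, they need converting first. A minimal sketch that rebuilds the question's frame (the string inputs are an assumption about how the data was loaded):

```python
import pandas as pd

# Rebuild the frame from the question; the dates start out as plain strings.
df = pd.DataFrame({
    "Client": [1, 1, 1, 1],
    "Status": ["A", "B", "A", "B"],
    "Dat_Start": ["2015-01-01", "2016-01-01", "2015-02-12", "2016-01-30"],
    "Dat_End": ["2015-01-19", "2016-02-02", "2015-02-20", "2016-03-01"],
})

# Convert both columns to datetime64 so that subtraction yields timedeltas.
for col in ["Dat_Start", "Dat_End"]:
    df[col] = pd.to_datetime(df[col])

df["duration"] = df["Dat_End"] - df["Dat_Start"]
print(df)
```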

Filter, then request the sum and count per client; for pandas < 0.20, dividing the sum by the count gives the mean:

df[df.Status=='A'].groupby('Client').duration.agg(['sum', 'count'])
Out[98]: 
           sum  count
Client               
1      26 days      2
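
The mean then follows from those two columns. A minimal sketch of that last step, with the frame rebuilt so the snippet runs on its own (the variable names `agg` and `mean_duration` are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Client": [1, 1, 1, 1],
    "Status": ["A", "B", "A", "B"],
    "Dat_Start": pd.to_datetime(["2015-01-01", "2016-01-01", "2015-02-12", "2016-01-30"]),
    "Dat_End": pd.to_datetime(["2015-01-19", "2016-02-02", "2015-02-20", "2016-03-01"]),
})
df["duration"] = df["Dat_End"] - df["Dat_Start"]

# Sum and count of the durations per client, for Status 'A' rows only.
agg = df[df["Status"] == "A"].groupby("Client")["duration"].agg(["sum", "count"])

# Mean duration = total duration / number of rows.
mean_duration = agg["sum"] / agg["count"]
print(mean_duration)
```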

From pandas 0.20 on, mean for timedeltas is supported in groupby, so this works directly:

df[df.Status=='A'].groupby('Client').duration.mean()
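
If a plain number of days is more convenient than a Timedelta, the result can be divided by a one-day Timedelta. A minimal sketch, assuming a pandas version where the groupby mean above works; `mean_days` is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame({
    "Client": [1, 1, 1, 1],
    "Status": ["A", "B", "A", "B"],
    "Dat_Start": pd.to_datetime(["2015-01-01", "2016-01-01", "2015-02-12", "2016-01-30"]),
    "Dat_End": pd.to_datetime(["2015-01-19", "2016-02-02", "2015-02-20", "2016-03-01"]),
})
df["duration"] = df["Dat_End"] - df["Dat_Start"]

# Mean duration per client for Status 'A' rows, as a Timedelta.
mean_duration = df[df["Status"] == "A"].groupby("Client")["duration"].mean()

# Dividing by a one-day Timedelta turns it into a float number of days.
mean_days = mean_duration / pd.Timedelta(days=1)
print(mean_days)
```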

Alternatively, compute the difference inside each group without adding a column:

In [10]: df.loc[df.Status == 'A'].groupby('Client') \
           .apply(lambda x: (x.Dat_End-x.Dat_Start).mean()).reset_index()
Out[10]:
   Client       0
0       1 13 days
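
The result column above is named 0. To get a named column that mirrors the SQL alias Date_Diff, one option is to create the difference with assign and group with as_index=False. This is a sketch of an alternative, not the answer's own code:

```python
import pandas as pd

df = pd.DataFrame({
    "Client": [1, 1, 1, 1],
    "Status": ["A", "B", "A", "B"],
    "Dat_Start": pd.to_datetime(["2015-01-01", "2016-01-01", "2015-02-12", "2016-01-30"]),
    "Dat_End": pd.to_datetime(["2015-01-19", "2016-02-02", "2015-02-20", "2016-03-01"]),
})

out = (
    df.loc[df["Status"] == "A"]                                     # WHERE Status = 'A'
      .assign(Date_Diff=lambda d: d["Dat_End"] - d["Dat_Start"])    # Dat_End - Dat_Start
      .groupby("Client", as_index=False)["Date_Diff"]               # GROUP BY Client
      .mean()                                                       # AVG(...)
)
print(out)
```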

Source: https://habr.com/ru/post/1669398/

