Create a Pandas DataFrame counting elements that span a date range

I have a DataFrame with two date columns of interest that look something like this:

    LIST_DATE   END_DATE
    2000-04-18  2000-05-17 00:00:00
    2000-05-18  2000-09-18 00:00:00
    2000-04-18  2001-06-07 00:00:00
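To experiment with the answer's code, the three sample rows can be rebuilt as a DataFrame. This is a minimal sketch that assumes both columns are meant to be datetime64 (END_DATE is printed with a time component, which suggests it already holds Timestamps); the name df simply mirrors the answer.

```python
import pandas as pd

# Hypothetical reconstruction of the three sample rows from the question.
# pd.to_datetime parses the strings into datetime64 columns, which the
# .dt accessor needs later on.
df = pd.DataFrame({
    "LIST_DATE": pd.to_datetime(["2000-04-18", "2000-05-18", "2000-04-18"]),
    "END_DATE": pd.to_datetime(["2000-05-17", "2000-09-18", "2001-06-07"]),
})

print(df)
```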

I also created a monthly period table, "montot", which currently has only a month/year index:

    <class 'pandas.tseries.period.PeriodIndex'>
    freq: M
    [1999-01, ..., 2013-07]
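In current pandas, an equivalent monthly index for "montot" can be built with pd.period_range. This is a sketch: montot_index and montot are illustrative names, and the table starts empty because the question says it only has the index so far.

```python
import pandas as pd

# Monthly periods from 1999-01 through 2013-07, both ends included.
montot_index = pd.period_range("1999-01", "2013-07", freq="M")

# Hypothetical "montot" table: only the index for now; an "active"
# column can be added once the counts are computed.
montot = pd.DataFrame(index=montot_index)

print(montot_index[0], montot_index[-1], len(montot_index))
```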

What I want to do is, for each month in the second table ("montot"), count the elements of the first table whose date range covers that month (these are active listings per month), and add that count as a field in the table. For example, the 1st item in the first table would be counted once in month 4 and once in month 5, the 2nd item would be counted once in each of months 5 through 9, and so on, with the monthly total recorded in the new field. So I would end up with a table like:

    Month   active
    1/1999  5
    2/1999  8

etc. I'm not sure how to approach this with Pandas/Python.
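Before optimizing, it can help to have a direct, if slow, reference implementation of the counting rule described above. The sketch below assumes, as the example in the question states, that an item is active in both its LIST_DATE month and its END_DATE month. It loops over months and compares every row each time, so it is meant as a check rather than a final approach; the reporting range 2000-01 to 2001-07 is an arbitrary choice that covers the sample rows.

```python
import pandas as pd

df = pd.DataFrame({
    "LIST_DATE": pd.to_datetime(["2000-04-18", "2000-05-18", "2000-04-18"]),
    "END_DATE": pd.to_datetime(["2000-05-17", "2000-09-18", "2001-06-07"]),
})
months = pd.period_range("2000-01", "2001-07", freq="M")

# Convert each date to its calendar month.
start_m = df["LIST_DATE"].dt.to_period("M")
end_m = df["END_DATE"].dt.to_period("M")

# Brute force: for each month, count the rows whose [start, end] month span
# contains it. Both ends are inclusive, matching the question's example
# (the first row counts in both month 4 and month 5).
active = pd.Series(
    [int(((start_m <= m) & (end_m >= m)).sum()) for m in months],
    index=months,
    name="active",
)
print(active)
```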

1 answer

Here's one way to do this. First, take value_counts of the months in each date column (converting each value with the Timestamp method to_period):

    In [11]: p = pd.PeriodIndex(freq='m', start='2000-1', periods=18)

    In [12]: starts = df['LIST_DATE'].apply(lambda t: t.to_period(freq='m')).value_counts()

    In [13]: ends = df['END_DATE'].apply(lambda t: t.to_period(freq='m')).value_counts()
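On current pandas, the same two tallies can be written with the vectorised .dt.to_period accessor instead of apply, and with pd.period_range instead of the PeriodIndex(start=..., periods=...) constructor, whose start/periods keywords were deprecated in later versions. A sketch, assuming the sample df from the question:

```python
import pandas as pd

df = pd.DataFrame({
    "LIST_DATE": pd.to_datetime(["2000-04-18", "2000-05-18", "2000-04-18"]),
    "END_DATE": pd.to_datetime(["2000-05-17", "2000-09-18", "2001-06-07"]),
})

# 18 monthly periods starting at 2000-01 (i.e. 2000-01 .. 2001-06).
p = pd.period_range("2000-01", periods=18, freq="M")

# Map every date to its month, then count the rows per month.
starts = df["LIST_DATE"].dt.to_period("M").value_counts()
ends = df["END_DATE"].dt.to_period("M").value_counts()

print(starts.sort_index())
print(ends.sort_index())
```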

Reindex both by the PeriodIndex and fill the NaNs with 0 (so you can subtract), then subtract the cumulative sum of ends from the cumulative sum of starts to get the number currently active:

    In [14]: starts.reindex(p).fillna(0).cumsum() - ends.reindex(p).fillna(0).cumsum()
    Out[14]:
    2000-01    0
    2000-02    0
    2000-03    0
    2000-04    2
    2000-05    2
    2000-06    2
    2000-07    2
    2000-08    2
    2000-09    1
    2000-10    1
    2000-11    1
    2000-12    1
    2001-01    1
    2001-02    1
    2001-03    1
    2001-04    1
    2001-05    1
    2001-06    0
    Freq: M, dtype: float64
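This subtraction removes an item in its END_DATE month, so the first sample row counts only in 2000-04 even though the question wants it counted in 2000-05 as well. The sketch below wraps the cumsum difference in a small helper (active_per_month is a hypothetical name, not a pandas function) and shows one possible variant: shifting each end month forward by one period so the END_DATE month still counts. It uses reindex(..., fill_value=0) in place of reindex(...).fillna(0), which keeps the counts as integers. One caveat applies to both versions: the reporting index must start no later than the earliest LIST_DATE month, because reindex drops counts for months outside it.

```python
import pandas as pd

df = pd.DataFrame({
    "LIST_DATE": pd.to_datetime(["2000-04-18", "2000-05-18", "2000-04-18"]),
    "END_DATE": pd.to_datetime(["2000-05-17", "2000-09-18", "2001-06-07"]),
})
p = pd.period_range("2000-01", periods=18, freq="M")


def active_per_month(start_months, stop_months, index):
    """Running count: items are added in their start month and removed in
    their stop month. `index` must begin no later than the earliest start
    month, since reindex drops counts for months outside `index`."""
    starts = start_months.value_counts().reindex(index, fill_value=0)
    stops = stop_months.value_counts().reindex(index, fill_value=0)
    return starts.cumsum() - stops.cumsum()


start_m = df["LIST_DATE"].dt.to_period("M")
end_m = df["END_DATE"].dt.to_period("M")

# Same semantics as the answer: an item stops counting in its END_DATE month.
exclusive = active_per_month(start_m, end_m, p)

# Variant: adding 1 to a monthly Period moves it to the next month, so the
# item is removed one month later and its END_DATE month still counts.
inclusive = active_per_month(start_m, end_m + 1, p)

print(pd.DataFrame({"exclusive": exclusive, "inclusive": inclusive}))
```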

An alternative final step is to build a DataFrame that first records the changes, with starts counted as positive and ends as negative:

    In [21]: current = pd.DataFrame({'starts': starts, 'ends': -ends}, p)

    In [22]: current
    Out[22]:
             ends  starts
    2000-01   NaN     NaN
    2000-02   NaN     NaN
    2000-03   NaN     NaN
    2000-04   NaN       2
    2000-05    -1       1
    2000-06   NaN     NaN
    2000-07   NaN     NaN
    2000-08   NaN     NaN
    2000-09    -1     NaN
    2000-10   NaN     NaN
    2000-11   NaN     NaN
    2000-12   NaN     NaN
    2001-01   NaN     NaN
    2001-02   NaN     NaN
    2001-03   NaN     NaN
    2001-04   NaN     NaN
    2001-05   NaN     NaN
    2001-06    -1     NaN

    In [23]: current.fillna(0)
    Out[23]:
             ends  starts
    2000-01     0       0
    2000-02     0       0
    2000-03     0       0
    2000-04     0       2
    2000-05    -1       1
    2000-06     0       0
    2000-07     0       0
    2000-08     0       0
    2000-09    -1       0
    2000-10     0       0
    2000-11     0       0
    2000-12     0       0
    2001-01     0       0
    2001-02     0       0
    2001-03     0       0
    2001-04     0       0
    2001-05     0       0
    2001-06    -1       0
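The same table of changes in current pandas, as a sketch assuming the sample df: because Python dicts keep insertion order, the columns now come out as starts, ends rather than the alphabetical order shown in the transcript, and the months without events can be converted back to integers after fillna(0).

```python
import pandas as pd

df = pd.DataFrame({
    "LIST_DATE": pd.to_datetime(["2000-04-18", "2000-05-18", "2000-04-18"]),
    "END_DATE": pd.to_datetime(["2000-05-17", "2000-09-18", "2001-06-07"]),
})
p = pd.period_range("2000-01", periods=18, freq="M")
starts = df["LIST_DATE"].dt.to_period("M").value_counts()
ends = df["END_DATE"].dt.to_period("M").value_counts()

# One row per month: +n for items that start, -n for items that end.
# Months with no events are NaN after aligning to p, so fill them with 0
# and convert back to integers.
current = (
    pd.DataFrame({"starts": starts, "ends": -ends}, index=p)
    .fillna(0)
    .astype("int64")
)
print(current.head(6))
```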

cumsum then gives the running totals of starts and ends up to each month:

    In [24]: current.fillna(0).cumsum()
    Out[24]:
             ends  starts
    2000-01     0       0
    2000-02     0       0
    2000-03     0       0
    2000-04     0       2
    2000-05    -1       3
    2000-06    -1       3
    2000-07    -1       3
    2000-08    -1       3
    2000-09    -2       3
    2000-10    -2       3
    2000-11    -2       3
    2000-12    -2       3
    2001-01    -2       3
    2001-02    -2       3
    2001-03    -2       3
    2001-04    -2       3
    2001-05    -2       3
    2001-06    -3       3

Summing these columns row-wise gives the number currently active, the same result as above:

    In [25]: current.fillna(0).cumsum().sum(1)
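A sketch, under the same sample-data assumption, that carries the alternative through to the end and checks it against the direct cumsum difference: cumsum on the table of changes gives running totals, and sum(axis=1) adds the starts and ends columns for each month.

```python
import pandas as pd

df = pd.DataFrame({
    "LIST_DATE": pd.to_datetime(["2000-04-18", "2000-05-18", "2000-04-18"]),
    "END_DATE": pd.to_datetime(["2000-05-17", "2000-09-18", "2001-06-07"]),
})
p = pd.period_range("2000-01", periods=18, freq="M")
starts = df["LIST_DATE"].dt.to_period("M").value_counts()
ends = df["END_DATE"].dt.to_period("M").value_counts()

# Table of changes: +starts, -ends, zero for months with no events.
current = (
    pd.DataFrame({"starts": starts, "ends": -ends}, index=p)
    .fillna(0)
    .astype("int64")
)

# Running totals per column, then add the two columns month by month.
active = current.cumsum().sum(axis=1)

# The direct cumsum difference from the first approach, for comparison.
direct = (
    starts.reindex(p, fill_value=0).cumsum()
    - ends.reindex(p, fill_value=0).cumsum()
)

print((active == direct).all())
```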

Source: https://habr.com/ru/post/1501948/

