I think going down the path of concurrent use attempts is probably overly complicated. I have not tried this approach on a large sample, so your mileage may vary, but it should give you an idea ...
Let's start with some dates ...
import pandas as pd
dates = pd.to_datetime(['2016-01-03', '2016-09-09', '2016-12-12', '2016-03-03'])
We will use some test data from pandas.tseries.holiday - note that essentially all we want is a sorted DatetimeIndex ...
from pandas.tseries.holiday import USFederalHolidayCalendar
holiday_calendar = USFederalHolidayCalendar()
holidays = holiday_calendar.holidays('2016-01-01')
This gives us:
DatetimeIndex(['2016-01-01', '2016-01-18', '2016-02-15', '2016-05-30', '2016-07-04', '2016-09-05', '2016-10-10', '2016-11-11', '2016-11-24', '2016-12-26', ... '2030-01-01', '2030-01-21', '2030-02-18', '2030-05-27', '2030-07-04', '2030-09-02', '2030-10-14', '2030-11-11', '2030-11-28', '2030-12-25'], dtype='datetime64[ns]', length=150, freq=None)
Now we find the index of the next holiday (on or after each source date) using searchsorted:
indices = holidays.searchsorted(dates)
# array([1, 6, 9, 3])
next_nearest = holidays[indices]
# DatetimeIndex(['2016-01-18', '2016-10-10', '2016-12-26', '2016-05-30'], dtype='datetime64[ns]', freq=None)
Then take the difference between the two:
next_nearest_diff = pd.to_timedelta(next_nearest.values - dates.values).days
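One detail worth checking in the steps above is what happens when a source date is itself a holiday. Here is a minimal sketch, assuming the 2016 holidays are hard-coded from the output shown earlier (so it does not depend on the calendar class). The variable names are mine, and the `side` argument is the standard searchsorted parameter:

```python
import pandas as pd

# 2016 holidays copied from the output above; the index must be sorted.
holidays = pd.DatetimeIndex([
    '2016-01-01', '2016-01-18', '2016-02-15', '2016-05-30', '2016-07-04',
    '2016-09-05', '2016-10-10', '2016-11-11', '2016-11-24', '2016-12-26',
])

# The first date is a holiday, the second is the day after it.
dates = pd.to_datetime(['2016-07-04', '2016-07-05'])

# side='left' (the default): a date that is a holiday maps to itself.
on_or_after = holidays[holidays.searchsorted(dates, side='left')]

# side='right': a date that is a holiday maps to the following holiday.
strictly_after = holidays[holidays.searchsorted(dates, side='right')]

days_on_or_after = (on_or_after - dates).days
days_strictly_after = (strictly_after - dates).days
```

With the default, 2016-07-04 gets a difference of 0 days; with `side='right'` it is measured against 2016-09-05 instead. Which one you want depends on whether a holiday should count as its own "next" holiday.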
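You need to be careful with the indices so that you don't wrap around: searchsorted returns len(holidays) for a date after the last holiday, which is out of bounds, and indices - 1 becomes -1 for a date before the first holiday, which silently wraps to the last one. For the previous holiday, do the calculation using indices - 1. Still, this should act as (I hope) a relatively good base.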
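To make the index handling concrete, here is a rough sketch of how both directions could be computed with the out-of-range cases guarded. It assumes the 2016 holidays hard-coded from the output above. The helper name `days_to_adjacent_holidays` and the choice to return NaN when there is no holiday on that side are my own, not something pandas provides:

```python
import numpy as np
import pandas as pd

# 2016 holidays copied from the output above; the index must be sorted.
holidays = pd.DatetimeIndex([
    '2016-01-01', '2016-01-18', '2016-02-15', '2016-05-30', '2016-07-04',
    '2016-09-05', '2016-10-10', '2016-11-11', '2016-11-24', '2016-12-26',
])

# The first and last dates fall outside the holiday range on purpose.
dates = pd.to_datetime(['2015-12-25', '2016-01-03', '2016-09-09',
                        '2016-12-12', '2016-03-03', '2016-12-31'])


def days_to_adjacent_holidays(dates, holidays):
    """Hypothetical helper: days to the next and since the previous holiday."""
    indices = holidays.searchsorted(dates)

    # Positions where a next / previous holiday actually exists.
    has_next = indices < len(holidays)
    has_prev = indices > 0

    # Clip so the lookups never go out of bounds or wrap to the other end.
    last = len(holidays) - 1
    next_holiday = holidays[np.clip(indices, 0, last)]
    prev_holiday = holidays[np.clip(indices - 1, 0, last)]

    # Mask the clipped (meaningless) positions with NaN.
    days_to_next = np.where(has_next, (next_holiday - dates).days, np.nan)
    days_since_prev = np.where(has_prev, (dates - prev_holiday).days, np.nan)

    return pd.DataFrame(
        {'days_to_next': days_to_next, 'days_since_prev': days_since_prev},
        index=dates,
    )


result = days_to_adjacent_holidays(dates, holidays)
```

For 2016-01-03 this gives 15 days to the next holiday and 2 days since the previous one. The two out-of-range dates get NaN on the side where no holiday exists, rather than raising an IndexError or quietly measuring against the wrong end of the index.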