I don't think there is a quick one-liner for this, but I believe the approach below is the best way to do it.
As your data grows in size, this "near date" operation can become quite expensive unless you do something more sophisticated. I like to use scikit-learn's NearestNeighbors for this kind of thing.
I have put together one approach to this problem that should scale relatively well. First, we can generate some simple data:
    import pandas as pd
    import numpy as np

    dates = pd.date_range('2015', periods=200, freq='D')

    rand = np.random.RandomState(42)
    i1 = np.sort(rand.permutation(np.arange(len(dates)))[:5])
    i2 = np.sort(rand.permutation(np.arange(len(dates)))[:5])

    df1 = pd.DataFrame({'Code': rand.randint(0, 2, 5),
                        'Date': dates[i1],
                        'val1': rand.rand(5)})
    df2 = pd.DataFrame({'Code': rand.randint(0, 2, 5),
                        'Date': dates[i2],
                        'val2': rand.rand(5)})
Let's take a look at the two frames:
    >>> df1
       Code       Date      val1
    0     0 2015-01-16  0.975852
    1     0 2015-01-31  0.516300
    2     1 2015-04-06  0.322956
    3     1 2015-05-09  0.795186
    4     1 2015-06-08  0.270832

    >>> df2
       Code       Date      val2
    0     1 2015-02-03  0.184334
    1     1 2015-04-13  0.080873
    2     0 2015-05-02  0.428314
    3     1 2015-06-26  0.688500
    4     0 2015-06-30  0.058194
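Now let's write an apply function that uses scikit-learn to find, for each row of df1, the nearest date in df2 with the same Code. The matched date replaces Date, and the original date is kept in a new Date1 column: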
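    from sklearn.neighbors import NearestNeighbors

    def find_nearest(group, match, groupname):
        # Keep only the rows of `match` that belong to this group's code
        match = match[match[groupname] == group.name]
        # scikit-learn expects a 2-D numeric array, so use integer timestamps
        match_dates = match['Date'].values.astype('int64')[:, None]
        group_dates = group['Date'].values.astype('int64')[:, None]
        nbrs = NearestNeighbors(n_neighbors=1).fit(match_dates)
        dist, ind = nbrs.kneighbors(group_dates)

        group = group.copy()  # avoid modifying the caller's data in place
        group['Date1'] = group['Date']  # keep the original date
        group['Date'] = match['Date'].values[ind.ravel()]  # nearest date in `match`
        return group

    # group_keys=False keeps df1's original index on the result
    df1_mod = df1.groupby('Code', group_keys=False).apply(find_nearest, df2, 'Code')

    >>> df1_mod
       Code       Date      val1      Date1
    0     0 2015-05-02  0.975852 2015-01-16
    1     0 2015-05-02  0.516300 2015-01-31
    2     1 2015-04-13  0.322956 2015-04-06
    3     1 2015-04-13  0.795186 2015-05-09
    4     1 2015-06-26  0.270832 2015-06-08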
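For intuition about what the per-group lookup is doing, here is a minimal standard-library sketch of the same nearest-date search over a sorted list. It uses bisect rather than scikit-learn, so it is an illustration of the idea and not what NearestNeighbors does internally; the helper name nearest_date is hypothetical, and the dates are the Code == 1 rows from the example above.

```python
from bisect import bisect_left
from datetime import date

def nearest_date(sorted_dates, query):
    """Return the element of sorted_dates closest to query.

    Ties go to the earlier date. sorted_dates must be non-empty and sorted.
    """
    pos = bisect_left(sorted_dates, query)
    if pos == 0:
        return sorted_dates[0]          # query is before every candidate
    if pos == len(sorted_dates):
        return sorted_dates[-1]         # query is after every candidate
    before, after = sorted_dates[pos - 1], sorted_dates[pos]
    return before if query - before <= after - query else after

# Dates of the Code == 1 rows of df2 (candidates) and df1 (queries).
candidates = [date(2015, 2, 3), date(2015, 4, 13), date(2015, 6, 26)]
queries = [date(2015, 4, 6), date(2015, 5, 9), date(2015, 6, 8)]
matches = [nearest_date(candidates, q) for q in queries]
```

Each lookup here is a binary search, so once the candidate dates are sorted it costs O(log n) per query instead of a full scan.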
Finally, we can join the two frames with a straightforward call to pd.merge:
    >>> pd.merge(df1_mod, df2, on=['Code', 'Date'])
       Code       Date      val1      Date1      val2
    0     0 2015-05-02  0.975852 2015-01-16  0.428314
    1     0 2015-05-02  0.516300 2015-01-31  0.428314
    2     1 2015-04-13  0.322956 2015-04-06  0.080873
    3     1 2015-04-13  0.795186 2015-05-09  0.080873
    4     1 2015-06-26  0.270832 2015-06-08  0.688500
Note that rows 0 and 1 both match the same val2; this is expected, given how you described your desired solution.
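Since the nearest match can still be far away (rows 0 and 1 are matched to a date about three months later), it may be worth checking the gap before relying on the merged values. A minimal sketch, assuming the Date and Date1 columns from the output above; the gap_days column and the 30-day tolerance are illustrative choices, not part of the original solution.

```python
import pandas as pd

# The Date (matched) and Date1 (original) columns from the merged output above.
merged = pd.DataFrame({
    'Date': pd.to_datetime(['2015-05-02', '2015-05-02', '2015-04-13',
                            '2015-04-13', '2015-06-26']),
    'Date1': pd.to_datetime(['2015-01-16', '2015-01-31', '2015-04-06',
                             '2015-05-09', '2015-06-08']),
})

# Absolute distance in days between each original date and its match.
merged['gap_days'] = (merged['Date'] - merged['Date1']).abs().dt.days

# Keep only matches within a (hypothetical) 30-day tolerance.
close = merged[merged['gap_days'] <= 30]
```

With this data, only rows 2, 3 and 4 fall within the 30-day tolerance.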