This is my first Stack Overflow question. Take it easy on me!
I have two data sets that were acquired simultaneously by different data acquisition systems with different sampling rates. One of them is sampled very regularly, and the other is not. I would like to create a single dataframe containing both data sets, using the regularly spaced timestamps (in seconds) as the reference for both. The irregularly sampled data should be interpolated onto the regularly spaced timestamps.
Here is some toy data that demonstrates what I'm trying to do:
    import pandas as pd
    import numpy as np
df1 and df2 are as follows:
    df1:
         t   y1
    0  0.0  0.0
    1  0.5  0.5
    2  1.0  1.0
    3  1.5  1.5
    4  2.0  2.0

    df2:
          t    y2
    0  0.00  0.00
    1  0.34  1.02
    2  1.01  3.03
    3  1.40  4.20
    4  1.60  4.80
    5  1.70  5.10
    6  2.01  6.03
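In case it helps anyone reproduce this, here is a minimal sketch that builds the two toy frames shown above. The values are copied from the printed tables; the construction itself is just one convenient way to write it.

```python
import pandas as pd

# Regularly sampled series (one sample every 0.5 s)
df1 = pd.DataFrame({
    "t":  [0.0, 0.5, 1.0, 1.5, 2.0],
    "y1": [0.0, 0.5, 1.0, 1.5, 2.0],
})

# Irregularly sampled series; in this toy data y2 happens to equal 3 * t
df2 = pd.DataFrame({
    "t":  [0.00, 0.34, 1.01, 1.40, 1.60, 1.70, 2.01],
    "y2": [0.00, 1.02, 3.03, 4.20, 4.80, 5.10, 6.03],
})

print(df1)
print(df2)
```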
I am trying to combine df1 and df2 by interpolating y2 on df1.t. Desired Result:
    df_combined:
         t   y1   y2
    0  0.0  0.0  0.0
    1  0.5  0.5  1.5
    2  1.0  1.0  3.0
    3  1.5  1.5  4.5
    4  2.0  2.0  6.0
I read the documentation for pandas.resample and looked through previous Stack Overflow questions, but could not find a solution to this specific problem. Any ideas? It seems like it should be easy.
UPDATE: I found one possible solution: interpolate the second series first, then add it to the first dataframe:
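    from scipy.interpolate import interp1d

    # Linear interpolant built from the irregular samples; points outside df2.t give NaN
    f2 = interp1d(df2.t, df2.y2, bounds_error=False)
    # Evaluate the interpolant on the regular time base
    df1['y2'] = f2(df1.t)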
which gives:
    df1:
         t   y1   y2
    0  0.0  0.0  0.0
    1  0.5  0.5  1.5
    2  1.0  1.0  3.0
    3  1.5  1.5  4.5
    4  2.0  2.0  6.0
This works, but I'm still open to other solutions if there is a better way.
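One alternative that might avoid the SciPy dependency is `np.interp`. This is only a sketch, assuming linear interpolation is acceptable and that `df2.t` is sorted in increasing order. The frames are rebuilt from the toy values so the snippet runs on its own, and `df_combined` is just the name used for the desired result.

```python
import numpy as np
import pandas as pd

# Toy data copied from the tables in the question
df1 = pd.DataFrame({
    "t":  [0.0, 0.5, 1.0, 1.5, 2.0],
    "y1": [0.0, 0.5, 1.0, 1.5, 2.0],
})
df2 = pd.DataFrame({
    "t":  [0.00, 0.34, 1.01, 1.40, 1.60, 1.70, 2.01],
    "y2": [0.00, 1.02, 3.03, 4.20, 4.80, 5.10, 6.03],
})

# Work on a copy so the original df1 is left untouched
df_combined = df1.copy()

# Linearly interpolate y2 at the regular timestamps df1.t.
# np.interp expects the sample positions (df2.t) to be increasing.
df_combined["y2"] = np.interp(
    df1["t"].to_numpy(),
    df2["t"].to_numpy(),
    df2["y2"].to_numpy(),
)

print(df_combined)
```

One difference from the `interp1d` call with `bounds_error=False`: for timestamps outside the range of `df2.t`, `np.interp` by default returns the first or last `y2` value instead of NaN. That does not matter for this toy data, since every `df1.t` lies inside the range of `df2.t`.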