Combining data frames by approximate column values

Question


I have two data frames containing time series (time is encoded as numeric rather than as date-time objects, and the rows are not sorted by time). I would like to normalize the response variable in one data frame to the response variable in the other. The problem is that the time points in the two data frames do not match exactly, so I need to merge the two data frames on approximately matching values of their time columns.

The data is as follows:

 df1 <- structure(list(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16)),
                  .Names = c("t1", "y1"), row.names = c(NA, -4L),
                  class = "data.frame")
 df2 <- structure(list(t2 = c(0.9, 4.1),
                       y2 = structure(1:2, .Label = c("a", "b"), class = "factor")),
                  .Names = c("t2", "y2"), row.names = c(NA, -2L),
                  class = "data.frame")

The result should look like this:

 t1 y1 y2
  1  1  a
  4 16  b

It seems that approx or approxfun would be helpful, but I can't figure out how to use them for this.

+4

r

Drew steen Oct 16 '12 at 19:42


2 answers

@JoshuaUlrich has provided a good way to do this if you want the final result to include everything from df2, and you don't mind the t1 column containing values from t2.

However, if you want to avoid those things and continue in the vein suggested by @BrandonBertelsen, you can define a custom rounding function and apply it to the merge column of the second data.frame before merging. For instance:

 # define a more precise rounding function that meets your needs.
 # eg, this one rounds values in x to their nearest multiple of h
 gen.round <- function(x, h) {
   ifelse(x %% h > (h/2), h + h * (x %/% h), -(h + h * (-x %/% h)))
 }

 # make a new merge function that uses gen.round to round the merge column
 # in the second data.frame
 merge.approx <- function(x, y, by.x, by.y, h, ...) {
   y <- within(y, assign(by.y, gen.round(get(by.y), h)))
   merge(x, y, by.x=by.x, by.y=by.y, ...)
 }

 merge.approx(df1, df2, by.x='t1', by.y='t2', h =.5)
 #   t1 y1 y2
 # 1  1  1  a
 # 2  4 16  b
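The rounding approach only matches when the values in t1 sit on the grid of multiples of h. As a rough alternative sketch (not part of the answer above), each row of df2 can instead be paired with the row of df1 whose time is closest. The helper name nearest_merge and its tol argument are hypothetical, chosen for illustration; tol is an assumed cutoff on how far apart two time points may be and still count as a match.

```r
# Example data from the question, rebuilt with data.frame() for brevity
df1 <- data.frame(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16))
df2 <- data.frame(t2 = c(0.9, 4.1), y2 = factor(c("a", "b")))

# Hypothetical helper: for every row of y, pick the row of x whose by.x value
# is closest to that row's by.y value; drop pairs further apart than tol
nearest_merge <- function(x, y, by.x, by.y, tol = Inf) {
  # index into x of the nearest time point for each row of y
  idx <- vapply(y[[by.y]],
                function(t) which.min(abs(x[[by.x]] - t)),
                integer(1))
  # distance between each y time point and its matched x time point
  gap <- abs(x[[by.x]][idx] - y[[by.y]])
  keep <- gap <= tol
  out <- cbind(x[idx[keep], , drop = FALSE],
               y[keep, setdiff(names(y), by.y), drop = FALSE])
  rownames(out) <- NULL
  out
}

res <- nearest_merge(df1, df2, by.x = "t1", by.y = "t2", tol = 0.25)
print(res)
```

Unlike rounding, this does not need a grid step, but it scans all of df1 for every row of df2, so it is only a reasonable choice for small data frames.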

+1

Matthew plourde Oct 16 '12 at 20:42


Joshua ulrich · Accepted Answer · 2012-10-16T19:51:55+0000

You can do this easily with na.approx from the zoo package:

 library(zoo)
 Data <- merge(df1, df2, by.x="t1", by.y="t2", all=TRUE)
 Data$y1 <- na.approx(Data$y1, na.rm=FALSE, rule=2)
 na.omit(Data)
 #    t1 y1 y2
 # 1 0.9  1  a
 # 6 4.1 16  b

You can also do this with approx from base R:

 Data <- merge(df1, df2, by.x="t1", by.y="t2", all=TRUE)
 y1.na <- is.na(Data$y1)
 Data$y1[y1.na] <- (approx(Data$y1, rule=2, n=NROW(Data))$y)[y1.na]
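The approx call above interpolates over row positions in the merged data frame. A possible variant, sketched here as an assumption rather than as the answer's own method, passes the time columns directly so that y1 is interpolated at the time points of df2. With rule = 2, time points outside the range of t1 take the y1 value at the nearest end of that range.

```r
# Example data from the question, rebuilt with data.frame() for brevity
df1 <- data.frame(t1 = c(3, 1, 2, 4), y1 = c(9, 1, 4, 16))
df2 <- data.frame(t2 = c(0.9, 4.1), y2 = factor(c("a", "b")))

# Linearly interpolate y1 as a function of t1, evaluated at each t2.
# approx() orders the x values itself, so df1 does not need sorting first.
df2$y1 <- approx(x = df1$t1, y = df1$y1, xout = df2$t2, rule = 2)$y
print(df2)
```

Here both 0.9 and 4.1 fall just outside the range of t1, so they receive the boundary values 1 and 16, which agrees with the na.omit(Data) output shown earlier.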
