R - assign column value based on closest match in second data frame

I have two data frames: logger and df (numbers are numeric):

    logger <- data.frame(
      time = c(1280248354:1280248413),
      temp = runif(60, min = 18, max = 24.5)
    )
    df <- data.frame(
      obs = c(1:10),
      time = runif(10, min = 1280248354, max = 1280248413),
      temp = NA
    )

I would like to search logger$time for the closest match to each value in df$time and assign the associated logger$temp to df$temp. So far, I have managed to do this with the following loop:

    for (i in 1:length(df$time)) {
      closestto <- which.min(abs(logger$time - df$time[i]))
      df$temp[i] <- logger$temp[closestto]
    }

However, I now have large data frames (logger has 13,620 rows and df has 266,138), and the processing time is long. I have read that loops are not the most efficient way to do this in R, but I am not familiar with the alternatives. Is there a faster way to do this?

2 answers

I would use data.table for this. It makes joining on keys very simple and very fast. There is even a very useful roll = "nearest" argument that gives exactly the behaviour you are looking for (although with your example data it does not seem to be needed, since the times from df all show up in logger). In the following example, I renamed df$time to df$time1 to make it clear which column belongs to which table:

    # Load package
    require(data.table)

    # Rename df$time to time1, as described above
    names(df)[names(df) == "time"] <- "time1"

    # Make data.frames into data.tables with a key column
    ldt <- data.table(logger, key = "time")
    dt <- data.table(df, key = "time1")

    # Join based on the key column of the two tables (time & time1)
    # roll = "nearest" gives the desired behaviour
    # list(obs, time1, temp) gives the columns you want to return from dt
    ldt[dt, list(obs, time1, temp), roll = "nearest"]
    #           time obs      time1     temp
    #  1: 1280248361   8 1280248361 18.07644
    #  2: 1280248366   4 1280248366 21.88957
    #  3: 1280248370   3 1280248370 19.09015
    #  4: 1280248376   5 1280248376 22.39770
    #  5: 1280248381   6 1280248381 24.12758
    #  6: 1280248383  10 1280248383 22.70919
    #  7: 1280248385   1 1280248385 18.78183
    #  8: 1280248389   2 1280248389 18.17874
    #  9: 1280248393   9 1280248393 18.03098
    # 10: 1280248403   7 1280248403 22.74372
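To see what roll = "nearest" does when the times do not coincide, here is a minimal sketch on two tiny tables. The table names (logger_dt, obs_dt) and all values are made up for illustration, and the join uses the on = argument instead of keys, which assumes a reasonably recent data.table version.

```r
library(data.table)

# Tiny hypothetical tables (values invented for illustration only)
logger_dt <- data.table(time = c(10, 20, 30), temp = c(18.0, 19.5, 21.0))
obs_dt    <- data.table(obs = 1:4, time = c(11, 19, 24, 31))

# For each row of obs_dt, take the logger_dt row whose time is nearest.
# The result has one row per row of obs_dt, in the same order.
matched <- logger_dt[obs_dt, on = "time", roll = "nearest"]

# matched$time holds the times from obs_dt; matched$temp holds the
# temperature of the nearest logger reading for each of them.
print(matched)
```

The observation at time 24 picks up the reading logged at time 20, and the one at time 31, which lies past the last logger time, picks up the reading logged at time 30.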

You can use the data.table package. It is also efficient with large data sets:

    library(data.table)

    logger <- data.frame(
      time = c(1280248354:1280248413),
      temp = runif(60, min = 18, max = 24.5)
    )
    df <- data.frame(
      obs = c(1:10),
      time = runif(10, min = 1280248354, max = 1280248413)
    )

    logger <- data.table(logger)
    df <- data.table(df)

    setkey(df, time)
    setkey(logger, time)

    df2 <- logger[df, roll = "nearest"]

Output:

    > df2
              time     temp obs
     1: 1280248356 22.81437   7
     2: 1280248360 24.08711  10
     3: 1280248366 22.31738   2
     4: 1280248367 18.61222   5
     5: 1280248388 19.46300   4
     6: 1280248393 18.26535   6
     7: 1280248400 20.61901   9
     8: 1280248402 21.92584   1
     9: 1280248410 19.36526   8
    10: 1280248410 19.36526   3
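If adding a package is not an option, the loop from the question can also be vectorised in base R. The following is only a sketch, not part of either answer: nearest_index is a hypothetical helper name, the seed is arbitrary, and the approach assumes logger$time is sorted in strictly increasing order, as it is in the example data.

```r
# Hypothetical helper: for each query time, return the index of the
# nearest reference time. Assumes ref_time is strictly increasing.
nearest_index <- function(ref_time, query_time) {
  n <- length(ref_time)
  # Index of the last reference time <= each query time (0 if there is none)
  lo <- findInterval(query_time, ref_time)
  lo <- pmin(pmax(lo, 1L), n)  # clamp to a valid index
  hi <- pmin(lo + 1L, n)       # following reference time, clamped at the end
  # Keep the closer neighbour; ties go to the earlier one, as with which.min
  ifelse(abs(ref_time[hi] - query_time) < abs(query_time - ref_time[lo]), hi, lo)
}

set.seed(1)  # arbitrary seed, only to make the example reproducible
logger <- data.frame(
  time = 1280248354:1280248413,
  temp = runif(60, min = 18, max = 24.5)
)
df <- data.frame(
  obs = 1:10,
  time = runif(10, min = 1280248354, max = 1280248413),
  temp = NA
)

# One vectorised lookup replaces the loop over rows of df
df$temp <- logger$temp[nearest_index(logger$time, df$time)]
```

findInterval does a binary search per query value, so the whole lookup avoids scanning all of logger$time for every row of df, which is what makes the original loop slow.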

Source: https://habr.com/ru/post/958006/

