How to join data frames based on a condition between two columns

I am stuck in a project where I need to combine two data frames. They look something like this:

Data1

    Traffic Source  Registrations  Hour  Minute
    organic         1              6     13
    social          1              8     54

Data2

    Email              Hour2  Minute2
    test@domain.com    6      13
    test2@domain2.com  8      55

I use the following line of code to combine the two data frames:

    merge.df <- merge(Data1, Data2, by.x = c("Hour", "Minute"), by.y = c("Hour2", "Minute2"))

This only matches rows whose times are identical, but the time (hours and minutes) is sometimes off by a minute between the two data sets, for example 8:54 in Data1 and 8:55 in Data2. Is there a way to make Minute match Minute2 when the two differ by plus or minus one minute?

I thought I could add two new columns to Data1:

Data1

    Traffic Source  Registrations  Hour  Minute  Minute_plus1  Minute_minus1
    organic         1              6     13      14            12
    social          1              8     54      55            53

Is it possible to combine the two data frames if Minute2 matches any of Minute, Minute_plus1 or Minute_minus1? Or is there a more efficient way to do this merge?
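The helper-column idea can be made to work in base R: run one exact `merge()` per minute column and stack the results. This is a minimal sketch, assuming the column shown as "Traffic Source" is stored under the hypothetical syntactic name `Traffic_Source`. Because the three offsets are distinct, each pair of rows can match through at most one of the three columns, so the stacked result has no duplicates.

```r
# Sample data from the question ("Traffic Source" written as the
# hypothetical syntactic name Traffic_Source)
Data1 <- data.frame(
  Traffic_Source = c("organic", "social"),
  Registrations  = c(1, 1),
  Hour           = c(6, 8),
  Minute         = c(13, 54)
)
Data2 <- data.frame(
  Email   = c("test@domain.com", "test2@domain2.com"),
  Hour2   = c(6, 8),
  Minute2 = c(13, 55)
)

# Helper columns: the minute shifted by +1 and -1
Data1$Minute_plus1  <- Data1$Minute + 1
Data1$Minute_minus1 <- Data1$Minute - 1

# One exact merge per candidate column: each keeps the pairs whose hours
# agree and whose Minute2 equals that column
parts <- lapply(c("Minute", "Minute_plus1", "Minute_minus1"), function(col) {
  merge(Data1, Data2, by.x = c("Hour", col), by.y = c("Hour2", "Minute2"))
})

# rbind() on data frames matches columns by name, so the different column
# order of the three merge results does not matter
x <- do.call(rbind, parts)
x
```

This needs three joins instead of one, which is why a single join with a tolerance condition is usually simpler.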

1 answer

For such things, I usually turn to SQL:

    library(sqldf)

    x <- sqldf("
      SELECT *
      FROM Data1 d1
      JOIN Data2 d2
        ON d1.Hour = d2.Hour2
       AND ABS(d1.Minute - d2.Minute2) <= 1
    ")
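A join that requires equal hours will not pair times that straddle an hour boundary, such as 6:59 and 7:00. If such pairs should count (an assumption about the data), one option is to compare minutes since midnight instead. The base R sketch below uses hypothetical sample values chosen to show a cross-hour match; `t1` and `t2` are illustrative helper column names.

```r
# Hypothetical sample data: the first pair (6:59 vs 7:00) crosses an hour
Data1 <- data.frame(
  Traffic_Source = c("organic", "social"),
  Registrations  = c(1, 1),
  Hour           = c(6, 8),
  Minute         = c(59, 54)
)
Data2 <- data.frame(
  Email   = c("test@domain.com", "test2@domain2.com"),
  Hour2   = c(7, 8),
  Minute2 = c(0, 55)
)

# Minutes since midnight for each timestamp
Data1$t1 <- Data1$Hour * 60 + Data1$Minute
Data2$t2 <- Data2$Hour2 * 60 + Data2$Minute2

# by = NULL gives the Cartesian product (every row with every row),
# then keep the pairs at most one minute apart
pairs <- merge(Data1, Data2, by = NULL)
x <- pairs[abs(pairs$t1 - pairs$t2) <= 1, ]
x
```

The cross join has `nrow(Data1) * nrow(Data2)` rows before filtering, so this only suits small tables, and it still does not wrap around midnight.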

Depending on the size of your data, you can also simply join on the hour and then filter. Using dplyr:

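    library(dplyr)

    # Join on the hour, then keep pairs at most one minute apart.
    # Rows of Data1 without a match get Minute2 = NA and are dropped by filter().
    x <- Data1 %>%
      left_join(Data2, by = c("Hour" = "Hour2")) %>%
      filter(abs(Minute - Minute2) <= 1)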

You can do the same with base R functions, too.
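A minimal base R sketch of the same join-then-filter approach, using the sample data from the question (with "Traffic Source" written as the hypothetical name `Traffic_Source`). As with the dplyr version, rows of Data1 that have no match within one minute are dropped.

```r
Data1 <- data.frame(
  Traffic_Source = c("organic", "social"),
  Registrations  = c(1, 1),
  Hour           = c(6, 8),
  Minute         = c(13, 54)
)
Data2 <- data.frame(
  Email   = c("test@domain.com", "test2@domain2.com"),
  Hour2   = c(6, 8),
  Minute2 = c(13, 55)
)

# Join on the hour only: each Data1 row is paired with every Data2 row
# that has the same hour
joined <- merge(Data1, Data2, by.x = "Hour", by.y = "Hour2")

# Keep only the pairs whose minutes differ by at most one
x <- joined[abs(joined$Minute - joined$Minute2) <= 1, ]
x
```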


Source: https://habr.com/ru/post/986174/

