Join tables by date range

I am looking for a simple way to join two tables by date range. One table contains an exact date; the other contains two variables that mark the beginning and end of a time period. I need to join the rows where the date from the first table falls within the range from the second table.

library(data.table)
data1 <- data.table(date = c('2010-01-21', '2010-01-25', '2010-02-02', '2010-02-09'),
                name = c('id1','id2','id3','id4'))


data2 <- data.table(beginning=c('2010-01-15', '2010-01-23', '2010-01-30', '2010-02-05'), 
                ending = c('2010-01-22','2010-01-29','2010-02-04','2010-02-13'),
                class = c(1,2,3,4))

result <- data.table(date = c('2010-01-21', '2010-01-25', '2010-02-02', '2010-02-09'),
                 beginning=c('2010-01-15', '2010-01-23', '2010-01-30', '2010-02-05'), 
                 ending = c('2010-01-22','2010-01-29','2010-02-04','2010-02-13'),
                 name = c('id1','id2','id3','id4'),
                 class = c(1,2,3,4))

Any help, please? I found some complicated examples, but they do not even work with my data because of the date formats. I need something like:

select * from data1
left join
select * from data2
where data2.beginning <= data1.date <= data2.ending

Thanks.

1 answer

I know it looks awful in base R, but here's what I came up with. A better option is the sqldf package (see below).

library(data.table)
data1 <- data.table(date = c('2010-01-21', '2010-01-25', '2010-02-02', '2010-02-09'),
                    name = c('id1','id2','id3','id4'))


data2 <- data.table(beginning=c('2010-01-15', '2010-01-23', '2010-01-30', '2010-02-05'), 
                    ending = c('2010-01-22','2010-01-29','2010-02-04','2010-02-13'),
                    class = c(1,2,3,4))

# For each date in data1, find the row of data2 whose range contains it.
# Bounds are inclusive; the index is NA if no range matches, and the first
# match is taken if ranges overlap.
idx <- sapply(data1$date,
              function(d) which(data2$beginning <= d & d <= data2$ending)[1],
              USE.NAMES = FALSE)
result <- cbind(data1, data2[idx])
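Note that the comparisons above are made on character strings. That happens to work here because dates written as YYYY-MM-DD sort the same way as text and as dates. If the real data uses another format, which may be what the question means by "due to the formats", it is probably safer to convert to Date first. A small sketch, where the day-first strings and the "%d.%m.%Y" format are made-up examples and not taken from the question:

```r
# Hypothetical day-first date strings (not from the question's data)
raw <- c("21.01.2010", "02.02.2010")

# As text, "21..." sorts after "02...", so the comparison is misleading
raw[1] < raw[2]    # FALSE

# Convert to Date with an explicit format, then compare as real dates
d <- as.Date(raw, format = "%d.%m.%Y")
d[1] < d[2]        # TRUE
```

Once both tables hold Date columns, the same `<=` comparisons can be used without depending on how the strings are written.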

Using sqldf package:

library(sqldf)
result = sqldf("select * from data1
                left join data2
                on data1.date between data2.beginning and data2.ending")
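Since the tables are already data.tables, a non-equi join might also be worth trying; this is an alternative to the approaches in this answer, not something the original answer proposed. The sketch below assumes data.table 1.9.8 or later (where non-equi joins in `on=` were introduced) and converts the columns to IDate first, which is my assumption to keep the join columns numeric-backed:

```r
library(data.table)

# Same data as in the question, with dates stored as IDate instead of character
data1 <- data.table(date = as.IDate(c("2010-01-21", "2010-01-25", "2010-02-02", "2010-02-09")),
                    name = c("id1", "id2", "id3", "id4"))
data2 <- data.table(beginning = as.IDate(c("2010-01-15", "2010-01-23", "2010-01-30", "2010-02-05")),
                    ending    = as.IDate(c("2010-01-22", "2010-01-29", "2010-02-04", "2010-02-13")),
                    class     = c(1, 2, 3, 4))

# For every row of data1 (the i table), look up the rows of data2 where
# beginning <= date <= ending. Rows of data1 without a match are kept,
# with NA in the data2 columns, which mimics a left join.
# The x. and i. prefixes pick columns from data2 and data1 explicitly,
# because a non-equi join otherwise fills the join columns with values from i.
result <- data2[data1,
                on = .(beginning <= date, ending >= date),
                .(date = i.date, beginning = x.beginning, ending = x.ending,
                  name = i.name, class = x.class)]
```

If ranges in data2 overlap, a date can match several rows and will then appear more than once in the result.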

Source: https://habr.com/ru/post/1542675/