Is there a way (in dplyr) to filter (dates) with ranges given by two vectors without a join operation?

I want to do exactly what is asked in "Use dates from one data frame and filter data in another data frame - R", but without joining the two data frames. I'm afraid that the joined data will be too large to fit in memory before the filter is applied.

Here is an example of data:

    tmp_df <- data.frame(a = 1:10)

I want to perform an operation that looks like this:

    library(dplyr)
    lower_bound <- c(2, 4)
    upper_bound <- c(2, 5)
    # does not work: the bounds are recycled element-wise along a
    # instead of being treated as a set of intervals
    tmp_df %>% filter(a >= lower_bound & a <= upper_bound)

My desired result is:

    > # one way to get the row indices for subsetting, but impractical for long bound vectors
    > tmp_df[(tmp_df$a <= 2 & tmp_df$a >= 2) | (tmp_df$a <= 5 & tmp_df$a >= 4), , drop = FALSE]
      a
    2 2
    4 4
    5 5
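To see why the first attempt goes wrong, the sketch below evaluates the same comparison in plain base R. Because lower_bound and upper_bound are shorter than a, R recycles them element-wise (a[1] is compared with the first pair of bounds, a[2] with the second, a[3] with the first again, and so on) instead of testing each value against every interval, so only 4 survives.

```r
a <- 1:10
lower_bound <- c(2, 4)
upper_bound <- c(2, 5)

# Recycling pairs a[i] with bounds (2, 2), (4, 5), (2, 2), (4, 5), ...
naive <- a >= lower_bound & a <= upper_bound
a[naive]
# the desired values would be 2, 4 and 5
```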

My concern about memory with a join-based solution is that tmp_df has many more rows, and lower_bound and upper_bound have many more elements. A dplyr solution, or one that can be used as part of a pipe, is preferred.
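One join-free way to express the desired semantics, as a minimal sketch: loop over the intervals and OR together one logical vector per interval. Peak memory is then a few logical vectors of length nrow(tmp_df) rather than a joined table, at the cost of n × m comparisons for n rows and m intervals. The helper name in_any_range is hypothetical and not part of dplyr or base R.

```r
# Hypothetical helper (not part of dplyr or base R):
# TRUE where x lies inside at least one closed interval [lower[i], upper[i]]
in_any_range <- function(x, lower, upper) {
  stopifnot(length(lower) == length(upper))
  keep <- logical(length(x))                       # start with all FALSE
  for (i in seq_along(lower)) {
    keep <- keep | (x >= lower[i] & x <= upper[i]) # OR in interval i
  }
  keep
}

tmp_df <- data.frame(a = 1:10)
res <- tmp_df[in_any_range(tmp_df$a, c(2, 4), c(2, 5)), , drop = FALSE]
res
```

Because it returns a plain logical vector, the same helper also fits in a pipe, e.g. tmp_df %>% filter(in_any_range(a, lower_bound, upper_bound)).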

2 answers

Perhaps you could use the inrange function from data.table, which

checks whether each value in x is in between any of the intervals provided in lower, upper.

Usage:

    inrange(x, lower, upper, incbounds = TRUE)
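To make the inrange semantics (including incbounds) concrete, here is a base-R sketch of the same check. It is not how data.table implements inrange: it swaps in findInterval and assumes the intervals are sorted by lower and do not overlap, so each value has at most one candidate interval and the lookup is roughly O(n log m) instead of comparing every value with all m intervals. The helper in_sorted_ranges is hypothetical.

```r
# Hypothetical helper, NOT data.table's implementation.
# Assumes: lower is sorted ascending and intervals [lower[i], upper[i]] don't overlap.
in_sorted_ranges <- function(x, lower, upper, incbounds = TRUE) {
  stopifnot(length(lower) == length(upper), !is.unsorted(lower))
  # idx[j] = index of the last lower bound <= x[j]; 0 if x[j] < lower[1]; NA if x[j] is NA
  idx <- findInterval(x, lower)
  hit <- !is.na(idx) & idx > 0        # values that have a candidate interval
  out <- logical(length(x))           # FALSE by default (also for NA values of x)
  lo <- lower[idx[hit]]
  hi <- upper[idx[hit]]
  if (incbounds) {
    out[hit] <- x[hit] >= lo & x[hit] <= hi   # closed interval [lo, hi]
  } else {
    out[hit] <- x[hit] > lo & x[hit] < hi     # open interval (lo, hi)
  }
  out
}

in_sorted_ranges(1:10, c(2, 4), c(2, 5))
in_sorted_ranges(c(2, 4, 4.5, 5), c(2, 4), c(2, 5), incbounds = FALSE)
```

With overlapping intervals this sketch can miss matches, because only the interval with the largest lower bound not exceeding x is checked; a loop over all intervals or inrange itself is the safer choice there.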

    library(dplyr)
    library(data.table)
    tmp_df %>% filter(inrange(a, c(2, 4), c(2, 5)))
    #   a
    # 1 2
    # 2 4
    # 3 5

If you want to stick with dplyr, you can get similar behaviour with its between function.

    library(dplyr)

    # ranges I want to check between
    my_ranges <- list(c(2, 2), c(4, 5), c(6, 7))
    tmp_df <- data.frame(a = 1:10)
    tmp_df %>%
      filter(apply(bind_rows(lapply(my_ranges, FUN = function(x, a) {
        data.frame(t(between(a, x[1], x[2])))
      }, a)), 2, any))
    #   a
    # 1 2
    # 2 4
    # 3 5
    # 4 6
    # 5 7

Just remember that between always includes both bounds (it is a shortcut for x >= left & x <= right), and unlike inrange with its incbounds argument, this cannot be changed.
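A lighter variant of the same idea, as a sketch: instead of turning each logical vector into a one-row data frame and converting the stacked result to a matrix inside apply, OR the per-range results together with base R's Reduce. The init value FALSE makes an empty list of ranges keep no rows instead of failing. dplyr::between is written with its namespace because data.table also exports a between that masks dplyr's if it is loaded afterwards.

```r
library(dplyr)

# ranges I want to check between
my_ranges <- list(c(2, 2), c(4, 5), c(6, 7))
tmp_df <- data.frame(a = 1:10)

res <- tmp_df %>%
  filter(Reduce(`|`,
                # one logical vector over a per range (bounds included)
                lapply(my_ranges, function(r, v) dplyr::between(v, r[1], r[2]), a),
                FALSE))   # start from FALSE so an empty range list keeps nothing
res
```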


Source: https://habr.com/ru/post/1268966/

