Joining / merging data tables based on criteria >= or <=

I have a data frame of weekly data by section: 83 sections in total, each with 104 weeks of data.

I have a second data frame with a start and an end week for each section, which I want to use to filter the main data frame.

In both tables, Week is a combination of year and week number, for example 201501, and the week number always runs from 1 to 52.

So, in the example below, I want to filter section A to weeks 201401 through 201404 and section B to weeks 201551 through 201603.

My initial idea was to expand the Weeks_Filter data frame so that it has one row for every week between Start and End for each section, then merge the two tables while keeping every row from Weeks_Filter (all.y = TRUE). That worked on a small sample I made by hand, but I don't know how to generate the consecutive weeks, since they can span different years.

    Week <- c("201401", "201402", "201403", "201404", "201405",
              "201451", "201552", "201601", "201602", "201603")
    Section <- c(rep("A", 5), rep("B", 5))
    df <- data.frame(cbind(Week, Section))

    Section <- c("A", "B")
    Start <- c("201401", "201551")
    End <- c("201404", "201603")
    Weeks_Filter <- data.frame(cbind(Section, Start, End))
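The week-expansion idea described in the question can be sketched in base R. This is only an illustration, not code from the thread: `week_seq` is a hypothetical helper, it assumes every year has exactly 52 weeks (as the question states), and the sample data is rebuilt with numeric weeks instead of factors.

```r
# Hypothetical helper: every YYYYWW code from `start` to `end`, inclusive.
# Assumption: each year has exactly 52 weeks, numbered 1 to 52.
week_seq <- function(start, end) {
  # map YYYYWW onto a running week counter so year boundaries vanish
  to_index <- function(w) (w %/% 100) * 52 + (w %% 100 - 1)
  idx <- seq(to_index(start), to_index(end))
  # map the counter back to YYYYWW
  (idx %/% 52) * 100 + idx %% 52 + 1
}

# Sample data from the question, with weeks stored as numbers
df <- data.frame(
  Week    = c(201401, 201402, 201403, 201404, 201405,
              201451, 201552, 201601, 201602, 201603),
  Section = rep(c("A", "B"), each = 5),
  stringsAsFactors = FALSE
)
Weeks_Filter <- data.frame(
  Section = c("A", "B"),
  Start   = c(201401, 201551),
  End     = c(201404, 201603),
  stringsAsFactors = FALSE
)

# One row per section and week inside that section's range
expanded <- do.call(rbind, lapply(seq_len(nrow(Weeks_Filter)), function(i) {
  data.frame(
    Section = Weeks_Filter$Section[i],
    Week    = week_seq(Weeks_Filter$Start[i], Weeks_Filter$End[i]),
    stringsAsFactors = FALSE
  )
}))

# Inner merge keeps only the rows of df that fall inside a range
result <- merge(df, expanded, by = c("Section", "Week"))
```

Passing `all.y = TRUE` to `merge` would additionally keep weeks that are in a range but missing from `df` (here, 201551 for section B).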
+5
4 answers
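    library(data.table)

    # attach Start and End to every row of the matching section
    df <- merge(df, Weeks_Filter)
    # convert Week, Start and End (every column except Section) to numeric
    df[, -1] <- apply(df[, -1], 2, function(x) as.numeric(as.character(x)))
    df <- data.table(df)
    # keep only the weeks inside each section's range
    df[Week >= Start & Week <= End, .SD, by = Section]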

Output:

       Section  Start    End   Week
    1:       A 201401 201404 201401
    2:       A 201401 201404 201402
    3:       A 201401 201404 201403
    4:       A 201401 201404 201404
    5:       B 201551 201603 201552
    6:       B 201551 201603 201601
    7:       B 201551 201603 201602
    8:       B 201551 201603 201603
-2

The current development version of data.table adds non-equi joins (in older versions you can use foverlaps):

    setDT(df)            # convert to data.table in place
    setDT(Weeks_Filter)

    # fix the column types - you have factors currently, converting to integer
    df[, Week := as.integer(as.character(Week))]
    Weeks_Filter[, `:=`(Start = as.integer(as.character(Start)),
                        End   = as.integer(as.character(End)))]

    # the actual magic
    df[df[Weeks_Filter, on = .(Section, Week >= Start, Week <= End), which = TRUE]]
    #     Week Section
    #1: 201401       A
    #2: 201402       A
    #3: 201403       A
    #4: 201404       A
    #5: 201552       B
    #6: 201601       B
    #7: 201602       B
    #8: 201603       B
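For older data.table versions, the foverlaps route mentioned above might look roughly like the sketch below. It is not from the original answer: the sample data is rebuilt with integer weeks, and `Week2` is a helper column introduced here only because `foverlaps` expects an interval (two columns) on both sides.

```r
library(data.table)

# Sample data from the question, with weeks stored as integers
df <- data.table(
  Week    = c(201401L, 201402L, 201403L, 201404L, 201405L,
              201451L, 201552L, 201601L, 201602L, 201603L),
  Section = rep(c("A", "B"), each = 5)
)
Weeks_Filter <- data.table(
  Section = c("A", "B"),
  Start   = c(201401L, 201551L),
  End     = c(201404L, 201603L)
)

# Give every week a zero-length interval [Week, Week2]
# (Week2 is a helper column, not part of the original data)
df[, Week2 := Week]

# The lookup table must be keyed, with its interval columns last
setkey(Weeks_Filter, Section, Start, End)

# Keep rows of df whose interval lies within a range of the same section;
# nomatch = 0L drops the rows that match no range
res <- foverlaps(df, Weeks_Filter,
                 by.x = c("Section", "Week", "Week2"),
                 type = "within", nomatch = 0L)
res[, Week2 := NULL]
```

Here `res` should hold the same eight weeks as the non-equi join, with the `Start` and `End` columns attached.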
+4

Using dplyr, you can:

  • merge your data frames
  • group by section
  • filter based on the start and end columns

One problem is that your "weeks" are character strings, and they become factors because of the way you built the data frames. I took a shortcut and simply made them numeric, but I would recommend using lubridate to create proper vectors of class Date.

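    library(dplyr)

    # attach Start and End to every row of the matching section
    tempdf <- full_join(df, Weeks_Filter, by = "Section")

    # factors -> numbers (go through character to get the labels, not the codes)
    tempdf$Week  <- as.numeric(as.character(tempdf$Week))
    tempdf$Start <- as.numeric(as.character(tempdf$Start))
    tempdf$End   <- as.numeric(as.character(tempdf$End))

    tempdf_filt <- tempdf %>%
      group_by(Section) %>%
      filter(Week >= Start, Week <= End)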

It looks like there is a typo in your data ("201451" should probably be "201551"), but otherwise this returns what you want:

    > tempdf_filt
    Source: local data frame [8 x 4]
    Groups: Section [2]

        Week Section  Start    End
       (dbl)  (fctr)  (dbl)  (dbl)
    1 201401       A 201401 201404
    2 201402       A 201401 201404
    3 201403       A 201401 201404
    4 201404       A 201401 201404
    5 201552       B 201551 201603
    6 201601       B 201551 201603
    7 201602       B 201551 201603
    8 201603       B 201551 201603
+1

Perhaps creating a vector of all the desired weeks and filtering on it will work. Here is a rough example using base R:

    # get zero-padded week numbers "01" .. "52"
    allWeeks <- sprintf("%02d", 1:52)
    # get all year-weeks (every year combined with every week; 2016 is
    # included because the wanted range runs into 201603)
    allWeeks <- as.vector(outer(2014:2016, allWeeks, paste0))
    # filter vector to select desired weeks
    keepWeeks <- allWeeks[grep("^201(40[1-4]|55[12]|60[1-3])$", allWeeks)]
    dfKeeper <- df[df$Week %in% keepWeeks, ]

I tried to build a regex that would capture the periods you want, but you might have to adjust it a bit.
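One limitation of the regex is that it does not know which section a week belongs to. A section-aware variant that stays in base R could look like the sketch below. It is an illustration rather than part of the original answer, and it relies on the fact that YYYYWW codes compare correctly as plain numbers.

```r
# Sample data from the question, kept as character to mimic the original
df <- data.frame(
  Week    = c("201401", "201402", "201403", "201404", "201405",
              "201451", "201552", "201601", "201602", "201603"),
  Section = rep(c("A", "B"), each = 5),
  stringsAsFactors = FALSE
)
Weeks_Filter <- data.frame(
  Section = c("A", "B"),
  Start   = c("201401", "201551"),
  End     = c("201404", "201603"),
  stringsAsFactors = FALSE
)

# YYYYWW sorts chronologically as a number, so convert before comparing
df$Week            <- as.integer(df$Week)
Weeks_Filter$Start <- as.integer(Weeks_Filter$Start)
Weeks_Filter$End   <- as.integer(Weeks_Filter$End)

# Attach each section's range to its rows, then keep the weeks inside it
merged   <- merge(df, Weeks_Filter, by = "Section")
inRange  <- merged$Week >= merged$Start & merged$Week <= merged$End
dfKeeper <- merged[inRange, c("Week", "Section")]
```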

0

Source: https://habr.com/ru/post/1246713/

