I have a data.frame dat that stores my regular data, and the groups are identified by ID .
data <- structure(list(NAME = structure(c(1L, 1L, 2L), .Label = c("NAME1", "NAME2"), class = "factor"), ID = c(23L, 23L, 57L), REF_YEAR = c(1920L, 1938L, 1869L), SURV_YEAR = c(1938L, 1962L, 1872L), VALUE = c(20L, 40L, 34L)), .Names = c("NAME", "ID", "REF_YEAR", "SURV_YEAR","VALUE"), class = "data.frame", row.names = c(NA, -3L)) NAME ID REF_YEAR SURV_YEAR VALUE 1 NAME1 23 1920 1938 20 2 NAME1 23 1938 1962 40 3 NAME2 57 1869 1872 34
And I have a second data.frame , dat_q , which I would like to compare with dat
dat_q <- structure(list(NAME = structure(1:2, .Label = c("NAME1", "NAME2"), class = "factor"), ID = c(23L, 57L), REF_YEAR = c(1934L, 1866L), SURV_YEAR = c(1938L, 1868L), VALUE = structure(1:2, .Label = c("A", "B"), class = "factor")), .Names = c("NAME", "ID", "REF_YEAR", "SURV_YEAR", "VALUE"), class = "data.frame", row.names = c(NA, -2L)) NAME ID REF_YEAR SURV_YEAR VALUE 1 NAME1 23 1934 1938 A 2 NAME2 57 1866 1868 B
My question is: how could I delete all rows in dat_q that have the same value in REF_YEAR or SURV_YEAR columns than in the same dat columns (in 1938 data samples)? This should be applied by the group (as defined by the ID ), and not over the whole data.frame
In the end, using my sample data, this will be the result obtained from the dat_q filtering dat_q
NAME ID REF_YEAR SURV_YEAR VALUE 2 NAME2 57 1866 1868 B
EDIT
Here are some other examples of data that the code provided by @thelatemail will not work with. And I cannot understand why dat_q should be filtered out because it contains the same value as dat .
data <- structure(list(NAME = structure(c(1L, 1L, 1L), .Label = "NAME1", class = "factor"), ID = c(226L, 226L, 226L), SURV_YEAR = c(2009L, 2010L, 2012L), REF_YEAR = c(2008L, 2009L, 2011L), VALUE = c(-7L, -37L, -51L)), .Names = c("NAME", "ID", "SURV_YEAR", "REF_YEAR", "VALUE"), class = "data.frame", row.names = c(NA, -3L)) NAME ID SURV_YEAR REF_YEAR VALUE 1 NAME1 226 2009 2008 -7 2 NAME1 226 2010 2009 -37 3 NAME1 226 2012 2011 -51 dat_q <- structure(list(NAME = structure(1L, .Label = "NAME1", class = "factor"), ID = 226L, REF_YEAR = 2010L, SURV_YEAR = 2011L, VALUE = structure(1L, .Label = "-X", class = "factor")), .Names = c("NAME", "ID", "REF_YEAR", "SURV_YEAR", "VALUE"), class = "data.frame", row.names = c(NA, -1L)) NAME ID REF_YEAR SURV_YEAR VALUE 1 NAME1 226 2010 2011 -X