I have a dataset like this:
id_1 <- c(1, 1, 1)
id_2 <- c(2, NA, NA)
day <- c("Mon", "Mon", "Mon")
month <- c("May", NA, "May")
year <- c("2017", NA, NA)
df <- data.frame(id_1, id_2, day, month, year, stringsAsFactors = FALSE)
These lines are repeated observations in my data. I would like to keep only the most complete line (i.e. Line 1). My real data has 15 columns, so using
duplicated(df[, <some combination of columns>])
seems complicated. Is there a function for this? Or some simple answer that I am missing? Answers in R are preferred, but SQL is also an option. Thank you in advance!
EDIT: id_1 and id_2 are both identifiers for an observation. id_1 should definitely be unique in this data, but id_2 may be NA or repeated for some lines. In the end, I will merge this data table with another data table using id_2. Therefore, I would like to delete lines that repeat information already captured by a line that includes id_2.
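To make the goal concrete, here is a minimal base R sketch of one possible approach. It assumes that "most complete" means the line with the fewest NA values within each id_1, and that the example is stored as a data frame rather than the character matrix that cbind() would produce. The names n_missing, df_sorted and most_complete are hypothetical, chosen only for illustration.

```r
# Example data as a data frame, so column types are preserved.
df <- data.frame(
  id_1  = c(1, 1, 1),
  id_2  = c(2, NA, NA),
  day   = c("Mon", "Mon", "Mon"),
  month = c("May", NA, "May"),
  year  = c("2017", NA, NA),
  stringsAsFactors = FALSE
)

# Count the missing values in each line.
n_missing <- rowSums(is.na(df))

# Sort by id_1, then by NA count, so the most complete line
# comes first within each id_1 group.
df_sorted <- df[order(df$id_1, n_missing), ]

# Keep only the first (most complete) line for each id_1.
most_complete <- df_sorted[!duplicated(df_sorted$id_1), ]
```

Note that this sketch only ranks lines by how many values are missing. It does not check that a dropped line agrees with the kept line on the values both have, and ties are broken by the original line order, so it may need adjusting for data where partial lines carry conflicting information.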