I am working on panel data with a unique case identifier and a column for time points of observations (long format). There are both time-constant variables and time-varying observations:
   id time    tc1 obs1
1 101    1   male    4
2 101    2   male    5
3 101    3   male    3
4 102    1 female    6
5 102    3 female    2
6 103    1   male    2
For my model, I now need data with a complete record for every identifier at every time point. In other words, if there is no observation, I still need a row with id, time, the time-constant variables, and NA for the observed variables (like the row (102, 2, "female", NA) in the above example). So my questions are:
- How can I check whether my dataset contains a row for every combination of id and time?
- If not, how can I add the missing row, carry over the time-constant variables, and fill the observations with NA?
It would be great if someone could shed light on this.
Many thanks!
EDIT
Thank you all for your answers. What I finally did is a combination of several suggested approaches. The thing is that I have several time-varying variables (obs1-obsn) in each row, and I could not get dcast to work for this: value.var does not take more than one argument.
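# All id x time combinations the complete panel should contain
iddat <- expand.grid(id = unique(dataset$id), time = c(1996, 1999, 2002, 2005, 2008, 2011))
iddat <- iddat[order(iddat$id, iddat$time), ]
# Full outer join: combinations without an observation get NA in all other columns
dataset_new <- merge(dataset, iddat, all = TRUE, by = c("id", "time"))
# Drop the time-constant columns (they are NA in the added rows) ...
dataset_new[c("tc1", "tc2", "tc3")] <- list(NULL)
# ... and re-attach them per id; temp needs the id column to be merged on it
temp <- unique(dataset[c("id", "tc1", "tc2", "tc3")])
dataset_new <- merge(dataset_new, temp, by = "id")
dataset_new <- dataset_new[order(dataset_new$id, dataset_new$time), ]
dataset_new <- unique(dataset_new)
rm(temp, iddat)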
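For anyone who wants to try this on the toy data from the question, here is a minimal, self-contained sketch of the same idea in base R. It first lists the id/time combinations that are absent, then builds the completed panel. The object names (full_grid, missing_rows, time_varying, time_constant, panel) are made up for illustration, and the time points are taken from the example table rather than from my real data.

```r
# Toy data from the question (one time-constant and one time-varying column)
dataset <- data.frame(
  id   = c(101, 101, 101, 102, 102, 103),
  time = c(1, 2, 3, 1, 3, 1),
  tc1  = c("male", "male", "male", "female", "female", "male"),
  obs1 = c(4, 5, 3, 6, 2, 2),
  stringsAsFactors = FALSE
)

# Every id x time combination a complete panel would contain
full_grid <- expand.grid(id = unique(dataset$id), time = unique(dataset$time))

# Question 1: which combinations have no row in the data?
present <- paste(full_grid$id, full_grid$time) %in% paste(dataset$id, dataset$time)
missing_rows <- full_grid[!present, ]

# Split the columns: time-varying values per id/time, time-constant values per id
time_varying  <- dataset[c("id", "time", "obs1")]
time_constant <- unique(dataset[c("id", "tc1")])

# Question 2: left-join the observations onto the full grid (absent ones become NA),
# then attach the time-constant values by id
panel <- merge(full_grid, time_varying, by = c("id", "time"), all.x = TRUE)
panel <- merge(panel, time_constant, by = "id")
panel <- panel[order(panel$id, panel$time), c("id", "time", "tc1", "obs1")]
rownames(panel) <- NULL

print(missing_rows)
print(panel)
```

Splitting the columns before merging avoids the drop-and-re-merge step above, but it assumes that the time-constant variables really are constant within each id; otherwise unique() returns more than one row per id and the second merge duplicates rows.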
All the best and thanks again, Matt