Say I have (fake) data on patient visits:
foo <- data.frame(PatientNumber=c(11,11,11,22,22,33,33,33,44,55,55),
                  VisitDate=c("11/03/07","11/03/07","11/20/07","12/20/08",
                              "12/30/09","09/20/12","09/20/12","10/25/07","05/09/08","06/09/13","06/09/13"),
                  ICD9=c(10,15,10,30,30,25,60,25,14,40,13))
Which gives:
   PatientNumber VisitDate ICD9
1             11  11/03/07   10
2             11  11/03/07   15
3             11  11/20/07   10
4             22  12/20/08   30
5             22  12/30/09   30
6             33  09/20/12   25
7             33  09/20/12   60
8             33  10/25/07   25
9             44  05/09/08   14
10            55  06/09/13   40
11            55  06/09/13   13
I would like one row per patient per visit date. If a patient has several codes on the same date, I would like each additional ICD code recorded during that visit to go in its own new column. Here's what it should look like:
WhatIWant <- data.frame(PatientNumber=c(11,11,22,22,33,33,44,55),
                        VisitDate=c("11/03/07", "11/20/07", "12/20/08", "12/30/09", "09/20/12","10/25/07","05/09/08","06/09/13"),
                        ICD9_1=c(10,10,30,30,25,25,14,40),
                        ICD9_2= c(15,NA,NA,NA,60,NA,NA,13))
> WhatIWant
  PatientNumber VisitDate ICD9_1 ICD9_2
1            11  11/03/07     10     15
2            11  11/20/07     10     NA
3            22  12/20/08     30     NA
4            22  12/30/09     30     NA
5            33  09/20/12     25     60
6            33  10/25/07     25     NA
7            44  05/09/08     14     NA
8            55  06/09/13     40     13
I tried reshape, but it seems to create a separate ICD9 column for every patient and fills each one whether or not there is a value (as shown below). I end up with something like 200 columns, when I would like only 3 (the maximum number of codes per patient per visit in the data set, i.e. ICD9_1, ICD9_2, ICD9_3).
test <- reshape(foo, idvar = c("VisitDate"), timevar = c("PatientNumber"), direction = "wide")
> test
    VisitDate ICD9.11 ICD9.22 ICD9.33 ICD9.44 ICD9.55
1  0007-11-03      10      NA      NA      NA      NA
3  0007-11-20      10      NA      NA      NA      NA
4  0008-12-20      NA      30      NA      NA      NA
5  0009-12-30      NA      30      NA      NA      NA
6  0012-09-20      NA      NA      25      NA      NA
8  0007-10-25      NA      NA      25      NA      NA
9  0008-05-09      NA      NA      NA      14      NA
10 0013-06-09      NA      NA      NA      NA      40
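For reference, the call above spreads the codes across PatientNumber, which is why there is one ICD9 column per patient. A minimal sketch of one possible alternative, assuming the goal is the WhatIWant layout: number the codes within each patient/visit pair and use that counter as timevar. The column name code_idx and the result name wide are made up for illustration; only base R functions (ave, reshape) are used.

```r
foo <- data.frame(
  PatientNumber = c(11, 11, 11, 22, 22, 33, 33, 33, 44, 55, 55),
  VisitDate = c("11/03/07", "11/03/07", "11/20/07", "12/20/08", "12/30/09",
                "09/20/12", "09/20/12", "10/25/07", "05/09/08", "06/09/13",
                "06/09/13"),
  ICD9 = c(10, 15, 10, 30, 30, 25, 60, 25, 14, 40, 13)
)

# code_idx (hypothetical name): 1, 2, ... for each code within the same
# patient and visit date, in the order the rows appear
foo$code_idx <- ave(seq_along(foo$ICD9),
                    foo$PatientNumber, foo$VisitDate,
                    FUN = seq_along)

# Keep patient and date as identifiers; spread ICD9 across code_idx
wide <- reshape(foo,
                idvar = c("PatientNumber", "VisitDate"),
                timevar = "code_idx",
                direction = "wide",
                sep = "_")
rownames(wide) <- NULL
wide
```

With this approach the number of ICD9_ columns should equal the largest number of codes any patient has on a single visit date, rather than the number of patients.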