I ran into a brick wall with this problem.
I have a data frame with document identifiers and dates, where each row's dates are stored in a character vector:

        Doc Dates
    1 12345 c("06/01/2000", "08/09/2002")
    2 23456 c("07/01/2000", "09/08/2003", "07/01/2000")
    3 34567 c("09/06/2004", "09/06/2004", "12/30/2006")
    4 45678 c("06/01/2000", "08/09/2002")

I am trying to remove duplicate elements within each row's dates to get this result:

        Doc Dates
    1 12345 c("06/01/2000", "08/09/2002")
    2 23456 c("07/01/2000", "09/08/2003")
    3 34567 c("09/06/2004", "12/30/2006")
    4 45678 c("06/01/2000", "08/09/2002")

I tried:

    unique(dates$dates)

but that removes whole rows whose dates duplicate another row's:

        Doc Dates
    1 12345 c("06/01/2000", "08/09/2002")
    2 23456 c("07/01/2000", "09/08/2003")
    3 34567 c("09/06/2004", "12/30/2006")

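A minimal sketch of why this happens, assuming the column is a list column named `dates` (one character vector per row), as the `c(...)` display and the call above suggest. `unique()` on a list compares each element, that is, each row's whole vector, as a single item. It therefore drops only row 4, whose vector is an exact copy of row 1's, and never looks inside a vector.

```r
# Rebuild the example; assumes `dates` is a list column holding
# one character vector per document
dates <- data.frame(Doc = c(12345, 23456, 34567, 45678))
dates$dates <- list(
  c("06/01/2000", "08/09/2002"),
  c("07/01/2000", "09/08/2003", "07/01/2000"),
  c("09/06/2004", "09/06/2004", "12/30/2006"),
  c("06/01/2000", "08/09/2002")
)

# unique() on a list treats each element (a whole vector) as one item:
# the 4th vector is identical to the 1st, so it is the only one dropped,
# and the vectors that remain are not deduplicated internally
kept <- unique(dates$dates)
length(kept)
```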
How can I remove only the duplicate items within each row's dates, without dropping rows whose dates duplicate another row's?
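One way to get the desired result, sketched under the same assumption that `dates` is a list column: apply `unique()` to each row's vector separately with `lapply()`. Duplicates are removed inside each row, and because `lapply()` returns one element per input element, no row is ever dropped.

```r
dates <- data.frame(Doc = c(12345, 23456, 34567, 45678))
dates$dates <- list(
  c("06/01/2000", "08/09/2002"),
  c("07/01/2000", "09/08/2003", "07/01/2000"),
  c("09/06/2004", "09/06/2004", "12/30/2006"),
  c("06/01/2000", "08/09/2002")
)

# Deduplicate within each row; the result is still a list of length nrow(dates)
dates$dates <- lapply(dates$dates, unique)

dates$dates[[2]]
```

This only works while the dates are still real vectors; once they have been turned into text such as `"c(\"a\", \"b\")"`, `unique()` sees a single string per row.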
**Updated with data**

    library(gsubfn)  # provides strapply()

    # Extract date-like strings from the text in column 2
    df1$dates <- as.character(strapply(df1[[2]], "((\\D\\d{1,2}(/|-)\\d{1,2}(/|-)\\d{2,4})| ([^/]\\d{1,2}(/|-)\\d{2,4})|((JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV){1}[\\s|-]{0,2}\\d{1,4}(\\D[\\s|-]{0,}\\d{2,4}){0,}))"))

    # Drop the first 2 columns from the data frame
    df2 <- df1[-c(1, 2)]

    # List the data
    > df2
    872   7/23/2007
    873 c(" 11/4/2007", " 11/4/2007")
    874 c(" 4/2/2008", " 8/2/2007")
    880  11/14/2006
    > class(df2)
    [1] "data.frame"
    > class(df2$dates)
    [1] "character"
    > dput(df2)
    structure(list(dates = c("NULL", "NULL", " 7/23/2007", "c(\" 11/4/2007\", \" 11/4/2007\")", 
    "c(\" 4/2/2008\", \" 8/2/2007\")", "NULL", "NULL", "NULL", "NULL", 
    "NULL", " 11/14/2006")), .Names = "dates", class = "data.frame", row.names = 870:880)

So my problem is: how do I get rid of the duplicate date in row 873?
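The `dput()` output shows that `df2$dates` is a plain character column: the `c(...)` values are text, not vectors. That is what `as.character()` produces when applied to a list such as the one `strapply()` returns. Each element is deparsed, so `NULL` becomes `"NULL"` and a vector with several matches becomes the string `"c(...)"`. A minimal sketch that deduplicates before that conversion follows. `matches` is a hypothetical stand-in for the `strapply()` result for rows 870 to 874, with its shape inferred from the `dput()` output. In the real code it would be replaced by `strapply(df1[[2]], <your pattern>)`.

```r
# Hypothetical stand-in for strapply()'s result: one element per input row,
# either NULL (no match) or a character vector of matches
matches <- list(NULL, NULL, " 7/23/2007",
                c(" 11/4/2007", " 11/4/2007"),
                c(" 4/2/2008", " 8/2/2007"))

# What the original code does: as.character() deparses each list element,
# giving "NULL" for empty rows and "c(...)" text for multi-match rows
raw <- as.character(matches)

# Alternative: trim the leading spaces, drop duplicates within each row,
# then collapse each row into a single string (NA when nothing matched)
clean <- vapply(matches, function(x) {
  if (length(x) == 0) return(NA_character_)
  paste(unique(trimws(x)), collapse = ", ")
}, character(1))

clean
```

If you would rather keep the individual dates as vectors, assigning `lapply(matches, function(x) unique(trimws(x)))` to a list column avoids the collapsing step.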