EDIT:
Since your main problem is related to your speed, I edited my trick in my original solution, which leads to something like this:
uniqueMax <- function(m, belongID = 1, time = 3) { t( vapply( split(1:nrow(m), m[,belongID]), function(i, x, time){ is.zero <- x[i,'Rating'] == 0 if(all(is.zero)) is.zero <- FALSE x[i, , drop=FALSE][which.max(x[i,time]*(!is.zero)),] } , m[1,], x=m, time=time ) ) }
My original solution, which is a bit readable than the previous one:
# Get the rating per belongID getRating <- function(x){ iszero <- x$Rating == 0 if(all(iszero)){ id <- which.max(x$Time) } else { id <- which.max((!iszero)*x$Time)
Result:
tc <- textConnection(' belongID uniqID Time Rating 1 101 5 0 1 102 4 0 2 103 4 0 2 104 3 0 2 105 2 5 3 106 4 2 3 107 5 0 3 108 5 1 ') Data <- read.table(tc,header=TRUE) do.call(rbind,lapply(split(Data,Data$belongID),getRating))
:
belongID uniqID Time Rating 1 1 101 5 0 2 2 105 2 5 3 3 108 5 1
EDIT: Just for fun, I also benchmarked (using rbenchmark
) on a small dataset with 1000 replications and a large one with 10 repetitions:
Result:
> benchmark(Joris(Data),Tyler(Data),uniqueMax(Data), + columns=c("test","elapsed","relative"), + replications=1000) test elapsed relative 1 Joris(Data) 1.20 1.025641 2 Tyler(Data) 1.42 1.213675 3 uniqueMax(Data) 1.17 1.000000 > benchmark(Joris(Data2),Tyler(Data2),uniqueMax(Data2), + columns=c("test","elapsed","relative"), + replications=10) test elapsed relative 1 Joris(Data2) 3.63 1.174757 2 Tyler(Data2) 4.02 1.300971 3 uniqueMax(Data2) 3.09 1.000000
Here, I simply wrapped the Joris () and Tyler () functions around our solutions and created Data2 as follows:
Data2 <- data.frame( belongID = rep(1:1000,each=10), uniqID = 1:10000, Time = sample(1:5,10000,TRUE), Rating = sample(0:5,10000,TRUE) )