I have a data frame with three columns: timestamp, key, timed event.
ts,key,event
3,12,1
8,49,1
12,42,1
46,12,-1
100,49,1
From this I want to create a data frame that gives, for each timestamp, the fraction (number of unique keys seen so far - number of those keys whose events sum to 0 so far) / (number of unique keys seen so far). For instance, for the above example, the result should be:
ts,prob
3,1
8,1
12,1
46,2/3
100,2/3
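To make the definition concrete, here is a minimal base R sketch that applies it directly at every timestamp. The helper name prob_at is hypothetical (it is not part of my actual code), and the sketch assumes timestamps are unique and already sorted, as in the sample data.

```r
# Same sample data as in the example above.
items <- data.frame(ts = c(3, 8, 12, 46, 100),
                    key = c(12, 49, 42, 12, 49),
                    event = c(1, 1, 1, -1, 1))

# Hypothetical helper: apply the definition to all rows with ts <= t.
prob_at <- function(t, df) {
  seen <- df[df$ts <= t, ]
  totals <- tapply(seen$event, seen$key, sum)  # event total per key seen so far
  (length(totals) - sum(totals == 0)) / length(totals)
}

expected <- data.frame(ts = items$ts,
                       prob = sapply(items$ts, prob_at, df = items))
print(expected)
```

This recomputes the totals from scratch for every timestamp, so it is only meant as a reference for the expected values, not as an efficient solution.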
My first step is to compute the cumsum grouped by key:
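library(plyr)  # provides ddply
items = data.frame(ts=c(3,8,12,46,100), key=c(12,49,42,12,49), event=c(1,1,1,-1,1))
# running total of event within each key; the rows come back grouped by key
sumByKey = ddply(items, .(key), transform, sum=cumsum(event))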
In the second (and last) step, I iterate over sumByKey using a for loop, tracking in two vectors all unique keys seen so far and all unique keys whose total is currently 0, for example: if (!(k %in% uniqueKeys)) uniqueKeys = append(uniqueKeys, k). The probability is then computed from the lengths of these two vectors.
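A rough, self-contained sketch of that loop is below. To keep it runnable without plyr, it swaps the ddply step for base R's ave, and it sorts the rows by ts before looping, which is an assumption about the intended order. The names zeroKeys, prob and result are illustrative only.

```r
items <- data.frame(ts = c(3, 8, 12, 46, 100),
                    key = c(12, 49, 42, 12, 49),
                    event = c(1, 1, 1, -1, 1))

# Base R stand-in for the ddply step: running event total per key.
items$sum <- ave(items$event, items$key, FUN = cumsum)
# Assumption: the loop must see the rows in time order.
sumByKey <- items[order(items$ts), ]

uniqueKeys <- numeric(0)  # keys seen so far
zeroKeys <- numeric(0)    # keys whose running total is currently 0
prob <- numeric(nrow(sumByKey))

for (i in seq_len(nrow(sumByKey))) {
  k <- sumByKey$key[i]
  if (!(k %in% uniqueKeys)) uniqueKeys <- append(uniqueKeys, k)
  if (sumByKey$sum[i] == 0) {
    zeroKeys <- union(zeroKeys, k)    # key just dropped to a total of 0
  } else {
    zeroKeys <- setdiff(zeroKeys, k)  # key is (again) non-zero
  }
  prob[i] <- (length(uniqueKeys) - length(zeroKeys)) / length(uniqueKeys)
}

result <- data.frame(ts = sumByKey$ts, prob = prob)
print(result)
```

On the sample data this yields 1, 1, 1, 2/3, 2/3, matching the expected result.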
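I suspect the loop could be avoided by using plyr directly on sumByKey, for example with ddply and something like an accumulator function (acc, x) acc + x.
Is there a way to do this with ddply?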