Calculation of sums of unique values in log in R

Question

I have a data frame with three columns: timestamp, key, timed event.

ts,key,event
3,12,1
8,49,1
12,42,1
46,12,-1
100,49,1

From this I want to create a data frame with a timestamp and a ratio: (all unique keys seen up to a given timestamp, minus all unique keys whose events sum to 0 up to that timestamp), divided by all unique keys seen up to the same timestamp. For instance, for the above example, the result should be:

ts,prob
3,1
8,1
12,1
46,2/3
100,2/3
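
To make the definition concrete, here is a brute-force sketch in base R that recomputes the ratio from scratch at every timestamp. The helper name prob_at is made up for illustration, and the sketch assumes that "up to a given timestamp" means all rows with ts at or before it.

```r
items <- data.frame(ts = c(3, 8, 12, 46, 100),
                    key = c(12, 49, 42, 12, 49),
                    event = c(1, 1, 1, -1, 1))

# prob_at is a hypothetical helper: fraction of the keys seen up to time t
# whose events do not sum to zero
prob_at <- function(t) {
  seen <- items[items$ts <= t, ]
  totals <- tapply(seen$event, seen$key, sum)  # total per key so far
  (length(totals) - sum(totals == 0)) / length(totals)
}

result <- data.frame(ts = items$ts, prob = sapply(items$ts, prob_at))
```

At ts 46 the totals are 0 for key 12 and 1 for keys 42 and 49, so the ratio is (3 - 1) / 3 = 2/3, matching the table above. This rescans the log for every row, so it is only meant as a reference to check faster versions against.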

My first step is to compute the cumsum grouped by key:

library(plyr)  # provides ddply
items = data.frame(ts=c(3,8,12,46,100), key=c(12,49,42,12,49), event=c(1,1,1,-1,1))
# running sum of events within each key
sumByKey = ddply(items, .(key), transform, sum=cumsum(event))

In the second (and last) step, I iterate over sumByKey using a for loop, tracking both all unique keys and all unique keys that have a total of 0 in two vectors, e.g. if (!(key %in% uniqueKeys)) uniqueKeys = append(uniqueKeys, key). The probability is then computed from the two vectors.
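
A loop of this kind might look roughly like the sketch below. It is an assumption-laden simplification, not the original code: it keeps one named vector of running totals (totals, a hypothetical name) instead of two separate key vectors, it works directly on items rather than on sumByKey, and it sorts by ts first because the log has to be processed in time order.

```r
items <- data.frame(ts = c(3, 8, 12, 46, 100),
                    key = c(12, 49, 42, 12, 49),
                    event = c(1, 1, 1, -1, 1))
items <- items[order(items$ts), ]   # process the log in time order

totals <- numeric(0)                # running total per key, named by key
prob <- numeric(nrow(items))
for (i in seq_len(nrow(items))) {
  k <- as.character(items$key[i])
  if (!(k %in% names(totals))) totals[k] <- 0   # first time this key is seen
  totals[k] <- totals[k] + items$event[i]
  # keys with a non-zero total, divided by all keys seen so far
  prob[i] <- sum(totals != 0) / length(totals)
}
```

The loop is easy to follow but grows the totals vector as it goes, which is the part a vectorised approach would avoid.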

r

mkhq 25 . '10 18:25

mbq 25 . '10 21:44

Joris Meys · Accepted Answer · 2010-08-26T08:50:01+0000

items = data.frame(ts=c(3,8,12,46,100), key=c(12,49,42,12,49), event=c(1,1,1,-1,1))

# number of keys whose events sum to zero so far, no ddply necessary
# (adds one each time a key's running total is zero)
nzero <- cumsum(ave(items$event, items$key, FUN=cumsum) == 0)

# number of unique keys at a given timepoint
nunique <- rep(FALSE, length(items$key))
nunique[match(unique(items$key), items$key)] <- TRUE
nunique <- cumsum(nunique)

# which gives:
items$p <- (nunique-nzero)/nunique

items
   ts key event         p
1   3  12     1 1.0000000
2   8  49     1 1.0000000
3  12  42     1 1.0000000
4  46  12    -1 0.6666667
5 100  49     1 0.6666667
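
One caveat: cumsum(... == 0) only ever increases, so a key whose total returns to zero and later becomes non-zero again would stay counted as zero. Whether such logs occur in the asker's data is not stated. If they do, a variant along these lines might help. It is a sketch, assuming rows are already sorted by ts, and the names first, running, before and delta are illustrative.

```r
# Small log in which key 12 goes to zero and then becomes non-zero again
items <- data.frame(ts = c(1, 2, 3), key = c(12, 12, 12), event = c(1, -1, 1))

first <- !duplicated(items$key)                       # TRUE where a key first appears
running <- ave(items$event, items$key, FUN = cumsum)  # per-key running total
before <- running - items$event                       # per-key total before this row

# +1 when a key's total is zero after this row,
# -1 when an already-seen key had a zero total before this row
delta <- (running == 0) - (!first & before == 0)
nzero <- cumsum(delta)
nunique <- cumsum(first)

items$p <- (nunique - nzero) / nunique
```

On this three-row log the sketch gives p = 1, 0, 1, whereas the cumsum(... == 0) version gives 1, 0, 0. On the example data from the question both produce the same result.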
