I have a fairly large data set (by my standards), and I want to create an ordinal number for blocks of records. I can use the plyr package, but the runtime is very slow. The code below builds a data frame of comparable size.
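The original generating code is not reproduced here; a hypothetical stand-in with a similar shape (the sizes and column values are assumptions, chosen so most (id, term) keys occur once and a few occur several times) might look like:

```r
# Hypothetical stand-in for the omitted generating code: ~30,000 rows
# keyed by (id, term); sizes and value ranges are assumptions.
set.seed(42)
df <- data.frame(
  id   = sample(1:25000, 30000, replace = TRUE),
  term = sample(c("I", "II", "III"), 30000, replace = TRUE)
)
```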
In fact, this is a little smaller than what I'm actually working with (the values are usually larger), but it's close enough.
Here is the runtime on my machine:
> system.time(test.plyr <- ddply(df,
+     .(id, term),
+     summarise,
+     seqnum = 1:length(id),
+     .progress = "text"))
  |=========================================================| 100%
   user  system elapsed
  63.52    0.03   63.85
Is there a "better" way to do this? Sorry, I'm on a Windows machine.
Thanks in advance.
EDIT: data.table is extremely fast, but I can't get it to calculate the sequence numbers correctly. Here is what my ddply version created: most groups have only one row, but some have 2 rows, 3 rows, and so on.
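To make the target concrete: the desired seqnum is a within-group row counter over (id, term). A minimal base-R sketch of what the ddply call computes, on toy data (not the real set):

```r
# Toy data: group (1,"a") has 2 rows, (2,"b") has 3, (3,"c") has 1
df <- data.frame(id   = c(1, 1, 2, 2, 2, 3),
                 term = c("a", "a", "b", "b", "b", "c"))
# ave() applies seq_along() within each (id, term) group,
# giving the within-group row counter 1, 2, ..., N
df$seqnum <- ave(seq_along(df$id), df$id, df$term, FUN = seq_along)
df$seqnum  # 1 2 1 2 3 1
```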
> with(test.plyr, table(seqnum))
seqnum
    1     2     3     4     5
24272  4950   681    88     9
And using data.table, the same approach gives:
> with(test.dt, table(V1))
V1
    1
24272
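For reference, a grouped data.table call that does produce a per-group counter might look like the sketch below (an assumption on my part, using `.N`, the per-group row count, inside `by` grouping; shown on toy data rather than the real set):

```r
library(data.table)

dt <- data.table(id   = c(1, 1, 2, 2, 2, 3),
                 term = c("a", "a", "b", "b", "b", "c"))
# seq_len(.N) runs 1..N separately inside each (id, term) group,
# so seqnum restarts at 1 for every group
test.dt <- dt[, list(seqnum = seq_len(.N)), by = list(id, term)]
```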