I have an events matrix containing the times of occurrence of 5 million events. Each of these events has a "type" that ranges from 1 to 2000. A very simplified version of the matrix is shown below. The units for "times" are seconds since 1970. All events have occurred since January 1, 2012.
    > events
      type      times
         1 1352861760
         1 1362377700
         2 1365491820
         2 1368216180
         2 1362088800
         2 1362377700
I am trying to split the time since 1/1/2012 into 5-minute buckets and then fill each bucket with the number of events of each type i that occurred in it. My code is below. Note that types is a vector containing every possible type from 1 to 2000, and by is 300 because that is the number of seconds in 5 minutes.
    for (i in 1:length(types)) {
      local <- events[events$type == types[i], c("type", "times")]
      assign(sprintf("a%d", i),
             table(cut(local$times,
                       breaks = seq(range(events$times)[1], range(events$times)[2], by = 300))))
    }
This produces the variables a1 through a2000, each of which holds a vector of counts: how many occurrences of type i fell into each of the 5-minute buckets.
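For comparison, here is a minimal sketch of how the same kind of counts might be produced in a single pass instead of 2000 separate subsets. It is only an illustration under stated assumptions: the toy `events` data frame and the names `start`, `bucket`, `nbuckets`, and `counts` are made up here and are not part of my real code.

```r
# Toy stand-in for the real data (values are made up for illustration).
events <- data.frame(
  type  = c(1, 1, 2, 2, 2, 2),
  times = c(0, 310, 10, 20, 650, 899)
)
types <- 1:2   # every possible type
by    <- 300   # bucket width in seconds

# Zero-based bucket index of each event, measured from the earliest event.
start    <- min(events$times)
bucket   <- (events$times - start) %/% by
nbuckets <- max(bucket) + 1

# One pass over the data: rows are buckets, columns are types.
# Explicit factor levels keep empty buckets and unused types in the result.
counts <- table(
  factor(bucket, levels = 0:(nbuckets - 1)),
  factor(events$type, levels = types)
)
```

Column i of `counts` would then play the role of the vector a i. Note that this sketch uses left-closed buckets, whereas cut() with its defaults uses right-closed intervals and drops the earliest timestamp, so counts at bucket boundaries can differ slightly from the loop above. With a long time span and 2000 types, the dense table may also be large in memory.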
Next, I compute all pairwise correlations among a1 through a2000.
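To make the correlation step concrete, here is a small sketch, assuming the count vectors have first been bound column-wise into one matrix. The matrix values below are made up, and `corr` is a name chosen only for this illustration.

```r
# Hypothetical bucket-by-type count matrix: 4 buckets (rows), 3 types (columns).
counts <- cbind(
  a1 = c(1, 0, 2, 1),
  a2 = c(2, 0, 4, 2),   # exactly twice a1, so perfectly correlated with it
  a3 = c(0, 3, 1, 0)
)

# cor() on a matrix returns the Pearson correlation between every pair of columns.
corr <- cor(counts)
```

Here `corr["a1", "a2"]` is the correlation between types 1 and 2. A type whose counts are constant across all buckets has zero variance, so its correlations come back as NA with a warning.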
Is there a way to optimize the piece of code shown above? It runs very slowly, but I can't think of a way to make it faster. Too many buckets, perhaps, and too little time.
Any insight would be greatly appreciated.
Reproducible example:
    > head(events)
      type      times
        12 1308575460
        12 1308676680
        12 1308825420
        12 1309152660
        12 1309879140
        25 1309946460

    xevents <- xts(events[,"type"], .POSIXct(events[,"times"]))
    ep <- endpoints(xevents, "minutes", 5)
    counts <- period.apply(xevents, ep, tabulate, nbins=length(types))

    > head(counts)
                        1 2 3 4 5 6 7 8 9 10 11 12 13 14
    2011-06-20 09:11:00 0 0 0 0 0 0 0 0 0  0  0  1  0  0
    2011-06-21 13:18:00 0 0 0 0 0 0 0 0 0  0  0  1  0  0
    2011-06-23 06:37:00 0 0 0 0 0 0 0 0 0  0  0  1  0  0
    2011-06-27 01:31:00 0 0 0 0 0 0 0 0 0  0  0  1  0  0
    2011-07-05 11:19:00 0 0 0 0 0 0 0 0 0  0  0  1  0  0
    2011-07-06 06:01:00 0 0 0 0 0 0 0 0 0  0  0  0  0  0

    > ep[1:20]
     [1]  0  1  2  3  4  5  6  7  8  9 10 12 20 21 22 23 24 25 26 27
Above is the code I tried, but the problem is that the buckets do not advance in fixed 5-minute steps: a new bucket starts only where an actual event occurs, so 5-minute intervals with no events are missing from the result.
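To illustrate what I mean by fixed 5-minute steps, here is a rough base R sketch that builds a regular grid and keeps empty buckets. It does not use xts, the timestamps are made up, and the names `bucket_start`, `grid`, `idx`, and `per_bucket` are hypothetical.

```r
# Made-up epoch timestamps (seconds since 1970) for illustration only.
times <- c(1308575460, 1308575700, 1308576400)
by    <- 300   # bucket width in seconds

# Floor each timestamp to the start of its 5-minute bucket.
bucket_start <- (times %/% by) * by

# Regular grid of bucket starts, including buckets that contain no events.
grid <- seq(min(bucket_start), max(bucket_start), by = by)

# Position of each event on the grid, then a count per grid cell.
idx        <- (bucket_start - grid[1]) %/% by + 1
per_bucket <- tabulate(idx, nbins = length(grid))
names(per_bucket) <- format(.POSIXct(grid, tz = "UTC"), "%Y-%m-%d %H:%M")
```

In this toy case the third bucket contains no events but still appears with a count of zero, which is the behaviour I am after.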