I am working with precipitation time series, for which I want to calculate the lengths and volumes of individual precipitation events, where an "event" is a run of consecutive non-zero timesteps. I have multiple series of ~60k timesteps each, and my current approach is rather slow.
I currently have the following:
import numpy as np
def count_events(timeseries):
    start = 0
    lengths = []
    volumes = []
    # Pad with one zero on each side so events touching the array edges
    # are terminated properly; padded indices are shifted by one
    # relative to `timeseries`.
    for i, val in enumerate(np.pad(timeseries, pad_width=1, mode='constant')):
        if val > 0 and start == 0:
            start = i
        if val == 0 and start > 0:
            end = i
            # Shift indices back by one to slice the unpadded array.
            volumes.append(np.sum(timeseries[start - 1:end - 1]))
            lengths.append(end - start)
            start = 0
    return np.asarray(lengths), np.asarray(volumes)
Expected Result:
testrain = np.array([1,0,1,0,2,2,8,2,0,0,0.1,0,0,1])
lengths, volumes = count_events(testrain)
print(lengths)
[1 1 4 1 1]
print(volumes)
[ 1.   1.  14.   0.1  1. ]
I imagine there is a much faster, vectorized way to do this with NumPy, but nothing comes to mind ...
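One vectorized approach (a sketch of my own, not one of the answers benchmarked below): detect event boundaries by differencing the non-zero mask, then read off per-event volumes from a cumulative sum, avoiding the Python-level loop entirely:

```python
import numpy as np

def count_events_vectorized(timeseries):
    # Mark non-zero timesteps, padded with False so edge events terminate.
    mask = np.concatenate(([False], timeseries > 0, [False]))
    # Transitions of the mask: even positions are event starts, odd are ends.
    edges = np.flatnonzero(np.diff(mask.astype(np.int8)))
    starts, ends = edges[::2], edges[1::2]
    lengths = ends - starts
    # Prepend 0 so csum[k] == sum(timeseries[:k]); each event's volume
    # is then a difference of two cumulative sums.
    csum = np.concatenate(([0.0], np.cumsum(timeseries)))
    volumes = csum[ends] - csum[starts]
    return lengths, volumes
```

On the test array above this gives lengths `[1 1 4 1 1]` and volumes `[1. 1. 14. 0.1 1.]` (the third event is 2+2+8+2 = 14).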
EDIT:
Comparison of various solutions:
testrain = np.random.normal(10,5, 60000)
testrain[testrain<0] = 0
My solution (the original version produced incorrect volumes due to an off-by-one in the slicing):
%timeit count_events(testrain)
@ Dawg's:
%timeit dawg(testrain)
%timeit dawg2(testrain)
@ Dsm's:
%timeit DSM(testrain)
#10 loops, best of 3: 28.4 ms per loop
@ DanielLenz's:
%timeit DanielLenz(testrain)