Here's an approach that replicates elements along rows to give us an array of 2D . Then we would fill the upper triangular region with a large number, so that later, when we sorted the array along each row, basically we would sort all the elements to diagonal elements and simulate the aggregate windows. Then, following the definition of median , choosing the average or average of two means (for an even number of elements), we get the elements in the first position: (0,0) , then for the second row: the average value (1,0) & (1,1) , for the third row: (2,1) , for the fourth row: the average of (3,1) & (3,2) and so on. So, we will extract these elements from the sorted array and, therefore, get our median values.
So the implementation will be -
def cummedian_sorted(a): n = a.size maxn = a.max()+1 a_tiled_sorted = np.tile(a,n).reshape(-1,n) mask = np.triu(np.ones((n,n),dtype=bool),1) a_tiled_sorted[mask] = maxn a_tiled_sorted.sort(1) all_rows = a_tiled_sorted[np.arange(n), np.arange(n)//2].astype(float) idx = np.arange(1,n,2) even_rows = a_tiled_sorted[idx, np.arange(1,1+(n//2))] all_rows[idx] += even_rows all_rows[1::2] /= 2.0 return all_rows
Runtime test
Approaches -
Dates -
Install # 1:
In [43]: a = np.random.randint(0,100,(100)) In [44]: print np.allclose(cummedian_loopy(a), cummedian_sorted(a)) ...: print np.allclose(cummedian_loopy(a), cummedian_nanfill(a)) ...: True True In [45]: %timeit cummedian_loopy(a) ...: %timeit cummedian_nanfill(a) ...: %timeit cummedian_sorted(a) ...: 1000 loops, best of 3: 856 µs per loop 1000 loops, best of 3: 778 µs per loop 10000 loops, best of 3: 200 µs per loop
Install # 2:
In [46]: a = np.random.randint(0,100,(1000)) In [47]: print np.allclose(cummedian_loopy(a), cummedian_sorted(a)) ...: print np.allclose(cummedian_loopy(a), cummedian_nanfill(a)) ...: True True In [48]: %timeit cummedian_loopy(a) ...: %timeit cummedian_nanfill(a) ...: %timeit cummedian_sorted(a) ...: 10 loops, best of 3: 118 ms per loop 10 loops, best of 3: 47.6 ms per loop 100 loops, best of 3: 18.8 ms per loop
Install # 3:
In [49]: a = np.random.randint(0,100,(5000)) In [50]: print np.allclose(cummedian_loopy(a), cummedian_sorted(a)) ...: print np.allclose(cummedian_loopy(a), cummedian_nanfill(a)) True True In [54]: %timeit cummedian_loopy(a) ...: %timeit cummedian_nanfill(a) ...: %timeit cummedian_sorted(a) ...: 1 loops, best of 3: 3.36 s per loop 1 loops, best of 3: 583 ms per loop 1 loops, best of 3: 521 ms per loop