Slice averages on 1d nparray: how to make it more NumPy-thonic?

As part of some simulations I run, I need to perform the following operation on very long sequences of (real) numbers. Here's the gist:

Given a long 1-dimensional NumPy array, for each position in the array I want to average the values before and after that position, take the difference between the two averages, and store these differences in another array of the same size as the original.

Here is my attempt. It works fine, except that it becomes very slow as the sequence gets longer.

import numpy as np                                                                                     

def test_sequence(npseq):                                                                                           
    n = npseq.shape[0]                                                                                   

    def f(i):                                                                                          
        pre = np.sum(npseq[:i])/i                                                                        
        post = np.sum(npseq[(i+1):])/(n-i)                                                               
        return pre-post                                                                                

    out = np.array([f(i) for i in range(1,n)])                                                         

    return out

Seems simple enough. But...

In [26]: a = np.random.randint(0,100,100000)
In [27]: %timeit example.test_sequence(a)
1 loops, best of 3: 7.69 s per loop

In [17]: a = np.random.randint(0,100,400000)
In [18]: %timeit example.test_sequence(a)
1 loops, best of 3: 1min 50s per loop

I know that there is probably a smart way to vectorize this, but I'm inexperienced with NumPy. Can someone point me in the right direction?

EDIT: Changed "sum" to "average".


One approach is to use np.cumsum():

np.cumsum(a[::-1])[::-1] - np.cumsum(a)

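np.cumsum(a) gives the running sums from the start of the array, and np.cumsum(a[::-1])[::-1] gives the running sums from the end. To get averages instead of sums, divide each by a range of counts built with np.arange, descending for the reversed sums and ascending for the forward sums: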

np.cumsum(a[::-1])[::-1]/np.arange(a.size + 1, 1, -1) - np.cumsum(a)/np.arange(1, a.size + 1)

Example:

In [53]: a
Out[53]: array([32, 69, 79, 34,  1, 77, 54, 42, 73, 75])

In [54]: np.cumsum(a[::-1])[::-1]/np.arange(a.size + 1 , 1, -1) - np.cumsum(a)/np.arange(1, a.size + 1)
Out[54]: 
array([ 16.72727273,  -0.1       , -11.66666667,  -9.        ,
         3.        ,   4.83333333,  -0.62857143,  -1.        ,
        -1.88888889, -16.1       ])
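Note that the running sums above include the element at each position itself. As a minimal sketch, assuming the intended quantity is mean(a[:i]) - mean(a[i+1:]) for interior positions i (so the element at position i is excluded), a single cumulative sum is enough. The function name slice_mean_diff is hypothetical and not part of the original posts.

```python
import numpy as np

def slice_mean_diff(a):
    """Return mean(a[:i]) - mean(a[i+1:]) for i = 1 .. n-2 (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    n = a.size
    if n < 3:
        # No interior position has values on both sides.
        return np.empty(0)
    csum = np.cumsum(a)                        # csum[k] == a[:k+1].sum()
    i = np.arange(1, n - 1)                    # interior positions
    pre = csum[i - 1] / i                      # mean of the i values before position i
    post = (csum[-1] - csum[i]) / (n - i - 1)  # mean of the n-i-1 values after position i
    return pre - post

a = np.array([32, 69, 79, 34, 1, 77, 54, 42, 73, 75])
print(slice_mean_diff(a))
```

This does one pass over the data instead of re-summing a slice for every position, so the work grows linearly with the array length rather than quadratically.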

You can use expanding_mean from pandas (available only in older pandas versions; it has since been removed):

import pandas as pd
a = np.array([32, 69, 79, 34,  1, 77, 54, 42, 73, 75])
pd.expanding_mean(a)[:-2:] - pd.expanding_mean(a[::-1])[-3::-1]

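For comparison, here is the function from the question with the denominator of post corrected to n-i-1 and the range adjusted to match: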

def test_sequence(npseq):                                                                                           
    n = npseq.shape[0]                                                                                   

    def f(i):                                                                                          
        pre = np.sum(npseq[:i])/i
        post = np.sum(npseq[(i+1):])/(n-i-1)
        return pre-post                                                                                

    out = np.array([f(i) for i in range(1,n-1)])                                                         

    return out

test_sequence(a)

array([-22.375     ,  -0.35714286,   6.33333333, -10.7       ,
       -18.        , -14.66666667, -24.57142857, -26.5       ])

The same result with the newer Series.expanding() API:

pd.Series(a[:-2:]).expanding().mean() - pd.Series(a[::-1]).expanding().mean()[-3::-1].reset_index(drop = True)

0   -22.375000
1    -0.357143
2     6.333333
3   -10.700000
4   -18.000000
5   -14.666667
6   -24.571429
7   -26.500000
dtype: float64
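To make the slicing in the pandas expression easier to follow, here is a rough NumPy-only sketch of the same idea, assuming an expanding mean is simply the cumulative sum divided by the running count. The helper name expanding_mean below is defined locally for illustration and is not the pandas function.

```python
import numpy as np

def expanding_mean(x):
    """Running mean: element k is the mean of x[:k+1] (local helper, not pandas)."""
    x = np.asarray(x, dtype=float)
    return np.cumsum(x) / np.arange(1, x.size + 1)

a = np.array([32, 69, 79, 34, 1, 77, 54, 42, 73, 75])

# Means of a[:1], a[:2], ..., a[:n-2]: the values before positions 1 .. n-2.
fwd = expanding_mean(a)[:-2]

# Running means of the reversed array are means of the last 1, 2, ..., n values.
# [-3::-1] picks the means of the last n-2, n-3, ..., 1 values,
# i.e. the values after positions 1 .. n-2.
bwd = expanding_mean(a[::-1])[-3::-1]

result = fwd - bwd
print(result)
```

For this sample array the result should agree with the values printed by the corrected loop above.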

Timings:

a = np.random.randint(0,100,100000)
%timeit test_sequence(a)
%timeit pd.Series(a[:-2:]).expanding().mean() - pd.Series(a[::-1]).expanding().mean()[-3::-1].reset_index(drop = True)

1 loop, best of 3: 8.17 s per loop
10 loops, best of 3: 18.5 ms per loop

Source: https://habr.com/ru/post/1667691/

