Vectorized weighted sum of outer products - NumPy

I am relatively new to NumPy and often read that you should avoid writing loops. In many cases, I understand how to deal with this, but at the moment I have the following code:

import numpy as np

p = np.arange(15).reshape(5, 3)
w = np.random.rand(5)
A = np.sum(w[i] * np.outer(p[i], p[i]) for i in range(len(p)))

Does anyone know if there is a way to avoid the inner loop?

Thanks in advance!
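For reference, the operation being computed is the weighted sum of outer products, A = sum_i w[i] * outer(p[i], p[i]), which is the same as p.T @ diag(w) @ p. A minimal explicit-loop version of that computation (a sketch, using the same p and w as above) would be:

import numpy as np

p = np.arange(15).reshape(5, 3).astype(float)
w = np.random.rand(5)

# Accumulate the weighted outer product of each row of p.
A = np.zeros((p.shape[1], p.shape[1]))
for i in range(len(p)):
    A += w[i] * np.outer(p[i], p[i])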

1 answer

Approach #1: With np.einsum -

np.einsum('ij,ik,i->jk',p,p,w)

Approach #2: With broadcasting + np.tensordot -

np.tensordot(p[...,None]*p[:,None], w, axes=((0),(0)))

Approach #3: With np.einsum + np.dot -

np.einsum('ij,i->ji',p,w).dot(p)
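As a quick sanity check (a minimal sketch with random inputs), all three one-liners should agree with the loop-based reference up to floating-point rounding:

import numpy as np

p = np.random.rand(50, 30)
w = np.random.rand(50)

# Loop-based reference: weighted sum of outer products.
A_ref = sum(w[i] * np.outer(p[i], p[i]) for i in range(len(p)))

A1 = np.einsum('ij,ik,i->jk', p, p, w)
A2 = np.tensordot(p[..., None] * p[:, None], w, axes=((0), (0)))
A3 = np.einsum('ij,i->ji', p, w).dot(p)

assert np.allclose(A1, A_ref)
assert np.allclose(A2, A_ref)
assert np.allclose(A3, A_ref)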

Runtime test

Setup #1:

In [653]: p = np.random.rand(50,30)

In [654]: w = np.random.rand(50)

In [655]: %timeit np.einsum('ij,ik,i->jk',p,p,w)
10000 loops, best of 3: 101 µs per loop

In [656]: %timeit np.tensordot(p[...,None]*p[:,None], w, axes=((0),(0)))
10000 loops, best of 3: 124 µs per loop

In [657]: %timeit np.einsum('ij,i->ji',p,w).dot(p)
100000 loops, best of 3: 9.07 µs per loop

Setup #2:

In [658]: p = np.random.rand(500,300)

In [659]: w = np.random.rand(500)

In [660]: %timeit np.einsum('ij,ik,i->jk',p,p,w)
10 loops, best of 3: 139 ms per loop

In [661]: %timeit np.einsum('ij,i->ji',p,w).dot(p)
1000 loops, best of 3: 1.01 ms per loop

The third approach just blew everything else away!

Why is Approach #3 10x-130x faster than Approach #1?

np.einsum is implemented in C. In Approach #1 it has to loop over all three indices i, j, k to perform the reduction, so the entire summation runs inside einsum's C loops.

In Approach #3 the einsum call only does the element-wise scaling over i and j (also in C), and the remaining reduction is handed off to BLAS matrix multiplication via np.dot. That is where the speedup comes from.
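An equivalent BLAS-backed formulation without einsum (a minimal sketch of the same p.T @ diag(w) @ p product, using sizes from Setup #2) would be:

import numpy as np

p = np.random.rand(500, 300)
w = np.random.rand(500)

# Cheap element-wise scaling of each row, then a single BLAS-backed
# (300, 500) x (500, 300) matrix product performs the summation over i.
A = (w[:, None] * p).T @ p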


Source: https://habr.com/ru/post/1691278/

