Theano: how to take the "matrix external product", where the elements are matrices

Basically, I have two tensors: A, with A.shape = (N, H, D), and B, with B.shape = (K, H, D). What I would like to do is get the tensor C of shape (N, K, H, D) such that:

 C[i, j, :, :] = A[i, :, :] * B[j, :, :]. 

Can this be done efficiently in Theano?

Side note: the actual end result I would like to achieve is a tensor E of shape (N, K, D) such that:

 E[i, j, :] = (A[i, :, :]*B[j, :, :]).sum(0) 

So, if there is a way to get E directly, I would prefer that (it saves memory, I hope).

2 answers

One approach uses broadcasting:

 (A[:,None]*B).sum(2) 

Note that the intermediate array created here has shape (N, K, H, D) before the sum along axis=2 reduces it to (N, K, D).
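As a sanity check (not part of the original answer), the broadcasting expression can be verified against an explicit-loop reference in NumPy, whose broadcasting semantics match Theano's. The dimension sizes here are small made-up values:

```python
import numpy as np

# Small made-up dimensions for illustration
N, K, H, D = 2, 3, 4, 5
rng = np.random.RandomState(0)
A = rng.randn(N, H, D)
B = rng.randn(K, H, D)

# A[:, None] has shape (N, 1, H, D); B broadcasts against it
# to give an (N, K, H, D) product, then summing over H (axis 2)
E = (A[:, None] * B).sum(2)  # E.shape = (N, K, D)

# Reference implementation with explicit loops
E_ref = np.empty((N, K, D))
for i in range(N):
    for j in range(K):
        E_ref[i, j] = (A[i] * B[j]).sum(0)

print(np.allclose(E, E_ref))  # True
```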


You can get the final three-dimensional result E without creating the large intermediate array by using batched_dot:

    import theano.tensor as tt

    A = tt.tensor3('A')       # A.shape = (D, N, H)
    B = tt.tensor3('B')       # B.shape = (D, H, K)
    E = tt.batched_dot(A, B)  # E.shape = (D, N, K)

Unfortunately, this requires you to rearrange the dimensions of your input and output arrays. Although this can be done with dimshuffle in Theano, batched_dot does not seem to be able to handle arbitrarily strided arrays, so the following raises ValueError: Some matrix has no unit stride when E is evaluated:

    import theano.tensor as tt

    A = tt.tensor3('A')                      # A.shape = (N, H, D)
    B = tt.tensor3('B')                      # B.shape = (K, H, D)
    A_perm = A.dimshuffle((2, 0, 1))         # A_perm.shape = (D, N, H)
    B_perm = B.dimshuffle((2, 1, 0))         # B_perm.shape = (D, H, K)
    E_perm = tt.batched_dot(A_perm, B_perm)  # E_perm.shape = (D, N, K)
    E = E_perm.dimshuffle((1, 2, 0))         # E.shape = (N, K, D)

batched_dot uses scan along the first dimension (of size D). Since scan runs sequentially, this can be less efficient than computing all of the products in parallel when running on a GPU.
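For reference (again outside the original answer), the contraction that E performs can also be written in a single step with NumPy's einsum, which makes the index pattern explicit; Theano itself has no einsum, so this is only a specification of the result, not a Theano implementation:

```python
import numpy as np

N, K, H, D = 2, 3, 4, 5
rng = np.random.RandomState(1)
A = rng.randn(N, H, D)
B = rng.randn(K, H, D)

# E[n, k, d] = sum_h A[n, h, d] * B[k, h, d]
E = np.einsum('nhd,khd->nkd', A, B)

# Check against the broadcasting formulation
E_ref = (A[:, None] * B).sum(2)
print(np.allclose(E, E_ref))  # True
```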

You can trade off between the memory efficiency of the batched_dot approach and the parallelism of the broadcasting approach by using scan explicitly. The idea is to compute the full product C for batches of size M in parallel (assuming M is an exact factor of D), iterating over the batches with scan:

    import theano as th
    import theano.tensor as tt

    A = tt.tensor3('A')  # A.shape = (N, H, D)
    B = tt.tensor3('B')  # B.shape = (K, H, D)
    # N, K, H, D and M are assumed to be known Python integers,
    # with M an exact factor of D
    A_batched = A.reshape((N, H, M, D // M))
    B_batched = B.reshape((K, H, M, D // M))
    E_batched, _ = th.scan(
        lambda a, b: (a[:, :, None, :] * b[:, :, :, None]).sum(1),
        sequences=[A_batched.T, B_batched.T]
    )
    # Reorder the two batch axes before merging them back into D, so
    # that the D axis comes out in its original order (the reshape
    # split the last axis as (M, D // M), but scan stacks its output
    # as (D // M, M, ...))
    E = E_batched.dimshuffle((1, 0, 2, 3)).reshape((D, K, N)).T  # E.shape = (N, K, D)
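The reshape-and-transpose bookkeeping in this scheme is easy to get wrong, so here is a NumPy emulation of it (scan replaced by a Python loop over the D // M batches), checked against the direct broadcasting result. Note that the two batch axes must be reordered before the final merge back into D, because the input reshape splits the last axis as (M, D // M) while the stacked loop output is ordered (D // M, M, ...):

```python
import numpy as np

N, K, H, D, M = 2, 3, 4, 6, 3  # M must divide D exactly
rng = np.random.RandomState(2)
A = rng.randn(N, H, D)
B = rng.randn(K, H, D)

# Split the D axis into (M, D // M) batches
A_batched = A.reshape((N, H, M, D // M))
B_batched = B.reshape((K, H, M, D // M))

# Emulate scan with a Python loop over the D // M batches
parts = []
for a, b in zip(A_batched.T, B_batched.T):  # a: (M, H, N), b: (M, H, K)
    parts.append((a[:, :, None, :] * b[:, :, :, None]).sum(1))  # (M, K, N)
E_batched = np.stack(parts)                 # (D // M, M, K, N)

# Reorder the batch axes before merging them back into D
E = E_batched.transpose(1, 0, 2, 3).reshape((D, K, N)).T  # (N, K, D)

E_ref = (A[:, None] * B).sum(2)
print(np.allclose(E, E_ref))  # True
```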

Source: https://habr.com/ru/post/1241262/
