Given that the number of matrices is known before the Theano function is compiled, you can simply use a regular Python list of Theano matrices.
Here is a complete example showing the difference between numpy and Theano versions.
This code has been updated to include a comparison with @Divakar's vectorized approach, which performs better. For Theano, two vectorized approaches are possible: one in which Theano performs the concatenation, and another in which numpy performs the concatenation and the result is then passed to Theano.
```python
import timeit

import numpy as np
import theano
import theano.tensor as tt


def compile_theano_version1(number_of_matrices, n, dtype):
    assert number_of_matrices > 0
    assert n > 0
    L = [tt.matrix() for _ in xrange(number_of_matrices)]
    res = tt.zeros((n, n), dtype=dtype)
    for M in L:
        res += tt.dot(M.T, M)
    return theano.function(L, res)


def compile_theano_version2(number_of_matrices):
    assert number_of_matrices > 0
    L = [tt.matrix() for _ in xrange(number_of_matrices)]
    concatenated_L = tt.concatenate(L, axis=0)
    res = tt.dot(concatenated_L.T, concatenated_L)
    return theano.function(L, res)


def compile_theano_version3():
    concatenated_L = tt.matrix()
    res = tt.dot(concatenated_L.T, concatenated_L)
    return theano.function([concatenated_L], res)


def numpy_version1(*L):
    assert len(L) > 0
    n = L[0].shape[1]
    res = np.zeros((n, n), dtype=L[0].dtype)
    for M in L:
        res += np.dot(M.T, M)
    return res


def numpy_version2(*L):
    concatenated_L = np.concatenate(L, axis=0)
    return np.dot(concatenated_L.T, concatenated_L)


def main():
    iteration_count = 100
    number_of_matrices = 20
    n = 300
    min_x = 400
    dtype = 'float64'

    theano_version1 = compile_theano_version1(number_of_matrices, n, dtype)
    theano_version2 = compile_theano_version2(number_of_matrices)
    theano_version3 = compile_theano_version3()

    L = [np.random.standard_normal(size=(x, n)).astype(dtype)
         for x in range(min_x, number_of_matrices + min_x)]

    start = timeit.default_timer()
    numpy_res1 = np.sum(numpy_version1(*L)
                        for _ in xrange(iteration_count))
    print 'numpy_version1', timeit.default_timer() - start

    start = timeit.default_timer()
    numpy_res2 = np.sum(numpy_version2(*L)
                        for _ in xrange(iteration_count))
    print 'numpy_version2', timeit.default_timer() - start

    start = timeit.default_timer()
    theano_res1 = np.sum(theano_version1(*L)
                         for _ in xrange(iteration_count))
    print 'theano_version1', timeit.default_timer() - start

    start = timeit.default_timer()
    theano_res2 = np.sum(theano_version2(*L)
                         for _ in xrange(iteration_count))
    print 'theano_version2', timeit.default_timer() - start

    start = timeit.default_timer()
    theano_res3 = np.sum(theano_version3(np.concatenate(L, axis=0))
                         for _ in xrange(iteration_count))
    print 'theano_version3', timeit.default_timer() - start

    assert np.allclose(numpy_res1, numpy_res2)
    assert np.allclose(numpy_res2, theano_res1)
    assert np.allclose(theano_res1, theano_res2)
    assert np.allclose(theano_res2, theano_res3)


main()
```
Running this prints timings something like:
```
numpy_version1 1.47830819649
numpy_version2 1.77405482179
theano_version1 1.3603150303
theano_version2 1.81665318145
theano_version3 1.86912039489
```
The assert statements pass, showing that the Theano and numpy versions all compute the same result to a high degree of accuracy. Obviously, this accuracy will be reduced if float32 is used instead of float64.
The timing results show that neither vectorized approach is preferable in general; it depends on the matrix sizes. In the example above the matrices are large and the approach without concatenation is faster, but if the parameters n and min_x in the main function are made much smaller, the concatenation approach is faster. Results may differ when running on a GPU (Theano versions only).
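The mathematical equivalence the asserts rely on, that summing the per-matrix products M.T.dot(M) gives the same result as one large product over the row-wise concatenation, can be checked in numpy alone. Here is a minimal sketch (the function names are mine, not from the code above):

```python
import numpy as np


def sum_of_products_loop(matrices):
    # Strategy 1: accumulate M.T @ M one matrix at a time.
    n = matrices[0].shape[1]
    res = np.zeros((n, n), dtype=matrices[0].dtype)
    for M in matrices:
        res += np.dot(M.T, M)
    return res


def sum_of_products_concat(matrices):
    # Strategy 2: stack all matrices vertically (they share a column
    # count), then do a single large product.
    stacked = np.concatenate(matrices, axis=0)
    return np.dot(stacked.T, stacked)


rng = np.random.RandomState(0)
# Matrices with differing row counts but the same number of columns.
matrices = [rng.standard_normal(size=(rows, 4)) for rows in (3, 5, 7)]
assert np.allclose(sum_of_products_loop(matrices),
                   sum_of_products_concat(matrices))
```

The concatenation variant replaces many small BLAS calls with one big one, which is exactly why its relative speed depends on the matrix sizes.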