The best way to write loops in numpy is to not write loops and use vectorized operations instead. For instance:
c = 0 for i in range(len(a)): c += a[i] + b[i]
becomes
c = np.sum(a + b, axis=0)
For a and b with the form (100000, 100) it takes 0.344 seconds in the first embodiment and 0.062 seconds in the second.
In the case presented in your question, the following does what you want:
sol['ems'][naxis:] = numpy.ravel( numpy.repeat( data['ems'][naxis:,ispec,numpy.newaxis] * ems_mod, nplanes, axis=1 ), order='F' )
This can be further optimized with some tricks , but it will reduce clarity and probably premature optimization, because:
simple python: 0.064 seconds
numpy: 0.002 seconds
The solution works as follows:
The original version contains jp = naxis + ip , which simply skips the first naxis elements [naxis:] selects everything except the first naxis elements. Your inner loop repeats the data[jp,ispec] for nplanes times and writes it to several locations ip3d = jp + npoints_per_plane * ipl , which is equivalent to shifting the offset 2D matrix by naxis . Therefore, the second dimension is added via numpy.newaxis to (previously 1D) data['ems'][naxis:, ispec] , the values ββare repeated nplanes times in this new dimension via numpy.repeat . The resulting 2D array is then smoothed again through numpy.ravel (in Fortran order, that is, with the smallest axis with the smallest step) and written to the corresponding sol['ems'] subarray. If the target array was in fact 2D, repetition may be skipped by automatically broadcasting the array.
If you are faced with a situation where you cannot avoid using loops, you can use Cython (which supports efficient buffer representations on numpy arrays).