You effectively perform nmultiple convolution [1, 1, 1, ..., 1, 1, 1].
So, have you considered using FFT on a sufficiently zero array, raise its elements to power nand using the inverse FFT to restore all coefficients
(1+x+x**2+x**3+...+x**(k-1))**n
and then just reading the ones that interest you?
UPDATE
Since FFT is circular, you will need an array that is at least the number of terms in
(1+x+x**2+x**3+...+x**(k-1))**n
or, in other words, (k-1)*n+1so that the results do not wrap at the ends (or, at least when they do, they add only zeros to the affected elements). Often its length should also be equal to power, since this is required by the FFT algorithm (implementations that do not require this will fill your input with zeros until it appears).
In a C-like pseudo-code:
unsigned int m = 1;
while(m<(k-1)*n+1) m *= 2;
complex c[m];
for(unsigned int i=0;i!=k;++i) c[i] = complex(1.0, 0.0);
for(unsigned int i=k;i!=m;++i) c[i] = complex(0.0, 0.0);
c = fft(c);
for(unsigned int i=0;i!=m;++i) c[i] = pow(c[i], double(n));
c = inv_fft(c);
, r - c , x**r .
, , , . , , , k n 0,5, , .
, numpy FFT , numpy.fft.rfft numpy.fft.irfft , , .