Faster convolution of probability density functions in Python

Suppose I need to compute the convolution of a number of discrete probability density functions. The following example shows four distributions that take the values 0, 1, 2 with given probabilities:

import numpy as np

pdfs = np.array([[0.6, 0.3, 0.1],
                 [0.5, 0.4, 0.1],
                 [0.3, 0.7, 0.0],
                 [1.0, 0.0, 0.0]])

The convolution can be computed as follows:

pdf = pdfs[0]
for i in range(1, pdfs.shape[0]):
    pdf = np.convolve(pdfs[i], pdf)

The probabilities of seeing 0, 1, ..., 8 are then given by:

 array([ 0.09 , 0.327, 0.342, 0.182, 0.052, 0.007, 0. , 0. , 0. ]) 

This part is the bottleneck in my code, and it seems there should be some way to vectorize this operation. Does anyone have a suggestion for doing this faster?

Alternatively, a solution in which I could use

pdf1 = np.array([[0.6, 0.3, 0.1],
                 [0.5, 0.4, 0.1]])
pdf2 = np.array([[0.3, 0.7, 0.0],
                 [1.0, 0.0, 0.0]])

convolve(pdf1, pdf2)

and get pairwise convolutions

array([[ 0.18,  0.51,  0.24,  0.07,  0.  ],
       [ 0.5 ,  0.4 ,  0.1 ,  0.  ,  0.  ]])

would also be very helpful.
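For concreteness, here is a loop-based reference version of the interface I have in mind (convolve_pairwise is just a placeholder name); the goal would be something vectorized that returns the same result faster:

def convolve_pairwise(pdf1, pdf2):
    # Convolve corresponding rows of pdf1 and pdf2 (assumes equal row counts).
    # This is only a reference implementation of the desired interface.
    n = pdf1.shape[1] + pdf2.shape[1] - 1      # length of each pairwise convolution
    out = np.zeros((pdf1.shape[0], n))
    for i in range(pdf1.shape[0]):
        out[i] = np.convolve(pdf1[i], pdf2[i])
    return out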

1 answer

You can efficiently compute the convolution of all your PDFs using fast Fourier transforms (FFTs): the key fact is that the FFT of the convolution is the product of the FFTs of the individual probability density functions. So: transform each PDF, multiply the transformed PDFs together, and then invert the transform. You'll need to pad each input PDF with zeros to an appropriate length to avoid wraparound effects.
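To make that key fact concrete, here is a quick sanity check using the first two PDFs from your question (illustration only, not part of the solution below): zero-padding both inputs to the full output length and multiplying their FFTs reproduces np.convolve.

import numpy as np

a = np.array([0.6, 0.3, 0.1])
b = np.array([0.5, 0.4, 0.1])
n = len(a) + len(b) - 1   # full convolution length; padding to n avoids wraparound

via_fft = np.fft.ifft(np.fft.fft(a, n) * np.fft.fft(b, n)).real
print(np.allclose(via_fft, np.convolve(a, b)))   # True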

This should be reasonably efficient: if you have m PDFs, each of length n, then the time to compute the convolution by this method should grow roughly as m^2 n log(mn). The time is dominated by the FFTs, and we're effectively computing m + 1 independent FFTs (m forward transforms and one inverse transform), each of an array of length no greater than mn. But as always, if you want real timings, you should profile.

Here is the code:

import numpy.fft

def convolve_many(arrays):
    """
    Convolve a list of 1d float arrays together, using FFTs.
    The arrays need not have the same length, but each array
    should have length at least 1.
    """
    result_length = 1 + sum((len(array) - 1) for array in arrays)

    # Copy each array into a 2d array of the appropriate shape.
    rows = numpy.zeros((len(arrays), result_length))
    for i, array in enumerate(arrays):
        rows[i, :len(array)] = array

    # Transform, take the product, and do the inverse transform
    # to get the convolution.
    fft_of_rows = numpy.fft.fft(rows)
    fft_of_convolution = fft_of_rows.prod(axis=0)
    convolution = numpy.fft.ifft(fft_of_convolution)

    # Assuming real inputs, the imaginary part of the output can
    # be ignored.
    return convolution.real

Applying this to your example, here is what I get:

>>> convolve_many([[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.3, 0.7], [1.0]])
array([ 0.09 ,  0.327,  0.342,  0.182,  0.052,  0.007])

That's the basic idea. If you want to tweak this, you might also look at numpy.fft.rfft (and its inverse, numpy.fft.irfft), which take advantage of the fact that the input is real to produce more compact transformed arrays. You may also be able to gain some speed by padding the rows array with extra zero columns so that the total number of columns is optimal for the FFT. The definition of "optimal" here depends on the FFT implementation, but powers of two would be good targets, for example. Finally, there are some obvious simplifications that could be made when constructing rows if all of the input arrays have the same length. But I'll leave those potential improvements to you.
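For illustration, here is a rough sketch of that rfft/irfft variant with padding to a power of two (a sketch only, assuming real-valued inputs; convolve_many_rfft is just a made-up name):

import numpy as np

def convolve_many_rfft(arrays):
    result_length = 1 + sum(len(a) - 1 for a in arrays)
    # Pad out to the next power of two at or above the exact result length.
    padded_length = 1 << (result_length - 1).bit_length()

    rows = np.zeros((len(arrays), padded_length))
    for i, a in enumerate(arrays):
        rows[i, :len(a)] = a

    # Real-input FFT keeps only about half the columns of the full FFT.
    fft_of_rows = np.fft.rfft(rows, axis=1)
    convolution = np.fft.irfft(fft_of_rows.prod(axis=0), padded_length)

    # Trim away the extra padding to recover the exact convolution.
    return convolution[:result_length]

# Same example as above:
# convolve_many_rfft([[0.6, 0.3, 0.1], [0.5, 0.4, 0.1], [0.3, 0.7], [1.0]])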



