The relationship between sigma and bandwidth in gaussian_filter and gaussian_kde

Using the scipy.ndimage.filters.gaussian_filter and scipy.stats.gaussian_kde functions on a given dataset can give very similar results if sigma and bw_method are chosen appropriately.

For example, I can get the following graphs for a random two-dimensional distribution of points by setting sigma=2. in gaussian_filter (left graph) and bw_method=sigma/30. in gaussian_kde (right graph):

[Image: gaussian_filter result (left) and gaussian_kde result (right)]

(MWE is at the bottom of the question)

Obviously, there is a connection between these parameters, since one applies a Gaussian filter and the other is a Gaussian kernel density estimator on the data.

Definition of each parameter:

sigma: scalar or sequence of scalars. Standard deviation for the Gaussian kernel. The standard deviations of the Gaussian filter are given for each axis as a sequence, or as a single number, in which case it is equal for all axes.

I can understand this, given the definition of a Gaussian operator:

[Image: definition of the Gaussian function]
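For reference, the standard Gaussian with standard deviation σ is shown below in one dimension and as the separable 2D product that gaussian_filter effectively applies when the same sigma is used on both axes (it filters each axis with a 1D Gaussian in turn). Here x and y are offsets in array cells, not data units; in practice the kernel is sampled at integer offsets, truncated at `truncate` × sigma (4.0 by default) and normalised so its weights sum to one.

$$
G_\sigma(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{x^2}{2\sigma^2}\right)
$$

$$
G_\sigma(x, y) = G_\sigma(x)\,G_\sigma(y) = \frac{1}{2\pi\sigma^2}\exp\!\left(-\frac{x^2 + y^2}{2\sigma^2}\right)
$$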

bw_method: str, scalar or callable, optional. The method used to calculate the estimator bandwidth. This can be 'scott', 'silverman', a scalar constant or a callable. If a scalar, it will be used directly as kde.factor. If a callable, it should take a gaussian_kde instance as its only parameter and return a scalar. If None (default), 'scott' is used. See Notes for more details.

In this case, suppose the input for bw_method is a float, so that it is comparable to sigma. This is where I get lost, since I cannot find any information about this kde.factor parameter.
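What kde.factor does can be checked directly on a fitted estimator. According to the gaussian_kde documentation, a scalar bw_method is stored as kde.factor, and the square of kde.factor multiplies the covariance matrix of the data; the result is exposed as kde.covariance, the covariance of the Gaussian kernel placed on every point. The minimal sketch below checks that relation numerically, using data generated like the MWE's but from a seeded NumPy Generator (an assumption, only for reproducibility).

```python
import numpy as np
from scipy.stats import gaussian_kde

# Same kind of data as in the MWE: 1000 uniform points in 2D, shape (d, n).
rng = np.random.default_rng(0)
values = rng.uniform(low=1.0, high=200.0, size=(2, 1000))

bw = 2.0
kde = gaussian_kde(values, bw_method=bw / 30.0)

# A scalar bw_method is used unchanged as kde.factor.
print(kde.factor)

# The kernel covariance is the unbiased sample covariance of the data
# multiplied by factor**2 ...
data_cov = np.cov(values)
print(np.allclose(kde.covariance, data_cov * kde.factor**2))  # True

# ... so along each axis the kernel std is factor * (sample std of the data),
# expressed in data units.
kernel_std = np.sqrt(np.diag(kde.covariance))
print(kernel_std)
```

In other words, a float bw_method is a relative width: the kernel std in data units depends on the spread of the data, which is what makes it hard to compare with a sigma given in array cells.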

What I would like to know is an exact mathematical equation that relates both of these parameters (i.e. sigma and bw_method when using a float), if possible.


MWE:

```python
import numpy as np
from scipy.stats import gaussian_kde
from scipy.ndimage import gaussian_filter  # the scipy.ndimage.filters namespace is deprecated
import matplotlib.pyplot as plt


def rand_data():
    return np.random.uniform(low=1., high=200., size=(1000,))


# Generate 2D data.
x_data, y_data = rand_data(), rand_data()
xmin, xmax = min(x_data), max(x_data)
ymin, ymax = min(y_data), max(y_data)

# Define grid density.
gd = 100
# Define bandwidth.
bw = 2.

# Using gaussian_filter
# Obtain 2D histogram.
rang = [[xmin, xmax], [ymin, ymax]]
binsxy = [gd, gd]
hist1, xedges, yedges = np.histogram2d(x_data, y_data, range=rang, bins=binsxy)
# Gaussian filtered histogram (sigma is in bins).
h_g = gaussian_filter(hist1, bw)

# Using gaussian_kde
values = np.vstack([x_data, y_data])
# Data 2D kernel density estimate.
kernel = gaussian_kde(values, bw_method=bw / 30.)

# Define x,y grid.
gd_c = complex(0, gd)
x, y = np.mgrid[xmin:xmax:gd_c, ymin:ymax:gd_c]
positions = np.vstack([x.ravel(), y.ravel()])
# Evaluate KDE.
z = kernel(positions)
# Re-shape for plotting.
z = z.reshape(gd, gd)

# Make plots.
fig, (ax1, ax2) = plt.subplots(1, 2)
# Gaussian filtered 2D histogram (left) and KDE on the grid (right).
ax1.imshow(h_g.transpose(), origin='lower')
ax2.imshow(z.transpose(), origin='lower')
plt.show()
```
1 answer

There is no relationship because you are doing two different things.

With scipy.ndimage.filters.gaussian_filter, you filter a 2D variable (an image) with a kernel, and that kernel happens to be Gaussian. This essentially smooths the image.
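A quick way to see what sigma means for gaussian_filter is to filter a single impulse: the output is the kernel itself, and its standard deviation comes out as sigma measured in array cells. This is a minimal sketch on a 1D array, using the current import path scipy.ndimage.gaussian_filter.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# Filter a single unit impulse placed in the middle of a 1D array: the result
# is the sampled, truncated and normalised Gaussian kernel itself.
impulse = np.zeros(101)
impulse[50] = 1.0
sigma = 3.0
response = gaussian_filter(impulse, sigma)

# Measure the spread of the response in units of array cells.
offsets = np.arange(impulse.size) - 50
total = response.sum()
std_in_cells = np.sqrt(np.sum(response * offsets**2) / total)

print(total)         # close to 1.0: the kernel weights sum to one
print(std_in_cells)  # close to 3.0: sigma is measured in cells (bins), not data units
```

Applied to the MWE's 2D histogram, sigma=2 therefore means two bins along each axis, so its width in data units depends on the bin width chosen for np.histogram2d, not on the data itself.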

With scipy.stats.gaussian_kde, you are trying to estimate the probability density function of your 2D variable. The bandwidth (or smoothing parameter) is your integration step and should be as small as the data allows.

Both images look similar because the uniform distribution you drew the samples from is not very different from a normal distribution. Obviously, you would get a better estimate with a normal kernel function.

You can read more about kernel density estimation.

Edit: In kernel density estimation (KDE), kernels are scaled so that the bandwidth is the standard deviation of the smoothing kernel. Which bandwidth to use is not obvious, as it depends on the data. There is an optimal choice for one-dimensional data, called Silverman's rule of thumb.
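SciPy's gaussian_kde implements two such rules for bw_method: 'scott' (the default) gives factor = n^(-1/(d+4)) and 'silverman' gives factor = (n(d+2)/4)^(-1/(d+4)), where n is the number of points and d the dimension (for weighted data SciPy uses the effective sample size instead of n). A minimal 1D check, assuming normally distributed sample data:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
n = 1000
data_1d = rng.normal(loc=0.0, scale=5.0, size=n)

kde_scott = gaussian_kde(data_1d)  # bw_method=None -> 'scott'
kde_silverman = gaussian_kde(data_1d, bw_method="silverman")

# Factors predicted by the two rules for d = 1.
d = 1
scott = n ** (-1.0 / (d + 4))
silverman = (n * (d + 2) / 4.0) ** (-1.0 / (d + 4))
print(np.isclose(kde_scott.factor, scott))          # True
print(np.isclose(kde_silverman.factor, silverman))  # True

# In 1D the kernel std is factor * sample std; for Silverman this is the
# familiar rule of thumb h ~= 1.06 * std * n**(-1/5), since (4/3)**0.2 ~= 1.06.
sample_std = np.std(data_1d, ddof=1)
h = np.sqrt(kde_silverman.covariance[0, 0])
print(h, (4.0 / 3.0) ** 0.2 * sample_std * n ** (-0.2))
```

Because the factor multiplies the data's standard deviation, the bandwidth in data units grows with the spread of the data, whereas gaussian_filter's sigma is fixed in array cells regardless of the data.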

To summarize, there is no relationship between the standard deviation of a Gaussian filter and the bandwidth of a KDE, because we are talking about apples and oranges. However, for KDE alone, there is a relationship between the KDE bandwidth and the standard deviation of the KDE kernel itself: they are equal! In practice, implementation details differ, and there may be a scaling that depends on the size of the kernel. You can read the gaussian_kde.py source of your specific package.
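Even though the two operations differ, one rough way to compare their widths is to express both kernel standard deviations in data units. For gaussian_filter on a histogram with equal-width bins of width Δx, sigma (in bins) corresponds to sigma·Δx in data units; for gaussian_kde, the kernel std along an axis is kde.factor times the sample std s of the data along that axis. Equating the two gives factor ≈ sigma·Δx / s. The helper name filter_sigma_to_kde_factor below is hypothetical, and the sketch assumes equal-width bins spanning the data range and a near-diagonal data covariance; it also ignores that gaussian_filter smooths binned counts rather than the raw points.

```python
import numpy as np


def filter_sigma_to_kde_factor(data, sigma_bins, n_bins):
    """Hypothetical helper: bw_method factor whose kernel std (in data units)
    matches a gaussian_filter sigma given in histogram bins.

    Assumes equal-width bins spanning [min, max] of each coordinate and a
    roughly diagonal data covariance; returns one factor per axis.
    """
    data = np.atleast_2d(data)                    # shape (d, n), as gaussian_kde expects
    bin_width = (data.max(axis=1) - data.min(axis=1)) / n_bins
    filter_std = sigma_bins * bin_width           # gaussian_filter width in data units
    data_std = np.std(data, axis=1, ddof=1)       # gaussian_kde uses the unbiased covariance
    return filter_std / data_std                  # kernel std = factor * data std


rng = np.random.default_rng(0)
values = rng.uniform(low=1.0, high=200.0, size=(2, 1000))
factors = filter_sigma_to_kde_factor(values, sigma_bins=2.0, n_bins=100)
print(factors)     # roughly 0.069 per axis, i.e. about 2 * sqrt(12) / 100
print(2.0 / 30.0)  # the factor used in the question
```

For uniform data the sample std is about (range)/√12, so with 100 bins the matching factor is about sigma·√12/100 ≈ sigma/28.9, close to the sigma/30 used in the question. A scalar bw_method can only match both axes at once when their per-axis factors agree.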


Source: https://habr.com/ru/post/975038/

