From my comments:
In the general case, when estimating a density, the Gaussian kernel acts as a "window" function, and the "covariance" of this window (really just the bandwidth parameter in the one-dimensional case) only controls how the window's response falls off as a function of distance from the point being evaluated. I am not aware of any KDE procedure that tries to use a specific, full multivariate covariance structure for this falloff.
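For instance, here is a minimal 1-D sketch with SciPy's `gaussian_kde`, where the scalar `bw_method` plays the role of that single bandwidth knob (the data and the particular bandwidth values are made up for illustration):

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=200)  # toy 1-D sample

grid = np.linspace(-4, 4, 200)

# One scalar bandwidth factor controls how quickly each Gaussian "window"
# falls off around every data point.
narrow = gaussian_kde(data, bw_method=0.1)  # small bandwidth: spiky estimate
wide = gaussian_kde(data, bw_method=1.0)    # large bandwidth: oversmoothed

density_narrow = narrow(grid)  # evaluate the estimated density on the grid
density_wide = wide(grid)
```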
I would also suggest that the most complicated "covariance" that would be practical is a diagonal matrix, i.e. a different bandwidth parameter for each dimension of the data. Maybe (and this is a bit of a long shot) you could do a PCA analysis of the principal directions of your data and place different bandwidths along those directions (see the sketch below), but I doubt it would pay off unless your data have wildly different scales, in which case you would be better off just standardizing your inputs before doing KDE in the first place and using a single bandwidth.
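If you did want to try the PCA idea, one way to fake per-direction bandwidths with a single-bandwidth KDE is to rotate into the principal directions and rescale each component by its own bandwidth before fitting. This is only a sketch of that idea; the per-component bandwidths here are made-up choices, not something SciPy or scikit-learn provides directly:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=500)

# Rotate into the principal directions of the data.
pca = PCA(n_components=2).fit(X)
Z = pca.transform(X)

# Hypothetical per-direction bandwidths: dividing each component by its own
# bandwidth and then fitting with bandwidth=1 is equivalent to a diagonal
# bandwidth matrix in the PCA basis.
bandwidths = np.array([1.0, 0.3])
kde = KernelDensity(kernel="gaussian", bandwidth=1.0).fit(Z / bandwidths)

# Score new points by pushing them through the same transform. The result
# differs from the original-space density only by a constant Jacobian factor
# (the transform is linear), so comparisons between points are unaffected.
X_new = rng.multivariate_normal([0, 0], [[4.0, 1.5], [1.5, 1.0]], size=5)
log_dens = kde.score_samples(pca.transform(X_new) / bandwidths)
```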
If you look at the KDE examples from scikit-learn and the documentation for its KernelDensity class, it also seems that (like SciPy) they only offer a single bandwidth parameter (one floating-point number) to summarize how the kernel response will fall off.
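For reference, that interface looks roughly like this (the bandwidth value and toy data are arbitrary):

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # toy 2-D data

# One scalar bandwidth shared by every dimension and every sample.
kde = KernelDensity(kernel="gaussian", bandwidth=0.5).fit(X)
log_density = kde.score_samples(X[:5])  # log p(x) at the first five points
```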
It seems to me that there is not much practical benefit to having fine-grained control over multivariate bandwidth settings. It's better to do some scaling or standardization to transform your input variables so that they are all on the same scale (so that smoothing in every direction with the same bandwidth is appropriate), then use KDE to predict or classify values in that transformed space, and apply the inverse transform to each coordinate if you want to get back to the original, unscaled space.
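A rough sketch of that workflow, using scikit-learn's StandardScaler as the standardization step (the bandwidth value is arbitrary):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
# Toy data whose columns live on very different scales.
X = np.column_stack([rng.normal(0, 1, 500), rng.normal(0, 100, 500)])

# 1. Put every input variable on a comparable scale.
scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)

# 2. Fit the KDE with a single bandwidth in the standardized space.
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(X_scaled)

# 3. Score / compare new points in that same standardized space ...
log_dens = kde.score_samples(scaler.transform(X[:5]))

# 4. ... and if you draw samples from the KDE, invert the transform to get
#    back to the original coordinate scales.
samples_original_scale = scaler.inverse_transform(kde.sample(10, random_state=0))
```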