SciPy returned the wrong result

Hi python enthusiasts!

I am currently working on signal filtering for research purposes and decided to use SciPy. Nothing special, just automating routine work.

So here is the code:

from scipy.signal import medfilt

print(medfilt([2, 6, 5, 4, 0, 3, 5, 7, 9, 2, 0, 1], 5))

But the returned sequence does not match what I calculate by hand:

SciPy: [5.  4.  4.  4.  4.  4.  5.  5.  5.  2.  1.  0.]
Me:    [5.  4.5 4.  4.  4.  4.  5.  5.  5.  2.  1.5 1.]

It seems the developers of the package got one detail wrong. When the aperture (the kernel, in SciPy terms) is larger than the available data near the border, a different filtering rule should apply.

For example, with kernel=5, the first window only covers the subsequence [2, 6, 5], whose median is 5, not 2 as SciPy computes, isn't it? Similarly, the second window covers [2, 6, 5, 4]; the two middle values are 4 and 5, so we take their average, and the median is 4.5.
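To make the rule concrete, here is a small sketch (the name medfilt_shrink is mine, just for illustration) that shrinks the window at the borders instead of padding, and reproduces my numbers above:

```python
import numpy as np

def medfilt_shrink(x, kernel=5):
    """Median filter that shrinks the window near the borders
    instead of padding the signal."""
    x = np.asarray(x, dtype=float)
    half = kernel // 2
    out = np.empty_like(x)
    for i in range(len(x)):
        # Clip the window to the valid part of the signal.
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out[i] = np.median(x[lo:hi])
    return out

print(medfilt_shrink([2, 6, 5, 4, 0, 3, 5, 7, 9, 2, 0, 1], 5))
# → [5.  4.5 4.  4.  4.  4.  5.  5.  5.  2.  1.5 1. ]
```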

Can someone explain to me who got the correct result in this case?

1 answer

I believe that both you and SciPy have correct results. The difference is just what happens at the borders, and I believe both choices are defensible.

The question is what should happen when the sliding window reaches the edges of the signal and there is not enough valid data to fill it.

You have chosen to take the median of only the part of the window that actually overlaps the signal. That makes sense, but it may introduce some bias, because the extreme points are over-represented compared to all the other points.

SciPy instead chose to extend the signal at each edge by padding it with zeros. So, at the borders, SciPy essentially computes:

>>> np.median([0, 0, 2, 6, 5])
2.0
>>> np.median([0, 2, 6, 5, 4])
4.0
>>> np.median([9, 2, 0, 1, 0])
1.0
>>> np.median([2, 0, 1, 0, 0])
0.0

The reason SciPy does this is almost certainly speed: it is optimized to do the same thing many times, and it is much easier to optimize the median over a batch of 5-element arrays than over a batch of 5-element arrays plus two 4-element arrays and two 3-element arrays. There is certainly an argument to be made that it should pad with the boundary values instead of zeros, but note that no border strategy is perfect; the ideal way to handle the border depends on your specific signal.

If you look at the Wikipedia article on median filters, it extends the signal at each edge by repeating the edge value, which also seems reasonable. It also notes three other ways to handle the boundary:

  • Avoid processing the borders, with or without cropping the signal boundary afterwards.
  • Fetch entries from other places in the signal. With images, for example, entries from the far horizontal or vertical border can be selected.
  • Shrink the window near the borders, so that every window is full (as you did).
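Padding strategies like these are easy to prototype yourself with np.pad, which supports modes such as 'constant' (zeros), 'edge' (repeat the border value), 'reflect', and 'wrap'. A sketch (the name medfilt_padded is mine, just for illustration):

```python
import numpy as np

def medfilt_padded(x, kernel=5, mode="constant"):
    """Median filter whose border behaviour is controlled by np.pad's
    mode: 'constant' (zeros, like scipy.signal.medfilt), 'edge'
    (repeat the border value), 'reflect', 'wrap', ..."""
    x = np.asarray(x, dtype=float)
    half = kernel // 2
    xp = np.pad(x, half, mode=mode)
    return np.array([np.median(xp[i:i + kernel]) for i in range(len(x))])

sig = [2, 6, 5, 4, 0, 3, 5, 7, 9, 2, 0, 1]
print(medfilt_padded(sig, 5, mode="constant"))
# → [2. 4. 4. 4. 4. 4. 5. 5. 5. 2. 1. 0.]  (same as scipy.signal.medfilt)
print(medfilt_padded(sig, 5, mode="edge"))
```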

In the end, you really need to try different options and see what works best for your signal. The underlying assumption of this kind of filtering is that your signal is fairly large, so the boundary issue should never be too critical (since the bulk of the signal is not at the border). It would be nice if SciPy allowed you to choose what it should do at the borders, though!
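For what it's worth, the multidimensional variant, scipy.ndimage.median_filter, does expose a mode parameter (e.g. 'reflect', 'constant', 'nearest', 'mirror', 'wrap') that controls the border behaviour, so it can be used on a 1-D signal when you want this choice:

```python
import numpy as np
from scipy.ndimage import median_filter

sig = np.array([2, 6, 5, 4, 0, 3, 5, 7, 9, 2, 0, 1], dtype=float)

# mode='constant' with cval=0 reproduces scipy.signal.medfilt's zero padding;
# mode='nearest' repeats the edge value (the Wikipedia default) instead.
print(median_filter(sig, size=5, mode="constant", cval=0.0))
# → [2. 4. 4. 4. 4. 4. 5. 5. 5. 2. 1. 0.]
print(median_filter(sig, size=5, mode="nearest"))
```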


Source: https://habr.com/ru/post/971803/

