I believe that both you and SciPy have the right results. The difference is what happens at the borders, but I believe that you and SciPy made the right choice.
The question is what should happen when your sliding window is at the edges and there is no valid data to fill your sliding window .
You have chosen the median real part of the sliding window, which makes sense, but may add some bias, because your extreme points are too represented compared to all other points.
SciPy instead chose to extend the signal on either edge with fill numbers. So, at the borders, SciPy essentially computes
>>> np.median([0, 0, 2, 6, 5]) 2.0 >>> np.median([0, 2, 6, 5, 4]) 4.0 >>> np.median([9, 2, 0, 1, 0]) 1.0 >>> np.median([2, 0, 1, 0, 0]) 0.0
The reason SciPy does this is almost certainly speed-related: it is optimized to do the same thing many times, and itβs much easier to optimize the median for a whole group of 5-element arrays than to optimize it for a whole group of 5- element arrays, as well as two 4-element arrays and two 3-element arrays. There is definitely an argument that should be made that it should not be pounded with zeros, but instead with boundary values, but it should be noted that no border strategy will be perfect; The ideal way to resolve border issues will depend on your specific signal.
If you see a description of the Wikipedia median filters , they expand the signal at each of the edges, supplementing it with a value at the edges, which also seems reasonable. They also note these three other ways to solve boundary problems:
- Avoid border processing with or without trimming the signal border.
- Getting records from other places in the signal. With images, for example, records from the extreme horizontal or vertical border can be selected.
- Reduce the window next to the borders, so that each window is filled (as you did.)
In the end, you really need to try different options and see what works best for your signal. The main assumption of this kind of filtering is that your signal will be quite large, and the boundary problem should never be so critical (since the bulk of the signal does not exist at the border). It would be nice if SciPy allowed you to choose what it should do at the borders, though!