Removing extreme values in a vector in Matlab?

Question

So I have a = [2 7 4 9 2 4 999]

And I would like to remove 999 from the vector (it is an obvious outlier).

Is there a general way to remove such values? I have a set of vectors, and not all of them have such extreme values. prctile(a, 99.5) is going to return the largest number in the vector no matter how extreme (or not extreme) it is.

+4

matlab

InquilineKea Mar 12 '13 at 23:22

3 answers

I can think of two:

Sort the vector and remove the n largest and n smallest elements.
Calculate the mean and standard deviation and discard all values outside the range: mean +/- (n * standard deviation)

In both cases, n must be selected by the user.
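Both ideas can be sketched on the vector from the question. This is only an illustration: the variable names and the values chosen for n (nTrim = 1, nStd = 2) are assumptions, not part of the answer.

```matlab
a = [2 7 4 9 2 4 999];              % vector from the question

% Option 1: sort and drop the n smallest and n largest values.
% Note that this also reorders the data and removes small values too.
nTrim = 1;                          % assumed value, chosen by the user
sorted = sort(a);
trimmed = sorted(nTrim+1:end-nTrim);

% Option 2: keep only values within mean +/- n * standard deviation.
nStd = 2;                           % assumed value, chosen by the user
keep = abs(a - mean(a)) <= nStd * std(a);
kept = a(keep);
```

With these settings, trimmed loses both 999 and one of the 2s, while kept preserves the original order and drops only 999.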

+2

sfotiadis Mar 12 '13 at 23:28

Filter your signal.

 % choose the value
 N = 10;
 filtered = filter(ones(1,N)/N, 1, signal);

Find the noise

 noise = signal - filtered;

Remove noisy items

 THRESH = 50;
 signal = signal(abs(noise) < THRESH);

This is better than mean +/- n*stddev because it looks for local changes, so it is not thrown off by a slowly changing signal, for example [1 2 3 ... 998 998] .
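To see the three steps work end to end, here is a small sketch on a synthetic signal. The test signal (a ramp from 1 to 100 with a single spike of 300) is an assumption made up for illustration; N and THRESH are the values used above.

```matlab
% Synthetic test signal (an assumption for illustration):
% a slow ramp 1..100 with one spike of 300 inserted after sample 50.
signal = [1:50, 300, 51:100];

N = 10;                                      % moving-average window length
filtered = filter(ones(1,N)/N, 1, signal);   % causal moving average
noise = signal - filtered;                   % deviation from the local average

THRESH = 50;
cleaned = signal(abs(noise) < THRESH);       % keep samples near the local average
```

Here cleaned is the plain ramp again. One thing to watch: the spike leaks into the next N-1 filtered samples at roughly a tenth of its height (1/N), so THRESH should sit comfortably above that leak, otherwise the samples right after the spike get discarded as well.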

+1

Dmitry Galchinsky Mar 12 '13 at 23:44

bla · Accepted Answer · 2013-03-12T23:31:36+0000

There are several ways to do this, but first you must define what "extreme" means. Is it above a certain threshold? Above a certain number of standard deviations? Or, if you know that you have exactly n of these extreme events and that their values are larger than the others, you can use sort and delete the last n elements, etc.

For example, a(a>threshold)=[] handles the threshold definition, and a(a>mean(a)+n*std(a))=[] drops every value more than n standard deviations above the mean of a .
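A quick sketch of both one-liners on the vector from the question. The cutoff of 100 and the values of n are assumptions picked for illustration. Note that n cannot be chosen freely for short vectors: with 7 samples, no value can lie more than 6/sqrt(7) (about 2.27) sample standard deviations from the mean, so n = 3 removes nothing here.

```matlab
a = [2 7 4 9 2 4 999];           % vector from the question

% Fixed threshold (assumed cutoff of 100).
threshold = 100;
byThreshold = a;
byThreshold(byThreshold > threshold) = [];

% More than n standard deviations above the mean, n = 2.
n = 2;
byStd = a;
byStd(byStd > mean(a) + n*std(a)) = [];

% Same rule with n = 3: the cutoff lies above 999, so nothing is removed.
n3 = 3;
unchanged = a;
unchanged(unchanged > mean(a) + n3*std(a)) = [];
```

The copies (byThreshold, byStd, unchanged) are only there so that a stays intact for comparison.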

A completely different approach is to use the median of a . If the vector is as short as the one you mention, look at the median value and then discard anything larger than some factor of it: a(a>n*median(a))=[] .
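A minimal sketch of the median rule on the question's vector. The factor n = 5 is an assumption, and a multiplicative cutoff like this only makes sense when the data (and hence the median) are positive.

```matlab
a = [2 7 4 9 2 4 999];       % vector from the question

n = 5;                       % assumed factor, chosen by the user
cutoff = n * median(a);      % median(a) is 4, so the cutoff is 20

cleaned = a;
cleaned(cleaned > cutoff) = [];
```

Unlike the mean, the median is barely affected by the 999, which is why this works even on very short vectors.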

Finally, one way to decide how to treat these spikes is to take a histogram of the data and work from there ...
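As a rough sketch of the histogram idea: bin the data and look for a gap between the bulk of the values and the stragglers. The bin edges and the "cut at the first empty bin" rule below are assumptions for illustration only, not a general recipe. histcounts is available in newer MATLAB releases; older ones offer hist and histc instead.

```matlab
a = [2 7 4 9 2 4 999];            % vector from the question

edges = 0:100:1000;               % assumed bin edges
counts = histcounts(a, edges);    % number of values falling in each bin

% Assumed heuristic: treat the first empty bin as the start of the gap
% and cut at its lower edge.
firstEmpty = find(counts == 0, 1);
cutoff = edges(firstEmpty);

cleaned = a(a < cutoff);
```

Here six values land in the first bin and one in the last, with nothing in between, so the gap is easy to spot. On real data, plotting the histogram and picking the cutoff by eye may be more reliable.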
