Eliminate standard deviation outliers in SQL Server

I am trying to eliminate outliers in SQL Server 2008 using the standard deviation. I want to keep only the records whose value in a specific column lies within +/- 1 standard deviation of that column's mean.

How can I do this?

+3
3 answers

If you assume the events follow a bell curve (normal distribution), then only about 68% of the values will lie within 1 standard deviation of the mean (about 95% are covered by 2 standard deviations).

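Load the standard deviation into a variable (using the stdev/stdevp SQL functions), then select the rows that fall within the chosen number of standard deviations of the mean.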

declare @stdtest table (colname varchar(20), colvalue int)

insert into @stdtest (colname, colvalue) values ('a', 2)
insert into @stdtest (colname, colvalue) values ('b', 4)
insert into @stdtest (colname, colvalue) values ('c', 4)
insert into @stdtest (colname, colvalue) values ('d', 4)
insert into @stdtest (colname, colvalue) values ('e', 5)
insert into @stdtest (colname, colvalue) values ('f', 5)
insert into @stdtest (colname, colvalue) values ('g', 7)
insert into @stdtest (colname, colvalue) values ('h', 9)

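-- decimal without precision/scale defaults to decimal(18,0), which rounds to whole numbers
declare @std decimal(18,4)
declare @mean decimal(18,4)
declare @lower decimal(18,4)
declare @higher decimal(18,4)
declare @noofstds int

-- multiply by 1.0 so AVG is not computed with integer arithmetic on the int column
select @std = STDEV(colvalue), @mean = AVG(colvalue * 1.0) from @stdtest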

--68%
set @noofstds = 1
select @lower = @mean - (@noofstds * @std)
select @higher = @mean + (@noofstds * @std)

select @lower, @higher, * from @stdtest where colvalue between @lower and @higher

--returns the rows with colvalue 4, 4, 4, 5, 5 and 7

--95%
set @noofstds = 2
select @lower = @mean - (@noofstds * @std)
select @higher = @mean + (@noofstds * @std)

select @lower, @higher, * from @stdtest where colvalue between @lower and @higher

--returns all eight rows (colvalue 2 through 9)
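
As a hand check on the sample data above, the bounds can be worked out directly. This is only a sketch of the arithmetic. It uses the sample standard deviation (dividing by n - 1), which is what STDEV computes, whereas STDEVP divides by n.

```latex
\bar{x} = \frac{2+4+4+4+5+5+7+9}{8} = 5,
\qquad
s = \sqrt{\frac{\sum_i (x_i - \bar{x})^2}{n-1}} = \sqrt{\frac{32}{7}} \approx 2.138
\\[4pt]
\bar{x} \pm 1s \approx [2.862,\ 7.138],
\qquad
\bar{x} \pm 2s \approx [0.724,\ 9.276]
```

With STDEVP the deviation is exactly sqrt(32/8) = 2, which gives bounds of [3, 7] and [1, 9].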
+16

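SQL Server has an aggregate function, STDEV, that returns the standard deviation. Once you have that, select the rows that fall within the mean +/- one STDEV.

For example: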

create table #test
(
    testNumber int
)

INSERT INTO #test (testNumber)
SELECT 2
UNION ALL
SELECT 4
UNION ALL
SELECT 4
UNION ALL
SELECT 4
UNION ALL
SELECT 5
UNION ALL
SELECT 5
UNION ALL
SELECT 7
UNION ALL
SELECT 9

-- keep only the rows within one standard deviation of the mean
SELECT t.testNumber
FROM #test t
JOIN (
    -- multiply by 1.0 so AVG is not computed with integer arithmetic
    SELECT STDEV(testNumber) AS [STDEV], AVG(testNumber * 1.0) AS mean
    FROM #test
) X ON t.testNumber >= X.mean - X.[STDEV]
   AND t.testNumber <= X.mean + X.[STDEV]
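
The same filter can probably also be written without a join by computing the statistics as window aggregates. The following is a minimal sketch, assuming the #test table above and a SQL Server version that allows aggregates with an empty OVER () clause (2005 or later, as far as I know). The alias s and the column names mean_value and std_value are illustrative and not part of the original answer.

```sql
-- Sketch: filter rows to mean +/- 1 standard deviation using window aggregates.
-- Assumes the #test table created above; s, mean_value and std_value are made-up names.
SELECT s.testNumber
FROM (
    SELECT testNumber,
           AVG(testNumber * 1.0) OVER () AS mean_value,  -- * 1.0 avoids integer averaging
           STDEV(testNumber)     OVER () AS std_value    -- sample standard deviation
    FROM #test
) s
WHERE s.testNumber BETWEEN s.mean_value - s.std_value
                       AND s.mean_value + s.std_value
```

For the sample data the bounds are roughly 2.86 and 7.14, so the rows with values 4, 4, 4, 5, 5 and 7 should remain. To filter per group rather than over the whole table, a PARTITION BY clause inside OVER would be the natural extension.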
+4

You do not give any context or explanation of what you are doing. It is easy to provide a function or technique that will meet the needs of your particular case, but I thought it advisable to post a warning until additional information is provided.

0

Source: https://habr.com/ru/post/1750849/

