How to detect outliers in an ArrayList

I am trying to come up with some code that will let me search through my ArrayList and find any values outside the general range of "good values".

Example: 100 105 102 13 104 22 101

How can I write code to find that (in this case) 13 and 22 do not fall into “good values” around 100?

+4
6 answers

There are several criteria for detecting outliers. The simplest ones, such as Chauvenet's criterion, use the mean and standard deviation calculated from the sample to determine a "normal" range of values. Any value outside this range is considered an outlier.

Other criteria are Grubbs' test and Dixon's Q test; they can give better results than Chauvenet's criterion, for example if the sample comes from a skewed distribution.
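
As an illustration, a minimal Java sketch of Chauvenet's criterion might look like the following. It rejects a value when the expected number of samples lying at least that far from the mean, assuming a normal distribution, is below 0.5. The class name ChauvenetSketch, the method names, and the choice of the sample standard deviation are assumptions made for this example, and erfc is a hand-written polynomial approximation (Abramowitz and Stegun 7.1.26), not a library call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class, not taken from any library.
class ChauvenetSketch {

    // Complementary error function for x >= 0, using the
    // Abramowitz-Stegun 7.1.26 approximation (error around 1.5e-7).
    static double erfc(double x) {
        double t = 1.0 / (1.0 + 0.3275911 * x);
        double poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741
                + t * (-1.453152027 + t * 1.061405429))));
        return poly * Math.exp(-x * x);
    }

    // Returns the values rejected by the criterion n * P(|Z| >= z) < 0.5.
    static List<Double> findOutliers(List<Double> data) {
        List<Double> outliers = new ArrayList<>();
        int n = data.size();
        if (n < 2) {
            return outliers; // not enough data to estimate a spread
        }

        double sum = 0.0;
        for (double v : data) {
            sum += v;
        }
        double mean = sum / n;

        double squared = 0.0;
        for (double v : data) {
            squared += (v - mean) * (v - mean);
        }
        double sd = Math.sqrt(squared / (n - 1)); // sample standard deviation
        if (sd == 0.0) {
            return outliers; // all values are identical
        }

        for (double v : data) {
            double z = Math.abs(v - mean) / sd;
            // Two-sided tail probability of a standard normal variable.
            double tailProbability = erfc(z / Math.sqrt(2.0));
            if (n * tailProbability < 0.5) {
                outliers.add(v);
            }
        }
        return outliers;
    }

    public static void main(String[] args) {
        List<Double> data = Arrays.asList(
                20.0, 65.0, 72.0, 75.0, 77.0, 78.0, 80.0, 81.0, 82.0, 83.0);
        System.out.println(findOutliers(data)); // prints [20.0]
    }
}
```

The criterion is usually applied once; applying it repeatedly to the cleaned data can keep discarding values, so treat any loop around findOutliers with care.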

+6
package test;

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class Main {

    public static void main(String[] args) {
        List<Double> data = new ArrayList<Double>();
        data.add(20.0);
        data.add(65.0);
        data.add(72.0);
        data.add(75.0);
        data.add(77.0);
        data.add(78.0);
        data.add(80.0);
        data.add(81.0);
        data.add(82.0);
        data.add(83.0);
        // getOutliers expects its input sorted from low to high
        Collections.sort(data);
        System.out.println(getOutliers(data));
    }

    public static List<Double> getOutliers(List<Double> input) {
        List<Double> output = new ArrayList<Double>();
        int half = input.size() / 2;
        // Lower half; for an odd-sized list the middle element is left out of both halves
        List<Double> data1 = input.subList(0, half);
        List<Double> data2 = input.subList(input.size() % 2 == 0 ? half : half + 1, input.size());
        double q1 = getMedian(data1);
        double q3 = getMedian(data2);
        double iqr = q3 - q1;
        double lowerFence = q1 - 1.5 * iqr;
        double upperFence = q3 + 1.5 * iqr;
        // Everything outside the two fences is an outlier
        for (int i = 0; i < input.size(); i++) {
            if (input.get(i) < lowerFence || input.get(i) > upperFence)
                output.add(input.get(i));
        }
        return output;
    }

    // Median of an already sorted list
    private static double getMedian(List<Double> data) {
        if (data.size() % 2 == 0)
            return (data.get(data.size() / 2) + data.get(data.size() / 2 - 1)) / 2;
        else
            return data.get(data.size() / 2);
    }
}

Output: [20.0]

Explanation:

  • Sort the list of values from low to high
  • Split the list in the middle into two halves and put them into two new lists ("left" and "right", data1 and data2 in the code); if the list has an odd number of elements, the middle element is left out of both halves
  • Find the median of each of the two new lists
  • Q1 is the median of the left half, Q3 is the median of the right half
  • Apply the formula:
  • IQR = Q3 - Q1
  • LowerFence = Q1 - 1.5 * IQR
  • UpperFence = Q3 + 1.5 * IQR
  • More on this formula: http://www.mathwords.com/o/outlier.htm
  • Loop through all of the original elements, and if any of them is below the lower fence or above the upper fence, add it to the "output" list
  • This "output" list contains the outliers
+2
  • find the mean of your list
  • create a Map that maps each number to its distance from the mean
  • sort the values by their distance from the mean
  • separate out the last n numbers, making sure the cutoff treats the distances fairly
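
The steps above could be sketched in Java roughly as follows. The class name DistanceFromMeanSketch and the method farthestFromMean are hypothetical, and the sketch sorts a copy of the list with a comparator instead of building a Map, because a Map keyed by the number would collapse duplicate values. How many values n to cut off is left to the caller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical example class, not taken from any library.
class DistanceFromMeanSketch {

    // Returns the n values that lie farthest from the mean, farthest first.
    static List<Integer> farthestFromMean(List<Integer> values, int n) {
        double sum = 0.0;
        for (int v : values) {
            sum += v;
        }
        final double mean = values.isEmpty() ? 0.0 : sum / values.size();

        // Sort a copy so the caller's list is left untouched.
        List<Integer> sorted = new ArrayList<>(values);
        sorted.sort(Comparator
                .comparingDouble((Integer v) -> Math.abs(v - mean))
                .reversed());

        int count = Math.max(0, Math.min(n, sorted.size()));
        return new ArrayList<>(sorted.subList(0, count));
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(100, 105, 102, 13, 104, 22, 101);
        System.out.println(farthestFromMean(data, 2)); // prints [13, 22]
    }
}
```

The sort is stable, so values at the same distance keep their original order. A fixed n can still split such a tie, which is one way the cutoff could end up unfair.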
+1

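Use this algorithm, which is based on the mean and the standard deviation. The factor in the threshold (2 * standard deviation) is an adjustable value. The example is written in C#.

// C#; needs: using System; using System.Collections.Generic; using System.Linq;
public static List<int> StatisticalOutLierAnalysis(List<int> allNumbers)
{
    if (allNumbers.Count == 0)
        return null;

    List<int> normalNumbers = new List<int>();
    List<int> outLierNumbers = new List<int>();
    double avg = allNumbers.Average();
    // Population standard deviation: square root of the mean squared deviation
    double standardDeviation = Math.Sqrt(allNumbers.Average(v => Math.Pow(v - avg, 2)));
    foreach (int number in allNumbers)
    {
        // Values more than 2 standard deviations from the mean count as outliers
        if (Math.Abs(number - avg) > 2 * standardDeviation)
            outLierNumbers.Add(number);
        else
            normalNumbers.Add(number);
    }

    // Returns the values that are not outliers; return outLierNumbers to get the outliers instead
    return normalNumbers;
}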
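
Since the question is about a Java ArrayList, a rough Java counterpart of the method above might look like this. The class name TwoSigmaFilterSketch and the method withoutOutliers are hypothetical; the sketch takes the factor as a parameter and returns an empty list instead of null for empty input, which are choices made for this example only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical example class, not taken from any library.
class TwoSigmaFilterSketch {

    // Returns the values within factor * standard deviation of the mean,
    // i.e. the input with the outliers removed.
    static List<Integer> withoutOutliers(List<Integer> allNumbers, double factor) {
        List<Integer> normalNumbers = new ArrayList<>();
        if (allNumbers.isEmpty()) {
            return normalNumbers;
        }

        double sum = 0.0;
        for (int number : allNumbers) {
            sum += number;
        }
        double avg = sum / allNumbers.size();

        double squared = 0.0;
        for (int number : allNumbers) {
            squared += (number - avg) * (number - avg);
        }
        // Population standard deviation, matching the C# version.
        double standardDeviation = Math.sqrt(squared / allNumbers.size());

        for (int number : allNumbers) {
            if (Math.abs(number - avg) <= factor * standardDeviation) {
                normalNumbers.add(number);
            }
        }
        return normalNumbers;
    }

    public static void main(String[] args) {
        List<Integer> data = Arrays.asList(20, 65, 72, 75, 77, 78, 80, 81, 82, 83);
        // prints [65, 72, 75, 77, 78, 80, 81, 82, 83]
        System.out.println(withoutOutliers(data, 2.0));
    }
}
```

One caveat: on the question's sample (100 105 102 13 104 22 101) a factor of 2 removes nothing, because 13 and 22 pull the mean down to about 78 and inflate the standard deviation to about 38. With several outliers in a small list, a smaller factor or a different criterion may be needed.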
+1

An implementation of Grubbs' test can be found in MathUtil.java. It finds one outlier at a time, which you can remove from your list; repeat until no outliers are left.

It depends on commons-math, so if you use Gradle:

dependencies {
    compile 'org.apache.commons:commons-math:2.2'
}
+1

This is just a very simple implementation that collects the numbers that are not in the range:

List<Integer> notInRangeNumbers = new ArrayList<Integer>();
for (Integer number : numbers) {
    // call with a predefined factor value, here example value = 5
    if (!isInRange(number, 5)) {
        notInRangeNumbers.add(number);
    }
}

Also, inside the isInRange method you must define what you mean by "good values". Below you will find an example implementation.

private boolean isInRange(Integer number, int aroundFactor) {
    // TODO the implementation of the 'in range condition'
    // here the example implementation
    return number <= 100 + aroundFactor && number >= 100 - aroundFactor;
}
0

Source: https://habr.com/ru/post/1502200/

