Statistics of a large stream of integers in Java

I am reading a huge number of integers from a file, and at the end I want basic statistics on them (median, average, 25th and 75th percentiles, etc.). I could compute some of these statistics on the fly, but the 25th and 75th percentiles look difficult to compute that way. The simplest approach, I think, would be to put the integers in a list and compute the statistics from that list. However, the list would be very large, so it could use a lot of memory and slow the program down. Do you have any suggestions? This is how I read the data, and the two options I was considering:

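    import java.io.File;
    import java.util.ArrayList;
    import java.util.Scanner;

    // Option 1: collect every value, then compute the statistics at the end
    Scanner input = new Scanner(new File("name"));
    ArrayList<Integer> list = new ArrayList<>();
    while (input.hasNextLine()) {
        list.add(Integer.parseInt(input.nextLine()));
    }
    doStatistics(list);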

OR

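    import java.io.File;
    import java.util.Scanner;

    // Option 2: feed each value to the statistics as it is read
    Scanner input = new Scanner(new File("name"));
    while (input.hasNextLine()) {
        // I don't know how I would accomplish this for the percentile stats
        acquireStats(Integer.parseInt(input.nextLine()));
    }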
2 answers

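If the number of distinct values is much smaller than the number of samples, it makes sense to store a count for each value rather than every sample. The code below assumes the values lie between 0 and 100 and clamps anything outside that range.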

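    long[] samples = new long[101]; // samples[v] = how many times the value v was read
    while (input.hasNextLine()) {
        try {
            // clamp to 0..100 so every value maps to a valid index
            int value = Math.max(0, Math.min(100, Integer.parseInt(input.nextLine())));
            samples[value]++;
        } catch (NumberFormatException e) {
            // not a number: skip this line
        }
    }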

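This represents the whole dataset, however large, as a small array of counts. The median and the other percentiles can then be read off by walking the array and accumulating the counts until the desired rank is reached.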
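To make the last step concrete, here is a minimal sketch of how the percentiles and the average could be read from such a count array. The class name `HistogramStats`, the method names, and the sample data are made up for illustration, and the nearest-rank definition of a percentile is an assumption; other definitions (for example with interpolation) would give slightly different results.

```java
class HistogramStats {

    // Nearest-rank percentile: the smallest value whose cumulative count
    // reaches ceil(p / 100 * total). counts[v] = occurrences of value v.
    static int percentile(long[] counts, double p) {
        long total = 0;
        for (long c : counts) {
            total += c;
        }
        if (total == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long rank = Math.max(1, (long) Math.ceil(p / 100.0 * total));
        long seen = 0;
        for (int v = 0; v < counts.length; v++) {
            seen += counts[v];
            if (seen >= rank) {
                return v;
            }
        }
        return counts.length - 1; // only reached if p > 100
    }

    // Average computed from the counts alone.
    static double mean(long[] counts) {
        long total = 0;
        long sum = 0;
        for (int v = 0; v < counts.length; v++) {
            total += counts[v];
            sum += v * counts[v];
        }
        if (total == 0) {
            throw new IllegalArgumentException("no samples");
        }
        return (double) sum / total;
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for the file contents.
        int[] data = {5, 1, 3, 3, 9, 7, 1, 3};
        long[] counts = new long[101];
        for (int x : data) {
            counts[x]++;
        }
        System.out.println("25th:   " + percentile(counts, 25));
        System.out.println("median: " + percentile(counts, 50));
        System.out.println("75th:   " + percentile(counts, 75));
        System.out.println("mean:   " + mean(counts));
    }
}
```

Memory use depends only on the range of values (101 slots here), not on how many lines the file has. This only works when that range is small and known in advance; for arbitrary integers a different approach would be needed.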


Source: https://habr.com/ru/post/918119/

