Why is a large Java array slow?

I created an array of a class with a very large length, ~150M elements, sorted by key (see the code below). Then I wrote a simple HTTP server that answers each request by running a binary search on the array. (I'm sure the server itself is working fine.)

Initializing the data works just fine (of course it is pretty slow). The binary search function is fast, as expected.

The problem is that responses are fast for a while (10 minutes, an hour, sometimes much longer), then the server suddenly takes a very long time (several minutes) to execute the binary search for a request, then becomes fast again, then slow again after a while... When it is slow, I check the server status (htop), and it looks like the JVM is busy in GC.

The problem does not arise when I split the large array into smaller ones, for example 10 arrays of 15M elements each, and pick the target array before searching. So I think something happens in the JVM when I create an array that is too large.

(Edit: in the end splitting the large array into pieces was not the real fix. I reimplemented the SiteInfo objects natively (off-heap), which greatly reduced the number of objects in the JVM, and the problem went away. So it really was caused by the huge number of objects I created, as the answers below explain. Thanks, guys!)

Do you guys have any ideas about my problem?

(I posted my code below; a few parts are pseudocode for things that I think are not very important here.)

    public static class Token2TopSite implements Comparable<Token2TopSite> {

        public final String token;         // this is the key for the binary search
        public final SiteInfo[] topSites;  // just data, not important for this question, I think

        public Token2TopSite(String token, SiteInfo[] topSites) {
            this.token = token;
            this.topSites = topSites;
        }

        @Override
        public int compareTo(Token2TopSite o) {
            return token.compareTo(o.token);
        }

        public static void main(String[] args) {
            Token2TopSite[] array = new Token2TopSite[150 * 1000000];
            // ... init data for the array; this runs properly
            Arrays.sort(array);
            startServerOnArray(array); // each request is an element search on the array
        }
    }
2 answers

I think the diagnosis of Omry Yadan is probably correct. These sound like GC pauses, and they are likely to be especially bad if you have a huge number of long-lived reachable objects in the heap: the GC has to traverse every live object when it performs a "full" collection.

(You can confirm that this really is a GC-related issue by enabling GC logging and correlating the times when the server is slow with the GC events in the log.)
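For example, with a HotSpot JVM you could start the server with GC logging enabled like this (these are the pre-JDK 9 flag spellings; newer JVMs use -Xlog:gc instead, and YourServerMain is just a placeholder for your own main class):

    java -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log YourServerMain

Each collection is then written to gc.log with a timestamp, which you can line up against the periods when responses are slow.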

However, I do not agree with his proposed solution.

Instead of rewriting the application, a simpler first step is to configure the JVM to use a concurrent, low-pause garbage collector. It is simply a matter of adding a few options to the command that starts your web server's JVM.
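As a sketch (again, the exact flags depend on your JVM version), you could switch to the concurrent CMS collector, or to G1 with a pause-time goal:

    java -XX:+UseConcMarkSweepGC YourServerMain
    java -XX:+UseG1GC -XX:MaxGCPauseMillis=200 YourServerMain

Both collectors do most of their marking concurrently with the application threads, which shortens the long stop-the-world pauses at the cost of some throughput.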

Here are some Oracle links:


The JVM does not behave well when there are more than about 10 million objects. The reason is that the garbage collector has to traverse all of your live objects every time it looks for memory to reclaim. One solution is to use fewer objects. I wrote a primitive-collection library called Banana that manages memory itself: it basically allocates a single int array (int[]), or several of them, and lets you build dynamic data structures on top of that (several are already implemented: HashTable, LinkedList, LRUCache, etc.).
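To illustrate the general idea (this is only my own minimal sketch of the "pack everything into primitive arrays" approach, not the Banana API; PackedSortedKeys and all its members are made-up names), here is a sorted string index that the GC sees as just two arrays, no matter how many keys it holds:

    import java.util.Arrays;

    // Minimal sketch: keep all keys in one char pool plus an int offset table,
    // so a sorted index of N strings costs the GC two arrays, not N objects.
    public class PackedSortedKeys {

        private final char[] pool;    // all keys concatenated, in sorted order
        private final int[] offsets;  // offsets[i] = start of key i; offsets[n] = pool.length

        public PackedSortedKeys(String[] sortedKeys) {
            int total = 0;
            for (String k : sortedKeys) total += k.length();
            pool = new char[total];
            offsets = new int[sortedKeys.length + 1];
            int pos = 0;
            for (int i = 0; i < sortedKeys.length; i++) {
                offsets[i] = pos;
                sortedKeys[i].getChars(0, sortedKeys[i].length(), pool, pos);
                pos += sortedKeys[i].length();
            }
            offsets[sortedKeys.length] = pos;
        }

        // Standard binary search over the packed keys; returns the index or -1.
        public int indexOf(String key) {
            int lo = 0, hi = offsets.length - 2;
            while (lo <= hi) {
                int mid = (lo + hi) >>> 1;
                int cmp = compareAt(mid, key);
                if (cmp < 0) lo = mid + 1;
                else if (cmp > 0) hi = mid - 1;
                else return mid;
            }
            return -1;
        }

        // Lexicographically compare packed key #idx against the query string.
        private int compareAt(int idx, String key) {
            int start = offsets[idx];
            int len = offsets[idx + 1] - start;
            int n = Math.min(len, key.length());
            for (int i = 0; i < n; i++) {
                int d = pool[start + i] - key.charAt(i);
                if (d != 0) return d;
            }
            return len - key.length();
        }

        public static void main(String[] args) {
            String[] keys = {"cherry", "apple", "banana"};
            Arrays.sort(keys); // the index requires sorted input
            PackedSortedKeys packed = new PackedSortedKeys(keys);
            System.out.println(packed.indexOf("banana")); // prints 1
            System.out.println(packed.indexOf("durian")); // prints -1
        }
    }

With 150M keys stored this way, a full GC scans a couple of arrays instead of 150M separate objects, so pause times no longer grow with the number of keys.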


Source: https://habr.com/ru/post/973221/

