Out of memory: multithreading using HashSet

I have implemented a Java program. It is basically a multi-threaded service with a fixed number of threads. Each thread performs one task at a time and builds a HashSet; the size of the set varies from 10 to 20,000+ elements. When a thread finishes its task, the result is added to a shared list of results inside a synchronized block.

The problem is that at some point I get an out-of-memory exception. After a little research, I found that this exception occurs when the GC is busy clearing memory, and at that moment it stops the world to do its work.

Please give me tips on working with this much data. Is a HashSet the right data structure to use? How should I deal with the out-of-memory exception? One option is to call System.gc(), which again is not good, as it will slow down the whole process. Or is it possible to dispose of the "HashSet hsN" after I add it to the shared list of results?

Please let me know your thoughts and point out where I am wrong. This service will deal with a huge amount of data.

thanks

```java
// Business object: holds the result of one thread's execution.
public class Location {
    int taskIndex;
    HashSet<Integer> hsN;

    Location(int taskIndex, HashSet<Integer> hsN) {
        this.taskIndex = taskIndex;
        this.hsN = hsN;
    }
}

// Task to be performed by each thread.
public class MyTask implements Runnable {
    private final int task;

    MyTask(int task) {
        this.task = task;
    }

    @Override
    public void run() {
        // GiveMeResult returns a collection of Integers whose size
        // varies from 10 to 20,000+ elements.
        HashSet<Integer> hsN = GiveMeResult(task);
        synchronized (Main.locations) {
            Main.locations.add(new Location(task, hsN));
        }
    }
}

public class Main {
    private static final int NTHREDS = 8;
    static final List<Location> locations = new ArrayList<>();

    public static void main(String[] args) throws InterruptedException {
        ExecutorService executor = Executors.newFixedThreadPool(NTHREDS);
        for (int i = 0; i < 216000; i++) {
            executor.execute(new MyTask(i));
        }
        // Accept no new tasks and wait for the queued ones to finish.
        executor.shutdown();
        executor.awaitTermination(Long.MAX_VALUE, TimeUnit.DAYS);
        System.out.println("Finished all threads");
    }
}
```

For such an implementation, is Java the best choice, or C#/.NET 4?

+3
source
5 answers

A few problems that I see:

  • You synchronize on the MyTask object, which is created separately for each execution. You should synchronize on a shared object, preferably the one you are modifying, i.e. the locations object.

  • 216,000 runs, multiplied by 10,000 returned objects each, multiplied by a minimum of 12 bytes per Integer object, is about 24 GB of memory. Do you even have that much physical memory on your machine, let alone available to the JVM?

    32-bit JVMs have a heap size limit of less than 2 GB. On a 64-bit JVM, on the other hand, an Integer object takes about 16 bytes, which raises the memory requirement to more than 30 GB.

    With these numbers, it is hardly surprising that you get an OutOfMemoryError...

    PS: If you do have that much physical memory, and you still think you are doing the right thing, you should take a look at setting the JVM heap size.
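To see how much heap the JVM actually grants you, a small check like the following can help; the class name is my own, and you would run it with a raised limit, e.g. `java -Xmx25g HeapCheck`:

```java
// Prints the heap limits the running JVM is working with.
public class HeapCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.out.printf("max heap:   %,d bytes%n", rt.maxMemory());   // -Xmx limit
        System.out.printf("total heap: %,d bytes%n", rt.totalMemory()); // currently committed
        System.out.printf("free heap:  %,d bytes%n", rt.freeMemory());  // free in committed
    }
}
```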

EDIT:

Even with 25 GB of memory available to the JVM, you can still blow through it:

  • Each Integer object requires 16 bytes on modern 64-bit JVMs.

  • You also need an 8-byte reference pointing to it, no matter which List implementation you use.

  • If you use a linked-list implementation, each entry also carries at least 24 bytes of overhead for the list node object.

At best, you can hope to store about 1,000,000,000 Integer objects in 25 GB, and half that if you use a linked list. That means each task can produce no more than about 5,000 (2,500) objects on average without running out of memory.

I'm not sure about your specific requirements, but have you thought about returning a more compact structure? For example, an int[] array built from each HashSet would cut the cost to 4 bytes per value and drop the object containers entirely.
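A minimal sketch of that conversion (class and method names are mine): once the set is converted, the boxed Integer objects and the HashMap entries become garbage-collectable, and only 4 bytes per value remain.

```java
import java.util.Arrays;
import java.util.HashSet;

public class Compact {
    // Copy a HashSet<Integer> into a sorted int[]. Sorting keeps the
    // result searchable with binary search instead of hashing.
    static int[] toSortedArray(HashSet<Integer> set) {
        int[] out = new int[set.size()];
        int i = 0;
        for (int v : set) {
            out[i++] = v;   // unboxes each Integer into a primitive int
        }
        Arrays.sort(out);
        return out;
    }
}
```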

EDIT 2:

I just realized that you store the HashSet objects themselves in a list. A HashSet uses a HashMap internally, which in turn allocates a HashMap.Entry object for every entry. On a 64-bit JVM, an entry object consumes about 40 bytes of memory on top of the stored object:

  • The key reference pointing to the Integer object is 8 bytes.

  • The value reference (a shared dummy object in a HashSet) is 8 bytes.

  • The next-entry reference is 8 bytes.

  • The hash value is 4 bytes.

  • The object overhead is 8 bytes.

  • Object padding is 4 bytes.

That is, each Integer stored in a HashSet costs 56 bytes. With a typical HashMap load factor of 0.75, you must add another 10 or so bytes for the references in the HashMap's backing array. At 66 bytes per Integer, you can only store about 400,000,000 such objects in 25 GB, without taking any other overhead of your application into account. That is less than 2,000 objects per task...
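The arithmetic above can be checked directly; the per-entry sizes are the assumed figures for a 64-bit JVM from this answer, not measured values:

```java
public class OverheadMath {
    // 25 GB heap divided by the assumed per-Integer cost in a HashSet.
    static long capacity() {
        long heap = 25L * 1024 * 1024 * 1024; // 25 GB
        long perEntry = 56 + 10;              // entry object + backing-array share
        return heap / perEntry;               // ~406 million Integers
    }

    // Average number of values each of the 216,000 tasks could produce.
    static long perTask() {
        return capacity() / 216_000;          // ~1,880 values per task
    }

    public static void main(String[] args) {
        System.out.println(capacity());
        System.out.println(perTask());
    }
}
```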

EDIT 3:

You are better off storing a sorted int[] instead of a HashSet. The array can be searched in logarithmic time for any arbitrary integer and cuts memory consumption down to 4 bytes per number. Given memory locality, it will also be about as fast as (or faster than) the HashSet implementation.
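The logarithmic-time lookup comes for free from the standard library; a minimal sketch (names are mine):

```java
import java.util.Arrays;

public class SortedLookup {
    // Membership test on a sorted int[]: O(log n), 4 bytes per stored value.
    // The array MUST already be sorted, or binarySearch's result is undefined.
    static boolean contains(int[] sorted, int value) {
        return Arrays.binarySearch(sorted, value) >= 0;
    }
}
```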

+5
source

If you want a more memory-efficient solution, I would use a TIntHashSet (from GNU Trove) or a sorted int[]. In your case, you get a full GC right before the OutOfMemoryError; that is not the cause of the problem but a symptom. The cause is that you are using too much memory for the maximum heap you have allowed.

Another solution is to create the tasks as you go, instead of creating all of them up front. You can do this by breaking your work into NTHREADS chunks. It sounds, however, like you are trying to keep every result; if so, this will not help, and you need to find a way to reduce consumption instead.
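One way to avoid materializing all 216,000 task objects at once is a bounded work queue: the submitting thread blocks (by running the task itself) whenever the queue is full. This is a sketch under my own naming, not the asker's code:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class BoundedPool {
    // Runs nTasks copies of task on nThreads threads, but never holds more
    // than 100 pending tasks: CallerRunsPolicy makes the submitter execute
    // a task itself when the queue is full, instead of queueing unboundedly.
    static void runAll(int nTasks, int nThreads, Runnable task) {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                nThreads, nThreads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(100),
                new ThreadPoolExecutor.CallerRunsPolicy());
        for (int i = 0; i < nTasks; i++) {
            executor.execute(task);
        }
        executor.shutdown();
        try {
            executor.awaitTermination(1, TimeUnit.HOURS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Note that awaitTermination also replaces the busy-wait `while (!executor.isTerminated()) {}` loop, which burns a CPU core for nothing.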

Depending on the distribution of your numbers, a BitSet may be more efficient. It uses 1 bit per integer in the range: for example, if your range is 0 - 20,000, it only takes about 2.5 KB.
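A quick sketch of that idea with java.util.BitSet (method name is mine); 20,001 bits round up to roughly 2.5 KB regardless of how many values are actually set:

```java
import java.util.BitSet;

public class BitSetDemo {
    // Records which values in [0, rangeSize) were seen, at 1 bit per value.
    static BitSet mark(int[] values, int rangeSize) {
        BitSet seen = new BitSet(rangeSize);
        for (int v : values) {
            seen.set(v);
        }
        return seen;
    }

    public static void main(String[] args) {
        BitSet seen = mark(new int[]{42, 19_999}, 20_001);
        System.out.println(seen.get(42));        // membership test, O(1)
        System.out.println(seen.cardinality());  // number of distinct values
    }
}
```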

+1
source

Now, after a little research, I found that this exception occurs when the GC is busy clearing memory, and at that moment it stops the world to do its work.

No, that is not true. Memory exceptions occur because you are using more memory than was allocated to your program. Only very rarely is a memory exception due to GC behavior; that can happen if you have configured the GC poorly.

Have you tried running with a larger -Xmx value? And why don't you just use a Hashtable for locations?

+1
source
  • If you are going to keep 216,000 × 10,000 integers in memory, you need a huge amount of memory.
  • Try the maximum -Xmx setting your system allows and see how many objects you can store before running out of memory.
  • It is not clear why you want to keep the results of so many threads; what is the next step? If you really need to store that much data, you probably need a database.
+1
source

You probably need to increase the size of your heap. Check out the JVM -Xmx setting.

0
source

Source: https://habr.com/ru/post/909822/

