Heap size issue - managing memory in Java

My application has code that performs two functions:

Parse a file that contains "n" records.

Make two web service calls for each ID read from the file.

    public static List<String> parseFile(String fileName) {
        List<String> idList = new ArrayList<String>();
        try {
            BufferedReader cfgFile = new BufferedReader(new FileReader(new File(fileName)));
            String line = null;
            cfgFile.readLine(); // read and discard the first line
            while ((line = cfgFile.readLine()) != null) {
                if (!line.trim().equals("")) {
                    String[] fields = line.split("\\|");
                    idList.add(fields[0]);
                }
            }
            cfgFile.close();
        } catch (IOException e) {
            System.out.println(e + " Unexpected File IO Error.");
        }
        return idList;
    }

When I try to parse a file with 1 million records, the Java process dies after processing part of the data with a java.lang.OutOfMemoryError: Java heap space error. I understand that the process is failing because of the amount of data it is holding. Please suggest how I can handle files this large.

EDIT: Does this part of the code - new BufferedReader(new FileReader(new File(fileName))); - read the entire file into memory, and is it affected by the file size?

+4
3 answers

The problem is that you are accumulating all the data in the list. The best way to approach this is to process the file in a streaming fashion: do not accumulate all the identifiers in a list, but either call your web service for each line, or accumulate a small bounded buffer and then make the call.

Opening the file and creating a BufferedReader does not by itself affect memory consumption, since the bytes of the file are read (more or less) line by line. The problem is the line idList.add(fields[0]); - the list grows in proportion to the file, because you keep accumulating all of the file's data in it.

Your code should do something like this:

    while ((line = cfgFile.readLine()) != null) {
        if (!line.trim().equals("")) {
            String[] fields = line.split("\\|");
            callToRemoteWebService(fields[0]);
        }
    }
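
If one web service call per line is too chatty, the same streaming idea works with a small, bounded batch. A minimal sketch, assuming a hypothetical bulk variant callToRemoteWebService(List<String>) and an arbitrary batch size (it uses the same java.io and java.util imports as the original parseFile):

    private static final int BATCH_SIZE = 1000; // arbitrary, tune as needed

    public static void processFile(String fileName) throws IOException {
        List<String> batch = new ArrayList<String>(BATCH_SIZE);
        BufferedReader cfgFile = new BufferedReader(new FileReader(new File(fileName)));
        try {
            cfgFile.readLine(); // read and discard the first line
            String line;
            while ((line = cfgFile.readLine()) != null) {
                if (!line.trim().equals("")) {
                    batch.add(line.split("\\|")[0]);
                    if (batch.size() == BATCH_SIZE) {
                        callToRemoteWebService(batch); // hypothetical bulk call
                        batch.clear();                 // only BATCH_SIZE ids are ever held in memory
                    }
                }
            }
            if (!batch.isEmpty()) {
                callToRemoteWebService(batch); // flush the remainder
            }
        } finally {
            cfgFile.close();
        }
    }

At any point in time only one batch of ids is kept in memory, so the footprint stays flat regardless of how many lines the file has.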
+3

Increase the Java heap size with the -Xms and -Xmx options. Unless set explicitly, the JVM picks ergonomic defaults for the heap size, which are not enough in your case. Read this paper to learn more about memory tuning in the JVM: http://www.oracle.com/technetwork/java/javase/tech/memorymanagement-whitepaper-1-150020.pdf
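
For example, a hypothetical invocation (the class name and the sizes are only placeholders; pick values that fit your data and your machine):

    java -Xms512m -Xmx2048m com.example.FileParser input.txt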

EDIT: An alternative is to process the file in a producer-consumer fashion, in parallel. The general idea is to have one producer thread that reads the file and submits tasks for processing, and n consumer threads that execute them. A very rough sketch (for illustrative purposes only) is as follows:

    // blocking queue handing tasks from the producer to the consumers
    final SynchronousQueue<Callable<Void>> queue = new SynchronousQueue<Callable<Void>>();

    // reads the file and submits tasks for processing
    final Runnable producer = new Runnable() {
        public void run() {
            BufferedReader in = null;
            try {
                in = new BufferedReader(new FileReader(new File(fileName)));
                String line = null;
                while ((line = in.readLine()) != null) {
                    if (!line.trim().equals("")) {
                        final String[] fields = line.split("\\|");
                        // this will block if no consumer thread is available to process it...
                        queue.put(new Callable<Void>() {
                            public Void call() {
                                process(fields);
                                return null;
                            }
                        });
                    }
                }
            } catch (IOException e) {
                e.printStackTrace();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                // close the buffered reader here...
            }
        }
    };

    // Consumes the tasks submitted by the producer. Consumers can be pooled
    // for parallel processing.
    final Runnable consumer = new Runnable() {
        public void run() {
            try {
                while (true) {
                    // this call blocks if there are no items left in the queue...
                    Callable<Void> task = queue.take();
                    task.call();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    };

Of course, you also need code that manages the life cycle of the producer and consumer threads. The right way to do this is to build it on top of an ExecutorService.
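
A minimal sketch of what that could look like, assuming a process(String[]) method that performs the web service calls; the pool size and queue capacity are arbitrary. Note that the executor's work queue must be bounded (with a back-pressure policy such as CallerRunsPolicy), otherwise queued tasks would pile up and re-create the memory problem:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.ThreadPoolExecutor;
    import java.util.concurrent.TimeUnit;

    public class BoundedFileProcessor {

        // hypothetical web service call for one record
        static void process(String[] fields) { /* ... */ }

        public static void main(String[] args) throws Exception {
            final String fileName = args[0];

            // 4 consumer threads, at most 100 queued tasks; when the queue is full
            // the producer (caller) runs the task itself, which throttles reading
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    4, 4, 0L, TimeUnit.MILLISECONDS,
                    new ArrayBlockingQueue<Runnable>(100),
                    new ThreadPoolExecutor.CallerRunsPolicy());

            BufferedReader in = new BufferedReader(new FileReader(fileName));
            try {
                String line;
                while ((line = in.readLine()) != null) {
                    if (!line.trim().equals("")) {
                        final String[] fields = line.split("\\|");
                        pool.execute(new Runnable() {
                            public void run() {
                                process(fields);
                            }
                        });
                    }
                }
            } finally {
                in.close();
            }

            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.HOURS);
        }
    }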

+2

If you want to work with big data, you have 2 options:

  • use a large enough heap to fit all the data. This will "work" for a while, but if your data size is unbounded, it will eventually fail.
  • process the data incrementally, keeping only a bounded portion of it in memory at any given time. This is the better solution, because it scales to any amount of data.
+1

Source: https://habr.com/ru/post/1436830/

