OutOfMemoryError: Java heap space when trying to read a large file

I am trying to read a large file (approximately 516 MB) that has 18 lines of text. I tried to write the code myself and got an error on the first line of code that tries to read the file:

    try (BufferedReader br = new BufferedReader(new FileReader("test.txt"))) {
        String line;
        while ((line = br.readLine()) != null) {
            String fileContent = line;
        }
    }

Note: the file exists and its size is approximately 516 MB. If there is another, safer and faster way to read it, please tell me (even if it means splitting on line breaks). Edit: here I tried to use a Scanner, but it runs a little longer and then gives the same error:

    try (BufferedReader br = new BufferedReader(new FileReader("test.txt"))) {
        Scanner scanner = new Scanner(br);
        while (scanner.hasNext()) {
            int index = Integer.parseInt(scanner.next());
            // and here do something with index
        }
    }

I even split the file into 1800 lines, but that did not fix anything.

+6
6 answers

Using a BufferedReader already helps you avoid loading the entire file into memory. For a further improvement, since, as you said, each number is separated by a space, instead of:

 line = br.readLine(); 

we can wrap the reader in a Scanner:

 Scanner scanner = new Scanner(br); 

and extract each number in the file using scanner.next(), storing it in an int, which also reduces memory usage:

 int val = Integer.parseInt(scanner.next()); 

This way you avoid reading the whole line into memory at once.

You can also limit the buffer size of the BufferedReader:

    BufferedReader br = new BufferedReader(new FileReader("test.txt"), 8 * 1024);
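
Putting those pieces together, here is a minimal, self-contained sketch (the file name comes from the question; what you do with each parsed value is an assumption for illustration):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Scanner;

    public class ReadNumbers {
        public static void main(String[] args) throws IOException {
            // Small 8K buffer plus a Scanner: only one token is held in memory at a time
            try (BufferedReader br = new BufferedReader(new FileReader("test.txt"), 8 * 1024);
                 Scanner scanner = new Scanner(br)) {
                while (scanner.hasNext()) {
                    int val = Integer.parseInt(scanner.next());
                    // ... do something with val, e.g. update a running count or maximum
                }
            }
        }
    }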

Further reading: Does the Scanner class load the entire file into memory at once?

+4

Java was designed to work with amounts of data larger than the available memory. At the lowest level, the file API is a stream, possibly an endless one.

However, with cheap memory people prefer the easy way: read everything into memory and work with it there. This usually works, but not in your case. Increasing the memory only hides the problem until you get an even larger file. So it's time to do it right.

I do not know the sorting approach you use for comparison. If it is a good one, it can produce some sortable key or index for each line. You read the file once, build a map of such keys, sort it, and then create the sorted file based on that sorted map (a sketch of this idea follows below). In your case that is, in the worst case, 1 + 18 file reads plus 1 write.
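
The answer does not say what the key is, so the following is only a rough sketch of the idea under an assumption: that the lines can be ordered by their first 64 characters. The key length, the output file name test_sorted.txt, and the use of byte offsets via RandomAccessFile are illustrative choices, not part of the answer.

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.io.RandomAccessFile;
    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    public class OffsetIndexSort {

        // One entry per line: a short sort key plus the line's byte offset and length
        static final class Entry {
            final String key;
            final long offset;
            final long length;
            Entry(String key, long offset, long length) {
                this.key = key;
                this.offset = offset;
                this.length = length;
            }
        }

        public static void main(String[] args) throws IOException {
            final int KEY_LENGTH = 64;           // assumed: enough leading characters to order the lines
            List<Entry> index = new ArrayList<>();

            // Pass 1: scan the file once, recording a key and the offset/length of each line
            try (RandomAccessFile raf = new RandomAccessFile("test.txt", "r")) {
                byte[] buffer = new byte[8 * 1024];
                StringBuilder key = new StringBuilder();
                long lineStart = 0;
                long pos = 0;
                int read;
                while ((read = raf.read(buffer)) != -1) {
                    for (int i = 0; i < read; i++, pos++) {
                        if (buffer[i] == '\n') {
                            index.add(new Entry(key.toString(), lineStart, pos - lineStart));
                            key.setLength(0);
                            lineStart = pos + 1;
                        } else if (key.length() < KEY_LENGTH) {
                            key.append((char) buffer[i]); // assumes single-byte characters in the key
                        }
                    }
                }
                if (pos > lineStart) {           // last line without a trailing newline
                    index.add(new Entry(key.toString(), lineStart, pos - lineStart));
                }
            }

            // Sort the tiny in-memory index, not the 516 MB of data
            index.sort(Comparator.comparing((Entry e) -> e.key));

            // Pass 2: copy the lines into the output file in sorted order, buffer by buffer
            try (RandomAccessFile raf = new RandomAccessFile("test.txt", "r");
                 OutputStream out = new BufferedOutputStream(new FileOutputStream("test_sorted.txt"))) {
                byte[] buffer = new byte[8 * 1024];
                for (Entry e : index) {
                    raf.seek(e.offset);
                    long remaining = e.length;
                    while (remaining > 0) {
                        int read = raf.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                        if (read == -1) break;
                        out.write(buffer, 0, read);
                        remaining -= read;
                    }
                    out.write('\n');
                }
            }
        }
    }

The in-memory index holds only 18 small entries, so sorting it costs almost nothing; the expensive part is the two sequential passes over the 516 MB file.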

However, if you have no such key and just compare the strings character by character, you need two input streams and compare one with the other. If a line is not in the right place, you rewrite the file in the correct order and do it again. The worst case is 18 * 18 reads for the comparisons, 18 * 2 reads for writing, and 18 writes.

This is the price of an architecture that stores its data as huge lines in huge files.

+1

Increase the heap size with -Xmx.

For your file I would suggest setting at least -Xmx1536m, since the 516 MB file size will grow when loaded. Internally, Java uses 16 bits to represent a character, so a file with 10 bytes of text takes approximately 20 bytes as a String (unless it is UTF-8 with many multi-byte characters).
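
As a quick sanity check (an addition, not part of the answer itself), you can print the heap limit the JVM actually received to confirm that the -Xmx flag took effect:

    public class HeapCheck {
        public static void main(String[] args) {
            // Print the maximum heap the JVM was actually given
            long maxBytes = Runtime.getRuntime().maxMemory();
            System.out.println("Max heap: " + (maxBytes / (1024 * 1024)) + " MB");
        }
    }

Running it with java -Xmx1536m HeapCheck should print a value close to 1536 MB.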

0

Note: increasing the heap memory limit in order to sort a file with 18 lines is just a lazy way to solve a programming problem; this philosophy of always adding more memory instead of solving the real problem is the reason Java programs have a reputation for being slow, and so on.

My advice is to avoid increasing the memory for such a task; instead, split the file line by line and then merge the lines back together in a way similar to MergeSort. That way your program can scale if the file size grows.

To split the file into several single-line "split files", use the read method of the BufferedReader class:

    private void splitBigFile() throws IOException {
        // A 10 MB buffer size is decent enough
        final int BUFFER_SIZE = 1024 * 1024 * 10;
        try (BufferedReader br = new BufferedReader(new FileReader("test.txt"))) {
            int fileIndex = 0;
            FileWriter currentSplitFile = new FileWriter(new File("test_split.txt." + fileIndex));
            char[] buffer = new char[BUFFER_SIZE];
            int read;
            while ((read = br.read(buffer)) != -1) {
                // Inspect the buffer in search of new line characters
                int start = 0;
                for (int i = 0; i < read; i++) {
                    if (buffer[i] == '\n') {
                        // The current line ends in this chunk: write what belongs to it,
                        // close it, and start a new split file for the next line
                        currentSplitFile.write(buffer, start, i - start);
                        currentSplitFile.close();
                        fileIndex++;
                        currentSplitFile = new FileWriter(new File("test_split.txt." + fileIndex));
                        start = i + 1; // skip the newline character itself
                    }
                }
                // Write whatever is left of this chunk to the current split file
                if (start < read) {
                    currentSplitFile.write(buffer, start, read - start);
                }
            }
            currentSplitFile.close();
        }
    }

To combine them, open all the files and keep a separate buffer (small, for example 2 MB each) for each of them, read the first chunk of every file, and there you will have enough information to start reordering the file indexes. Keep reading further chunks whenever files are still tied. A sketch of this merge step follows below.
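
Here is a hedged sketch of that merge step. It trades the multiple open buffers for simple pairwise, chunk-by-chunk comparisons, which is a slight simplification of the description above; the split file names match the splitting sketch, and test_sorted.txt is an assumed output name.

    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.File;
    import java.io.FileReader;
    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.UncheckedIOException;
    import java.util.ArrayList;
    import java.util.List;

    public class SplitFileMerger {

        private static final int CHUNK_SIZE = 2 * 1024 * 1024; // 2 MB per file, as suggested above

        // Compare two single-line split files chunk by chunk, never holding a whole line in memory
        static int compareFiles(File a, File b) {
            try (BufferedReader ra = new BufferedReader(new FileReader(a), CHUNK_SIZE);
                 BufferedReader rb = new BufferedReader(new FileReader(b), CHUNK_SIZE)) {
                int ca, cb;
                do {
                    ca = ra.read();
                    cb = rb.read();
                    if (ca != cb) {
                        return Integer.compare(ca, cb); // first differing character (or end of file) decides
                    }
                } while (ca != -1);
                return 0; // identical contents
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        public static void main(String[] args) throws IOException {
            // Collect the split files produced earlier, sort them by content, and concatenate in order
            List<File> parts = new ArrayList<>();
            for (int i = 0; new File("test_split.txt." + i).exists(); i++) {
                parts.add(new File("test_split.txt." + i));
            }
            parts.sort(SplitFileMerger::compareFiles);

            try (BufferedWriter out = new BufferedWriter(new FileWriter("test_sorted.txt"))) {
                char[] buffer = new char[CHUNK_SIZE];
                for (File part : parts) {
                    try (BufferedReader in = new BufferedReader(new FileReader(part))) {
                        int read;
                        while ((read = in.read(buffer)) != -1) {
                            out.write(buffer, 0, read);
                        }
                    }
                    out.newLine(); // restore the line break removed during splitting
                }
            }
        }
    }

With only 18 split files the repeated comparisons are cheap; the point is that no whole line is ever held in memory at once.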

0

It is hard to guess without knowing the memory profile of your application, the JVM settings, and the hardware. It could be as simple as changing the JVM memory settings, or as involved as processing the bytes with RandomAccessFile. I will take a long shot here: the problem may simply be that you are trying to read very long lines, not that the file is large.

If you look at the implementation of BufferedReader.readLine(), you will see something like this (simplified version):

    String readLine() {
        StringBuffer sb = new StringBuffer(defaultStringBufferCapacity);
        while (true) {
            if (endOfLine) return sb.toString();
            fillInternalBufferAndAdvancePointers(defaultCharBufferCapacity); // (*)
            sb.append(internalBuffer);                                        // (**)
        }
    }
    // defaultStringBufferCapacity = 80, can't be changed
    // defaultCharBufferCapacity = 8*1024, can be altered

(*) This is the most critical line. It tries to fill an internal buffer of a fixed 8K size and append that char buffer to a StringBuffer. A 516 MB file with 18 lines means each line occupies ~28 MB in memory, so it has to allocate and copy an 8K array ~3500 times per line.

(**) It then tries to put this array into a StringBuffer, which starts with a default capacity of 80. That causes extra allocations as the StringBuffer grows its internal buffer until it is large enough to hold the line: roughly 25 extra allocations per line, if I'm not mistaken.

Basically, I would recommend increasing the size of the internal buffer to 1 MB by passing an additional parameter to the BufferedReader constructor, for example:

  new BufferedReader(..., 1024*1024); 
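
For completeness, a minimal sketch of that suggestion applied to the loop from the question (the file name comes from the question; each ~28 MB line is still held in memory while it is being processed):

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class BigBufferRead {
        public static void main(String[] args) throws IOException {
            // Same loop as in the question, but with a 1 MB char buffer so each long
            // line is filled in far fewer internal copy steps
            try (BufferedReader br = new BufferedReader(new FileReader("test.txt"), 1024 * 1024)) {
                String line;
                while ((line = br.readLine()) != null) {
                    // ... process the line here
                }
            }
        }
    }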
0

EDIT: The same goes for Java heap space: see "declaring variables inside or outside of a loop".

Just a tip.

If you can, you should not declare variables inside loops, because that can fill up the Java heap space. In this example, if possible, it would be better to write:

    try (BufferedReader br = new BufferedReader(new FileReader("test.txt"))) {
        String line;
        String fileContent;
        while ((line = br.readLine()) != null) {
            fileContent = line;
        }
    }

Why? Because on each iteration Java reserves new heap space for the same variable (Java treats it as a new, different variable, which you may want, but probably not), and if the loop is big enough the heap can fill up.

0

Source: https://habr.com/ru/post/985167/

