How can I improve the read speed of a large file in Java?

I just read a 167 MB file containing 1,884,000 lines. I use a BufferedReader so that I can read it line by line.

I noticed that reading gets slower and slower as the current line number increases (in this case, the whole run took 3 h 30 min).

I know that using NIO can speed this up, but I want to read the file line by line.

My code is as follows; can anyone give me some suggestions? Many thanks!

    String htmlContentPath = html.getAbsolutePath();
    BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream(htmlContentPath)));
    String line = null;
    int cnt = 0;
    while ((line = reader.readLine()) != null) {
        this.proc(line);
        if ((cnt++ % 2000) == 0) {
            logger.info("current line number:\t" + cnt);
        }
    }
+4
4 answers

You should find the answer here:

http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly

For the best Java read performance, there are four things to remember:

  • Minimize I/O operations by reading an array at a time, rather than a byte at a time. An 8 KB array is a good size.

  • Minimize method calls by getting data an array at a time, rather than a byte at a time. Use array indexing to get at the bytes in the array.

  • Minimize thread synchronization locks if you don't need thread safety. Either make fewer method calls to a thread-safe class, or use a non-thread-safe class, such as FileChannel and MappedByteBuffer.

  • Minimize data copying between the JVM/OS, internal buffers, and application arrays. Use a FileChannel with memory mapping, or a direct or wrapped array ByteBuffer.
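As a rough illustration of the first two points, here is a minimal sketch that reads a file 8 KB at a time and scans the array by index. The class name ChunkedRead, the helper countNewlines, and the temporary sample file are all made up for this example; it only counts newline bytes and is not a replacement for the line-by-line processing in the question.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class ChunkedRead {

    // Hypothetical helper: counts '\n' bytes, reading 8 KB per read() call
    // instead of one byte per call.
    static long countNewlines(Path file) throws IOException {
        byte[] buffer = new byte[8 * 1024];
        long newlines = 0;
        try (InputStream in = new FileInputStream(file.toFile())) {
            int n;
            // read(buffer) returns the number of bytes placed in the array,
            // or -1 at end of file.
            while ((n = in.read(buffer)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buffer[i] == '\n') {
                        newlines++;
                    }
                }
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws IOException {
        // Small temporary file standing in for the real input.
        Path sample = Files.createTempFile("chunked", ".txt");
        Files.write(sample, "a\nb\nc\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(countNewlines(sample)); // prints 3
        Files.delete(sample);
    }
}
```

Note that BufferedReader already does this kind of buffering internally, so the reading side of the code in the question is not obviously the bottleneck.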

+2

This may be caused by swapping, depending on how much memory your proc method uses while the file is being read. You can attach a monitoring tool to your process to see the state of the heap, and then either adjust the heap settings (-Xms, -Xmx) or reduce the memory consumption of your method.

Regards.

0

When I first read your question, I was going to suggest that you comment out the proc() call and then use some of the other answers to speed up the file reading (which would then account for the entire run time, since the processing call is commented out).

On further thought, I suggest you use a profiler (without commenting out any lines). If you use Eclipse, there are several JVM profilers on the Eclipse Marketplace, and I'm sure there are profilers integrated into other development environments. A profiler can show you the hot spots in your code: the places where your program spends most of its time. This information, plus your knowledge of the program logic, will lead you to ways of speeding up the worst bottlenecks.

It is an iterative process, with better and better results.

I also recommend that you first use a much smaller sample file for your testing.

0

This sounds like a memory problem to me (slowdowns often occur as the need for garbage collection increases when memory runs low).

The code you posted doesn't look like it should slow down as the line number increases (assuming the proc() call is "clean").

I second Chris G's advice to remove the proc() call, to find out whether there is still a slowdown when you just read the file and don't process any of its lines.
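A minimal sketch of that experiment, assuming a UTF-8 text file: it reads every line, discards it, and reports the elapsed time. The class name ReadOnlyTiming, the helper readAllLines, and the generated sample file are invented for this example; in practice you would point it at the real file or a smaller sample of it.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;

public class ReadOnlyTiming {

    // Hypothetical helper: reads each line and throws it away,
    // returning the number of lines read.
    static long readAllLines(Path file) throws IOException {
        long count = 0;
        try (BufferedReader reader =
                     Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            while (reader.readLine() != null) {
                count++; // the proc(line) call is deliberately left out
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Generated sample file standing in for the real input.
        Path sample = Files.createTempFile("sample", ".txt");
        Files.write(sample,
                Collections.nCopies(100_000, "some line of text"),
                StandardCharsets.UTF_8);

        long start = System.nanoTime();
        long lines = readAllLines(sample);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(lines + " lines read in " + elapsedMs + " ms");
        Files.delete(sample);
    }
}
```

If this loop finishes quickly on the full file, the slowdown is most likely inside proc() rather than in the reading code.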

I would also add that you can try using the -Xmx and -Xms flags to give the JVM access to more memory from the start.
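To check whether those flags took effect and how full the heap gets, something like the following sketch could be called from the logging branch of the read loop. The class name HeapReport and the helper heapSummary are hypothetical; the Runtime methods are part of the standard library.

```java
public class HeapReport {

    // Hypothetical helper: summarizes current heap usage in megabytes.
    static String heapSummary() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        long max = rt.maxMemory() / mb;       // upper limit, set by -Xmx
        long allocated = rt.totalMemory() / mb; // heap currently reserved by the JVM
        long used = (rt.totalMemory() - rt.freeMemory()) / mb;
        return "heap used=" + used + "MB, allocated=" + allocated
                + "MB, max=" + max + "MB";
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
    }
}
```

Running it as, for example, java -Xms512m -Xmx2g HeapReport should show a correspondingly larger maximum; the sizes here are placeholders, not recommendations. If the used figure keeps climbing toward the maximum as the line count grows, proc() is probably retaining memory.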

Here's a question that might be relevant: Java threads slow down by the end of processing

0

Source: https://habr.com/ru/post/1495986/

