Decreasing Java File Throughput

I have a program in which each thread reads a file several lines at a time, processes those lines, and writes them to another file. The list of files to process is split among four threads. I have performance problems in two cases:

  • Four files with 50,000 lines each
    • Throughput starts at about 700 lines/sec and decreases to ~100 lines/sec
  • 30,000 files with 12 lines each
    • Throughput starts at about 800 lines/sec and remains steady

This is internal software I'm working on, so unfortunately I can't share any source code, but the main steps of the program are:

  1. Split the file list among the four threads.
  2. Start all the threads.
  3. A thread reads up to 100 lines at a time into a String[] array.
  4. The thread applies the conversion to every line in the array.
  5. The thread writes the lines to an output file (not the same as the input file).
  6. Steps 3-5 repeat in each thread until all of its files are fully processed (sketched below).
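
Roughly, each worker does something like this (illustrative only; I can't post the real code, so all names here are made up):

    import java.io.*;
    import java.util.List;

    // Illustrative reconstruction of one worker thread's loop, not the real code.
    class FileWorker implements Runnable {
        private static final int BATCH = 100;
        private final List<File> files;   // this thread's share of the file list

        FileWorker(List<File> files) {
            this.files = files;
        }

        @Override
        public void run() {
            try {
                for (File in : files) {
                    try (BufferedReader reader = new BufferedReader(new FileReader(in));
                         BufferedWriter writer = new BufferedWriter(
                                 new FileWriter(in.getPath() + ".out"))) {
                        String[] batch = new String[BATCH];
                        int n;
                        while ((n = readBatch(reader, batch)) > 0) {
                            for (int i = 0; i < n; i++) {
                                batch[i] = convert(batch[i]);   // the per-line conversion
                            }
                            for (int i = 0; i < n; i++) {
                                writer.write(batch[i]);
                                writer.newLine();
                            }
                        }
                    }
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        // Fills the array with up to batch.length lines; returns how many were read.
        private static int readBatch(BufferedReader r, String[] batch) throws IOException {
            int n = 0;
            String line;
            while (n < batch.length && (line = r.readLine()) != null) {
                batch[n++] = line;
            }
            return n;
        }

        private static String convert(String line) {
            return line;   // stand-in for the real transformation
        }
    }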

What I don't understand is why 30,000 files of 12 lines each give me better throughput than a few files with many lines. I would have expected the overhead of opening and closing all those files to cost more than reading from a single open file. On top of that, the throughput decay in the first case looks exponential.

I set the maximum heap size to 1024 MB, and the program seems to use no more than 100 MB of it, so an overloaded GC does not appear to be the problem. Do you have any other ideas?

+3
6 answers

From what you describe, GC is unlikely to be the problem, although a low heap footprint alone doesn't prove the collector is idle. The more likely suspect is IO: check whether your threads are actually spending their time waiting on IO.
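
To rule the collector out, you can turn on GC logging at launch, e.g. on a HotSpot JVM (MyApp stands in for your actual main class):

    java -verbose:gc -XX:+PrintGCDetails MyApp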

+3

When you process many small files, the file system (and the OS) caches them aggressively, so most reads come straight from memory. With read/write IO on four large files at once, the disk has to seek constantly between the input and output positions (on a conventional drive). If you are IO-bound, adding threads doesn't speed anything up. Measure the IO first.
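
A crude way to measure is to time the reads and the conversion separately; a sketch, with convert standing in for the real transformation:

    import java.io.*;

    // Crude check: accumulate time spent reading vs. converting one file.
    class IoTiming {
        static String convert(String line) { return line; }   // placeholder

        static void timedPass(File in, File out) throws IOException {
            long readNs = 0, workNs = 0;
            try (BufferedReader reader = new BufferedReader(new FileReader(in));
                 BufferedWriter writer = new BufferedWriter(new FileWriter(out))) {
                while (true) {
                    long t0 = System.nanoTime();
                    String line = reader.readLine();        // the IO side
                    readNs += System.nanoTime() - t0;
                    if (line == null) break;

                    t0 = System.nanoTime();
                    String converted = convert(line);       // the CPU side
                    workNs += System.nanoTime() - t0;

                    writer.write(converted);
                    writer.newLine();
                }
            }
            System.out.printf("read: %d ms, convert: %d ms%n",
                    readNs / 1_000_000, workNs / 1_000_000);
        }
    }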

+2

Have you profiled the Java application? That would show where the time actually goes. If you don't have a profiler handy, NetBeans ships with one built in.

+1

You mention storing the lines in a String[]. Make sure those arrays, or the strings in them, aren't being accumulated somewhere, e.g. in a collection that keeps growing; that kind of buildup would explain a slowdown that gets worse over time.
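
The pattern to look for is something like this (names are made up):

    import java.util.ArrayList;
    import java.util.List;

    class LeakExample {
        // Anti-pattern: every processed batch stays reachable forever.
        static final List<String[]> processed = new ArrayList<>();

        static void handleBatch(String[] batch) {
            processed.add(batch);   // grows without bound while large files are processed
        }
    }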

Also, you run the VM with -Xmx1024m, but that only caps the heap; the VM still starts with a small heap and resizes it as usage grows, which costs time. Try -Xms1024m -Xmx1024m (i.e. a fixed heap size) and see whether the behaviour changes.
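
For example (MyApp stands in for your actual main class):

    java -Xms1024m -Xmx1024m MyApp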

+1

(Reading 100 lines at a time shouldn't be a problem in itself.) Which version of Java is this running on? It may be worth trying a different one.

0

Try a BufferedReader and a BufferedWriter and process one line at a time instead of batching 100 lines yourself; the buffered streams already batch the underlying disk IO. The code gets simpler and memory use stays flat.
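
Something along these lines (convert stands in for your transformation):

    import java.io.*;

    class LineStreamer {
        // One line in, one line out; the buffered streams batch the actual disk IO.
        static void process(File in, File out) throws IOException {
            try (BufferedReader reader = new BufferedReader(new FileReader(in));
                 BufferedWriter writer = new BufferedWriter(new FileWriter(out))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    writer.write(convert(line));
                    writer.newLine();
                }
            }
        }

        static String convert(String line) { return line; }   // placeholder
    }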

0

Source: https://habr.com/ru/post/1753418/

