I have a program in which each thread reads a file a batch of lines at a time, processes those lines, and writes them to a different output file. Four threads split the list of files to process among themselves. I'm seeing performance problems in two cases:
- Four files with 50,000 lines each:
  - Throughput starts at about 700 lines/sec and decreases to ~100 lines/sec.
- 30,000 files with 12 lines each:
  - Throughput starts at about 800 lines/sec and remains steady.
This is internal software I'm working on, so unfortunately I can't share any source code, but the main steps of the program are listed below (a rough equivalent of the per-thread loop is sketched after the list):
- Split the file list among four threads.
- Run all threads.
- Thread reads up to 100 lines at a time and stores them in a String[] array.
- Thread applies the conversion to all lines in the array.
- Thread writes the converted lines to an output file (not the same file as the input).
- Each thread repeats the read/convert/write steps until all of its files are fully processed.
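
Since I can't post the real code, here is a rough, simplified equivalent of what one worker thread does. This is only a sketch based on the description above; names like `Worker`, `transform`, the `.out` output suffix, and the batch size of 100 are placeholders, not the actual implementation:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch of one worker thread. Each thread owns its own subset of the file list.
class Worker implements Runnable {
    private final List<File> files; // this thread's share of the file list

    Worker(List<File> files) {
        this.files = files;
    }

    @Override
    public void run() {
        for (File inFile : files) {
            File outFile = new File(inFile.getPath() + ".out"); // placeholder output name
            try (BufferedReader reader = new BufferedReader(
                     new InputStreamReader(new FileInputStream(inFile), StandardCharsets.UTF_8));
                 BufferedWriter writer = new BufferedWriter(
                     new OutputStreamWriter(new FileOutputStream(outFile), StandardCharsets.UTF_8))) {

                List<String> batch = new ArrayList<>(100);
                String line;
                while ((line = reader.readLine()) != null) {
                    batch.add(line);
                    if (batch.size() == 100) {      // read up to 100 lines at a time
                        writeBatch(writer, batch);  // convert and write the batch
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    writeBatch(writer, batch);      // flush the last partial batch
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    private void writeBatch(BufferedWriter writer, List<String> batch) throws IOException {
        for (String s : batch) {
            writer.write(transform(s));             // apply the conversion to each line
            writer.newLine();
        }
    }

    private String transform(String s) {
        return s; // stand-in for the real per-line conversion
    }
}
```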
What I don't understand is why 30,000 files of 12 lines each give me better performance than a few files with many lines each. I would have expected the overhead of opening and closing all those files to outweigh reading from a single file. On top of that, the throughput drop in the first case looks exponential.
I set the maximum heap size to 1024 MB and the program seems to use no more than 100 MB, so an overloaded GC doesn't appear to be the problem. Do you have any other ideas?