I am processing several text files line by line using BufferedReader.readLine().
Two of the files are 130 MB each, yet one takes 40 seconds to process and the other 75 seconds. I noticed that one has 1.8 million lines and the other 2.1 million. But when I tried a file of the same size with 3.0 million lines, it took 30 minutes to process.
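To rule out raw I/O, here is a minimal sketch of a readLine()-only timing loop (the file name data.txt is a placeholder). If this alone is fast for every file, the cost must be in the per-line processing rather than in BufferedReader itself:

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class ReadTimer {
        public static void main(String[] args) throws IOException {
            long start = System.currentTimeMillis();
            long lines = 0;
            BufferedReader br = new BufferedReader(new FileReader("data.txt"));
            while (br.readLine() != null) {
                lines++;  // count lines, do nothing else
            }
            br.close();
            System.out.println(lines + " lines in "
                    + (System.currentTimeMillis() - start) + " ms");
        }
    }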
So my question is: why does the processing time grow so steeply with the number of lines when the file size is the same?
Here is some more detail.
I split each line into three parts using a regular expression, and then, using SSTableSimpleUnsortedWriter (provided by Cassandra), I write them out as a row key, a column name and a value. After every 16 MB of buffered data, the writer flushes to disk.
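As an illustration of the split step, here is a tiny sketch using the same pattern as in my code below; the sample line format [row1,col1,val1] is just an assumption, not necessarily the real input:

    import java.util.Arrays;
    import java.util.regex.Pattern;

    public class SplitDemo {
        public static void main(String[] args) {
            Pattern pattern = Pattern.compile("[\\[,\\]]");
            // assumed sample line; the real format may differ
            String[] parts = pattern.split("[row1,col1,val1]");
            // prints [, row1, col1, val1] -- note the leading empty
            // element produced by the '[' at the start of the line
            System.out.println(Arrays.toString(parts));
        }
    }

With such a line, the leading empty element means the three fields sit at indices 1 to 3, not 0 to 2.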
But the processing logic is the same for all files, and even a 330 MB file with fewer lines (under about 1 million) is processed in 30 seconds. What could be the reason?
    deviceWriter = new SSTableSimpleUnsortedWriter(directory, keyspace, "Devices",
            UTF8Type.instance, null, 16);
    Pattern pattern = Pattern.compile("[\\[,\\]]");

    while ((line = br.readLine()) != null) {
        // split the line into row key, column name and value
        long timestamp = System.currentTimeMillis() * 1000;
        deviceWriter.newRow(bytes(rowKey));
        deviceWriter.addColumn(bytes(colmName), bytes(value), timestamp);
    }
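One way to see where the time goes is to instrument the same loop so it prints how long each batch of 100,000 lines takes; if per-line cost climbs as the run proceeds, the slowdown is cumulative rather than constant:

    long count = 0;
    long mark = System.currentTimeMillis();
    while ((line = br.readLine()) != null) {
        // rowKey, colmName and value come from the split step, as above
        long timestamp = System.currentTimeMillis() * 1000;
        deviceWriter.newRow(bytes(rowKey));
        deviceWriter.addColumn(bytes(colmName), bytes(value), timestamp);
        if (++count % 100000 == 0) {
            long now = System.currentTimeMillis();
            System.out.println(count + " lines, last batch: " + (now - mark) + " ms");
            mark = now;
        }
    }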
I changed -Xmx256M to -Xmx1024M, but it does not help.
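To check whether garbage collection is eating the time, one option is to run with GC logging enabled (flags as on a pre-Java-9 HotSpot JVM; MyApp is a placeholder for the main class):

    java -Xmx1024M -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps MyApp

Long or frequent full-GC pauses in the log would point at heap pressure rather than at readLine().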
Update: My guess is that because I am writing into an in-memory buffer, inserting new entries takes longer as the number of entries already in the buffer grows.
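A way to test that guess in isolation, with a plain TreeMap standing in for the writer's internal buffer (this is not Cassandra's actual data structure, just a stand-in; run with -Xmx1024M or so):

    import java.util.Map;
    import java.util.TreeMap;

    public class BufferGrowthTest {
        public static void main(String[] args) {
            Map<String, String> buffer = new TreeMap<String, String>();
            long mark = System.currentTimeMillis();
            for (int i = 1; i <= 3000000; i++) {
                buffer.put("key" + i, "value" + i);
                if (i % 500000 == 0) {
                    long now = System.currentTimeMillis();
                    System.out.println(i + " entries, last batch: " + (now - mark) + " ms");
                    mark = now;
                }
            }
        }
    }

If the batch times climb steeply, the guess holds; if they stay roughly flat, the slowdown is elsewhere, for example in GC or in the writer's flushes.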
Any help would be appreciated.