Why is this line counting program slow in Java? Using MappedByteBuffer

To try out MappedByteBuffer (a memory-mapped file in Java), I wrote a simple wc -l demo (counting the newlines in a text file):

    int wordCount(String fileName) throws IOException {
        FileChannel fc = new RandomAccessFile(new File(fileName), "r").getChannel();
        MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
        int nlines = 0;
        byte newline = '\n';
        for (long i = 0; i < fc.size(); i++) {
            if (mem.get() == newline) nlines += 1;
        }
        return nlines;
    }

I tried this on a file about 15 MB in size (15008641 bytes) with 100 thousand lines. It takes about 13.8 sec on my laptop. Why is it so slow?

The full class code is here: http://pastebin.com/t8PLRGMa

For reference, I wrote the same idea in C: http://pastebin.com/hXnDvZm6

It runs in about 28 ms, i.e. about 490 times faster.

Out of curiosity, I also wrote a Scala version using essentially the same algorithm and API as in Java. It runs 10 times faster than the Java version, which suggests that something strange is definitely happening.

Update: the file is cached by the OS, so disk access time is not a factor.

I wanted to use memory mapping for random access to larger files that might not fit into RAM. That is why I am not just using BufferedReader.

1 answer

The code is very slow because fc.size() is called on every loop iteration.

The JVM apparently cannot hoist fc.size() out of the loop, since the file size could change at runtime. Querying the file size is relatively slow, because it requires a system call into the underlying file system.
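One way to see this cost directly is to time the size() call itself. The following micro-benchmark is an illustrative sketch (not from the original post); the temp file and the iteration count are arbitrary choices:

```java
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class SizeCost {
    public static void main(String[] args) throws Exception {
        // hypothetical setup: a small throwaway file to query
        Path tmp = Files.createTempFile("sizecost", ".bin");
        Files.write(tmp, new byte[1024]);
        try (RandomAccessFile raf = new RandomAccessFile(tmp.toString(), "r");
             FileChannel fc = raf.getChannel()) {
            int calls = 1_000_000;
            long sum = 0;
            long t0 = System.nanoTime();
            for (int i = 0; i < calls; i++) {
                sum += fc.size(); // one system call per iteration
            }
            long perCall = (System.nanoTime() - t0) / calls;
            System.out.println("size() took ~" + perCall + " ns per call (checksum " + sum + ")");
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Multiply that per-call cost by one iteration per byte of a 15 MB file and the 13.8 s runtime stops being mysterious.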

Change it to

    long size = fc.size();
    for (long i = 0; i < size; i++) {
        ...
    }
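For completeness, here is a sketch of a full corrected version. It uses hasRemaining() on the buffer instead of an explicit index, which avoids the repeated size() call entirely; the main method with a temp file is just a hypothetical demo, not part of the original code:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;

public class LineCount {
    static int countLines(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r");
             FileChannel fc = raf.getChannel()) {
            MappedByteBuffer mem = fc.map(FileChannel.MapMode.READ_ONLY, 0, fc.size());
            int nlines = 0;
            // hasRemaining() checks the buffer's own limit, so no
            // file-system call happens inside the loop
            while (mem.hasRemaining()) {
                if (mem.get() == '\n') nlines++;
            }
            return nlines;
        }
    }

    public static void main(String[] args) throws IOException {
        // hypothetical demo: count lines in a small temp file
        Path tmp = Files.createTempFile("linecount", ".txt");
        Files.write(tmp, "a\nb\nc\n".getBytes());
        System.out.println(countLines(tmp.toString())); // prints 3
        Files.delete(tmp);
    }
}
```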

Source: https://habr.com/ru/post/1246298/

