Here is a third implementation that does not use mapped buffers. Under the same conditions as before, it runs consistently in about 220 ms. The default encoding on my machine is "windows-1252"; if I switch to the simpler "ISO-8859-1" encoding, decoding is even faster (about 150 ms).
It seems that using native features such as mapped buffers actually hurts performance (at least for this particular use case). Also interesting: if I allocate direct buffers instead of heap buffers (see the commented lines), performance drops (a run then takes about 400 ms).
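For reference, the kind of mapped-buffer loop being compared against looks roughly like the sketch below. This is only my reconstruction for comparison, not the earlier implementation referred to above; the method name readWithMappedBuffer, the single full-file mapping, and the error handling are assumptions. It uses the same FILE constant and imports as the readWithBuffers() method further down, plus java.nio.MappedByteBuffer, and assumes the file is small enough (under 2 GB) to map in one piece.

// Sketch of a mapped-buffer variant (reconstruction, for comparison only).
public static void readWithMappedBuffer() throws Exception {
    FileInputStream fis = new FileInputStream(FILE);
    FileChannel channel = fis.getChannel();
    // Map the whole file; all input bytes are available to the decoder at once.
    MappedByteBuffer mbuf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
    CharsetDecoder decoder = Charset.defaultCharset().newDecoder();
    CharBuffer cbuf = CharBuffer.allocate(4096);
    for (;;) {
        CoderResult res = decoder.decode(mbuf, cbuf, true);
        if (CoderResult.UNDERFLOW == res) {
            decoder.flush(cbuf);   // mapped bytes fully consumed
            break;
        } else if (CoderResult.OVERFLOW == res) {
            cbuf.clear();          // output full: discard decoded chars (benchmark only)
        } else {
            res.throwException();  // malformed or unmappable input
        }
    }
    fis.close();
}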
So far the answer seems to be: to decode characters as fast as possible in Java (given that you cannot force a single known encoding), use a decoder manually, write the decode loop with heap buffers, and do not use mapped buffers nor direct (native) ones. I must admit that I do not really know why this is so.
// Requires imports: java.io.FileInputStream, java.nio.ByteBuffer, java.nio.CharBuffer,
// java.nio.channels.FileChannel, java.nio.charset.Charset, CharsetDecoder, CoderResult.
public static void readWithBuffers() throws Exception {
    FileInputStream fis = new FileInputStream(FILE);
    FileChannel channel = fis.getChannel();
    CharsetDecoder decoder = Charset.defaultCharset().newDecoder();
    // CharsetDecoder decoder = Charset.forName("ISO-8859-1").newDecoder();
    ByteBuffer bbuf = ByteBuffer.allocate(4096);
    // ByteBuffer bbuf = ByteBuffer.allocateDirect(4096);
    CharBuffer cbuf = CharBuffer.allocate(4096);
    // CharBuffer cbuf = ByteBuffer.allocateDirect(2 * 4096).asCharBuffer();
    for (;;) {
        if (-1 == channel.read(bbuf)) {
            // End of stream: decode whatever is left, then flush the decoder.
            bbuf.flip();
            decoder.decode(bbuf, cbuf, true);
            decoder.flush(cbuf);
            break;
        }
        bbuf.flip();   // switch bbuf to read mode for the decoder
        CoderResult res = decoder.decode(bbuf, cbuf, false);
        if (CoderResult.OVERFLOW == res) {
            cbuf.clear();     // output full: discard decoded chars (benchmark only)
            bbuf.compact();   // keep undecoded bytes so the next read appends after them
        } else if (CoderResult.UNDERFLOW == res) {
            bbuf.compact();   // input drained: keep a possible partial byte sequence
        } else {
            res.throwException();  // malformed or unmappable input
        }
    }
    fis.close();
}
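For completeness, the timings above are just wall-clock measurements around a call to the method. A minimal harness along the following lines is enough to reproduce that kind of figure; the FILE path, the warm-up count, and the printed label are placeholder choices of mine, not part of the original measurement setup.

// Hypothetical harness; FILE is a placeholder path, point it at a real test file.
static final String FILE = "/path/to/some/large/text/file.txt";

public static void main(String[] args) throws Exception {
    // A few warm-up runs so the JIT has compiled the decode loop before timing.
    for (int i = 0; i < 3; i++) {
        readWithBuffers();
    }
    long start = System.nanoTime();
    readWithBuffers();
    long elapsedMs = (System.nanoTime() - start) / 1000000;
    System.out.println("readWithBuffers: " + elapsedMs + " ms");
}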