Java performance: byte[] vs. char[] for file streams

I am writing a program that reads a file (using a custom 8 KB buffer) and then searches for a keyword in that buffer. Since Java provides two kinds of streams, character and byte, I implemented it with both a byte[] buffer and a char[] buffer.

I'm wondering which would be faster and better for performance. A char is 2 bytes, and when a Reader reads into a char[] it has to convert bytes to chars, which I suspect makes it slower than using only a byte[].

3 answers

Using a byte array will be faster:

  • You skip the step of decoding bytes into characters, which costs at least one copy loop, and possibly more work depending on the charset used for decoding.

  • A byte array takes up less space (1 byte per element instead of 2), so it saves CPU cycles on allocation, initialization and GC.
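To make the first point concrete, here is a minimal sketch of the same naive keyword search over a filled byte[] buffer and over a filled char[] buffer. The class name BufferSearch and its indexOf methods are hypothetical, not taken from the question. The byte version assumes the keyword has been encoded once up front, for example with keyword.getBytes(StandardCharsets.UTF_8) if the file is UTF-8; the char version only works on data a Reader has already decoded.

```java
// Hypothetical helper: naive search for a keyword inside the first `len`
// elements of a buffer that was just filled by a read() call.
final class BufferSearch {

    // Byte version: compares raw bytes, so no per-buffer decoding is needed.
    // The keyword must already be encoded in the same charset as the file.
    static int indexOf(byte[] buf, int len, byte[] key) {
        outer:
        for (int i = 0; i + key.length <= len; i++) {
            for (int j = 0; j < key.length; j++) {
                if (buf[i + j] != key[j]) continue outer;
            }
            return i; // index of the first matching byte
        }
        return -1;
    }

    // Char version: identical loop, but the buffer was filled by a Reader,
    // which had to decode every byte into a char first.
    static int indexOf(char[] buf, int len, char[] key) {
        outer:
        for (int i = 0; i + key.length <= len; i++) {
            for (int j = 0; j < key.length; j++) {
                if (buf[i + j] != key[j]) continue outer;
            }
            return i; // index of the first matching char
        }
        return -1;
    }
}
```

Both loops do the same work per element; the cost difference the answer describes lies in how the buffer gets filled (InputStream.read(byte[]) copies bytes, while Reader.read(char[]) also decodes them).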

However:

  • Unless you are searching huge files, the difference is unlikely to be significant.

  • The byte-array approach may fail if the input file is not encoded in an 8-bit character set. And even if it works (as for UTF-8 and UTF-16), there are potential problems with matching characters that span buffer boundaries.

(Byte-wise processing works for UTF-8 and UTF-16 because these encodings make it easy to distinguish the first code unit (byte or 16-bit unit) of an encoded character from its subsequent units.)
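The buffer-boundary problem from the second bullet affects any fixed-size buffer: a keyword can start near the end of one read and finish in the next. A common workaround, sketched below, is to carry the last key.length - 1 bytes over into the next read. The sketch assumes the file and the keyword use the same encoding (UTF-8 in the test). Because a UTF-8 lead byte never has the 10xxxxxx form of a continuation byte, a byte-level match of a UTF-8-encoded keyword can only begin on a character boundary, which is why byte-wise search is safe for UTF-8. StreamSearch and find are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;

// Sketch: stream an InputStream through a fixed-size buffer and find the
// first occurrence of a keyword, including matches that straddle two reads.
final class StreamSearch {

    // Returns the absolute byte offset of the first match, or -1 if none.
    static long find(InputStream in, byte[] key, int bufSize) throws IOException {
        if (bufSize <= 0) throw new IllegalArgumentException("bufSize must be positive");
        if (key.length == 0) return 0;
        // Room for one full read plus the tail carried over from the previous read.
        byte[] buf = new byte[bufSize + key.length - 1];
        int kept = 0;   // bytes carried over at the start of buf
        long base = 0;  // absolute stream offset of buf[0]
        int n;
        while ((n = in.read(buf, kept, bufSize)) != -1) {
            int len = kept + n;
            for (int i = 0; i + key.length <= len; i++) {
                int j = 0;
                while (j < key.length && buf[i + j] == key[j]) j++;
                if (j == key.length) return base + i;
            }
            // Keep the last key.length - 1 bytes: a match may start there
            // but end in the next read.
            int keep = Math.min(key.length - 1, len);
            System.arraycopy(buf, len - keep, buf, 0, keep);
            base += len - keep;
            kept = keep;
        }
        return -1;
    }
}
```

For a real file, something like find(Files.newInputStream(path), "keyword".getBytes(StandardCharsets.UTF_8), 8192) would return the byte offset of the first match. The inner comparison is still naive; for long keywords an algorithm such as Boyer-Moore-Horspool would examine fewer bytes.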

If you are reading a binary file, use a byte array.

If it is a text file and you intend to use the contents later, for example as strings, you should use a char array.
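For the text-file case, a minimal sketch: decode through a Reader with an explicit charset, fill an 8 KB char[] buffer, and collect the contents as a String. TextSearch, readAll and indexOfInFile are hypothetical names, and the charset is an assumption you must match to the actual file. Note that String.indexOf returns a char index, not a byte offset, and that reading the whole file into memory is only reasonable for modest file sizes.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class TextSearch {

    // Decode the whole stream through an 8 KB char[] buffer.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n); // append only the chars actually read
        }
        return sb.toString();
    }

    // The charset is explicit, so bytes are decoded into chars exactly once and
    // the keyword can be any String regardless of how the file is encoded.
    static int indexOfInFile(Path file, Charset cs, String keyword) throws IOException {
        try (Reader reader = Files.newBufferedReader(file, cs)) {
            return readAll(reader).indexOf(keyword);
        }
    }
}
```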

This Stack Overflow question, file-streaming-in-java, discusses efficient file streaming in Java.

I especially like this reference article.

With large files, working only with bytes quickly gives you a speed advantage, so if you can express the search pattern as bytes, you can save some precious cycles.

If your files are small, or you don't have many of them, this may not matter.
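To check whether the difference matters for your own files, a rough timing sketch like the one below can help. It simplifies the task to counting one ASCII character, so that the only difference between the two paths is UTF-8 decoding; RoughTiming and its methods are hypothetical. A single System.nanoTime() run without JIT warm-up is noisy, so treat the numbers as a hint and use a harness such as JMH for real measurements.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

final class RoughTiming {

    // Counts occurrences of one ASCII byte, reading raw bytes.
    static long countBytes(InputStream in, byte target) throws IOException {
        byte[] buf = new byte[8192];
        long count = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) if (buf[i] == target) count++;
        }
        return count;
    }

    // Same work, but the Reader decodes every byte into a char first.
    static long countChars(Reader in, char target) throws IOException {
        char[] buf = new char[8192];
        long count = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            for (int i = 0; i < n; i++) if (buf[i] == target) count++;
        }
        return count;
    }

    // Times both paths over the same in-memory data (assumed to be UTF-8).
    static void compare(byte[] data) throws IOException {
        long t0 = System.nanoTime();
        long a = countBytes(new ByteArrayInputStream(data), (byte) 'k');
        long t1 = System.nanoTime();
        long b = countChars(new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.UTF_8), 'k');
        long t2 = System.nanoTime();
        System.out.printf("bytes: %d hits in %d ms%n", a, (t1 - t0) / 1_000_000);
        System.out.printf("chars: %d hits in %d ms%n", b, (t2 - t1) / 1_000_000);
    }
}
```

Reading from a ByteArrayInputStream keeps disk I/O out of the measurement; with a real file, I/O time may dominate and hide the decoding cost entirely.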

Source: https://habr.com/ru/post/895043/
