I have to agree with @gene: try BufferedReader and readLine() first, since it's simple and easy to code. Just be careful not to alias the backing array between the string returned by readLine() and any substring you take from it. String.substring() is a particularly common culprit: I have had multi-megabyte char arrays pinned in memory because a 3-character substring still referenced them. (This applies to older JDKs; since Java 7 update 6, substring() copies the characters instead of sharing the parent's array.)
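A minimal sketch of the readLine() approach, assuming whitespace-separated input where you only keep the first token of each line; ReadLineExample and firstTokens are hypothetical names. The explicit new String(token) copy is what guarantees the kept token does not pin the whole line's array on runtimes where substring() shares storage.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class ReadLineExample {
    // Reads every line and keeps only the first space-separated token of each.
    // On JDKs before 7u6, substring() shared the line's backing array, so the
    // defensive new String(...) copy prevents a short token from pinning a long line.
    static List<String> firstTokens(Reader source) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                int end = line.indexOf(' ');
                String token = end < 0 ? line : line.substring(0, end);
                tokens.add(new String(token)); // private copy, independent of 'line'
            }
        }
        return tokens;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(firstTokens(new StringReader("abc 1 2\ndef 3\nghi"))); // [abc, def, ghi]
    }
}
```

On a current JDK the extra copy is redundant but harmless; it only matters if you are stuck on an old runtime.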
Assuming ASCII, my preference is to go down to the byte level. Use mmap (in Java, FileChannel.map()) to view the file as a ByteBuffer, then do a linear scan for 0x20 (space) and 0x0A (newline), assuming Unix-style line separators. Then convert the bytes between those delimiters to a String. With an 8-bit encoding, it is extremely hard to beat this for speed.
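Here is a rough sketch of the mmap approach using java.nio's FileChannel.map(). It assumes ASCII input, tokens separated only by spaces and \n, and a file under 2 GB (a single MappedByteBuffer cannot exceed Integer.MAX_VALUE bytes); MappedTokenScanner is a hypothetical name. Treat it as a starting point, not a tuned implementation.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MappedTokenScanner {
    static List<String> tokens(Path file) throws IOException {
        List<String> result = new ArrayList<>();
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            // Map the whole file read-only; the OS pages it in on demand.
            MappedByteBuffer buf = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
            byte[] token = new byte[64]; // scratch buffer for the current token, grown on demand
            int len = 0;
            while (buf.hasRemaining()) {
                byte b = buf.get();
                if (b == 0x20 || b == 0x0A) { // space or Unix newline ends a token
                    if (len > 0) {
                        result.add(new String(token, 0, len, StandardCharsets.US_ASCII));
                        len = 0;
                    }
                } else {
                    if (len == token.length) {
                        token = Arrays.copyOf(token, len * 2);
                    }
                    token[len++] = b;
                }
            }
            if (len > 0) { // last token when the file doesn't end with a delimiter
                result.add(new String(token, 0, len, StandardCharsets.US_ASCII));
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("tokens", ".txt");
        tmp.toFile().deleteOnExit();
        Files.write(tmp, "12 34\n56\n".getBytes(StandardCharsets.US_ASCII));
        long sum = 0;
        for (String t : tokens(tmp)) {
            sum += Integer.parseInt(t);
        }
        System.out.println(sum); // 102
    }
}
```

If you only need integers, you could go further and accumulate digits directly from the bytes, skipping the String entirely; this sketch keeps the String step to match the description.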
If you are dealing with Unicode, the problem gets considerably more complicated, so I strongly recommend using BufferedReader unless its performance is unacceptable. If readLine() doesn't work for you, consider simply looping on calls to read().
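A sketch of looping on read() for the Unicode case, assuming tokens are separated by whitespace; ReadLoopTokenizer is a hypothetical name. Decoding stays inside the Reader, so the loop only ever sees chars, never raw bytes.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class ReadLoopTokenizer {
    // read() returns one UTF-16 code unit at a time, or -1 at end of stream.
    // Surrogate pairs are appended unit by unit, so they survive intact.
    static List<String> tokens(Reader source) throws IOException {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            int c;
            while ((c = reader.read()) != -1) {
                if (Character.isWhitespace(c)) {
                    if (current.length() > 0) { // end of a token
                        result.add(current.toString());
                        current.setLength(0);
                    }
                } else {
                    current.append((char) c);
                }
            }
        }
        if (current.length() > 0) { // trailing token without a final delimiter
            result.add(current.toString());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // UTF-8 input containing non-ASCII characters (e-acute, i-diaeresis).
        byte[] utf8 = "caf\u00e9 42\nna\u00efve".getBytes(StandardCharsets.UTF_8);
        Reader in = new InputStreamReader(new ByteArrayInputStream(utf8), StandardCharsets.UTF_8);
        System.out.println(tokens(in));
    }
}
```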
Regardless, you should always specify a Charset when creating a String from an external byte stream; it makes your encoding assumption explicit. So I'd suggest a slight modification of gene's suggestion, using one of:
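    // Charset constants come from java.nio.charset.StandardCharsets (Java 7+).
    int i = Integer.parseInt(new String(buffer, start, length, StandardCharsets.US_ASCII));
    int i = Integer.parseInt(new String(buffer, start, length, StandardCharsets.ISO_8859_1));
    int i = Integer.parseInt(new String(buffer, start, length, StandardCharsets.UTF_8));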
as needed.
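To see why the explicit Charset matters, here is a small sketch: the same two bytes decode to one character under UTF-8 and to two under ISO-8859-1, while plain ASCII digits parse identically under all three. CharsetDemo and its parseInt helper are hypothetical names for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CharsetDemo {
    // Hypothetical helper mirroring the one-liners above: decode an explicit byte
    // range with an explicit Charset, then parse the result as an int.
    static int parseInt(byte[] buffer, int start, int length, Charset charset) {
        return Integer.parseInt(new String(buffer, start, length, charset));
    }

    public static void main(String[] args) {
        // 0xC3 0xA9 is the UTF-8 encoding of U+00E9; Latin-1 reads it as two chars.
        byte[] bytes = {(byte) 0xC3, (byte) 0xA9};
        System.out.println(new String(bytes, StandardCharsets.UTF_8).length());      // 1
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1).length()); // 2

        // For ASCII digits the three charsets agree; naming one documents intent.
        byte[] digits = "12345".getBytes(StandardCharsets.US_ASCII);
        System.out.println(parseInt(digits, 1, 3, StandardCharsets.UTF_8)); // 234
    }
}
```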