Reading a file larger than 2 GB into memory in Java

Since ByteArrayInputStream is limited to 2 GB, is there an alternative that lets me hold the entire contents of a 2.3 GB file (and possibly larger ones) in memory behind an InputStream, so that I can read it with StAX2?

Current code:

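    import javax.xml.stream.XMLInputFactory;
    import javax.xml.stream.XMLStreamReader;
    import javax.xml.transform.stax.StAXSource;
    import javax.xml.transform.stream.StreamSource;
    import javax.xml.validation.Schema;
    import javax.xml.validation.SchemaFactory;
    import javax.xml.validation.Validator;

    XMLInputFactory xmlInputFactory = XMLInputFactory.newInstance();
    XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(in); // ByteArrayInputStream????
    try {
        SchemaFactory factory = SchemaFactory.newInstance("http://www.w3.org/2001/XMLSchema");
        Schema schema = factory.newSchema(new StreamSource(schemaInputStream));
        Validator validator = schema.newValidator();
        // Validate the XML while StAX pulls it from 'in'
        validator.validate(new StAXSource(xmlStreamReader));
    } finally {
        xmlStreamReader.close();
    }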

For performance reasons, the in stream must not read from disk. I have plenty of RAM.

+6
4 answers

Use NIO to read the file into one giant ByteBuffer, then write an InputStream class that reads from that ByteBuffer. Several such classes are floating around in open-source projects.
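One caveat: a single ByteBuffer's capacity is an int, so it is capped at Integer.MAX_VALUE bytes, just like a byte array. A 2.3 GB file therefore needs several buffers. Below is a minimal sketch of that idea, assuming heap buffers and a hypothetical class name ByteBuffersInputStream (not a JDK or library class). The file is read once through a FileChannel into chunks of at most chunkSize bytes, and the stream serves them in order.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical helper: an InputStream over several ByteBuffers, so the total can exceed 2 GB. */
public class ByteBuffersInputStream extends InputStream {
    private final List<ByteBuffer> buffers;
    private int current = 0;

    public ByteBuffersInputStream(List<ByteBuffer> buffers) {
        this.buffers = buffers;
    }

    /** Reads the whole file into heap buffers of at most chunkSize bytes each. */
    public static ByteBuffersInputStream fromFile(Path path, int chunkSize) throws IOException {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<ByteBuffer> buffers = new ArrayList<>();
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long remaining = ch.size();
            while (remaining > 0) {
                ByteBuffer buf = ByteBuffer.allocate((int) Math.min(chunkSize, remaining));
                // Fill this chunk completely before moving on to the next one
                while (buf.hasRemaining()) {
                    if (ch.read(buf) < 0) {
                        throw new IOException("Unexpected end of file");
                    }
                }
                buf.flip(); // switch the buffer from writing to reading
                buffers.add(buf);
                remaining -= buf.limit();
            }
        }
        return new ByteBuffersInputStream(buffers);
    }

    // Skips exhausted buffers; returns null once all data has been consumed.
    private ByteBuffer currentBuffer() {
        while (current < buffers.size() && !buffers.get(current).hasRemaining()) {
            current++;
        }
        return current < buffers.size() ? buffers.get(current) : null;
    }

    @Override
    public int read() {
        ByteBuffer buf = currentBuffer();
        return buf == null ? -1 : (buf.get() & 0xFF);
    }

    @Override
    public int read(byte[] b, int off, int len) {
        Objects.checkFromIndexSize(off, len, b.length);
        if (len == 0) {
            return 0;
        }
        ByteBuffer buf = currentBuffer();
        if (buf == null) {
            return -1;
        }
        int n = Math.min(len, buf.remaining());
        buf.get(b, off, n);
        return n;
    }
}
```

Usage would be along the lines of `ByteBuffersInputStream.fromFile(Paths.get("big.xml"), 256 * 1024 * 1024)` (the file name and chunk size are placeholders), passing the result to createXMLStreamReader. The heap (-Xmx) must be larger than the file. ByteBuffer.allocateDirect could be swapped in to keep the data off the Java heap.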

+1

The whole point of StAX2 is that you do not need to read the file into memory. You can simply provide the source and let the StAX XMLStreamReader pull the data as needed.
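To illustrate the pull model, here is a minimal sketch using only the standard javax.xml.stream API (the class name StaxPullDemo and the element-counting task are illustrative assumptions). The reader advances one event at a time, so memory use stays roughly constant regardless of file size.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StaxPullDemo {
    /** Counts elements with the given local name, pulling parse events one at a time. */
    static long countElements(InputStream in, String localName) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(in);
        long count = 0;
        try {
            while (reader.hasNext()) {
                // next() reads only as much input as it needs for the next event
                if (reader.next() == XMLStreamConstants.START_ELEMENT
                        && reader.getLocalName().equals(localName)) {
                    count++;
                }
            }
        } finally {
            reader.close(); // note: this does not close the underlying InputStream
        }
        return count;
    }

    public static void main(String[] args) throws Exception {
        // Small in-memory document for demonstration; for a large file, pass
        // new BufferedInputStream(new FileInputStream(...)) instead.
        byte[] xml = "<root><item/><item/><other/></root>".getBytes(StandardCharsets.UTF_8);
        System.out.println(countElements(new ByteArrayInputStream(xml), "item")); // prints 2
    }
}
```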

What additional restrictions do you have that you haven't mentioned in your question?

If you have a lot of memory and want good performance, just wrap your InputStream in a BufferedInputStream with a large buffer and let it do the buffering for you:

    // 4 MB buffer around the disk-based XML stream (xmlFile is a placeholder for the 2.3 GB file)
    InputStream buffered = new BufferedInputStream(new FileInputStream(xmlFile), 4 * 1024 * 1024);

An alternative to solving this in Java is to create a RAM disk and store the file on it. That sidesteps the underlying Java limitation: a single array can hold at most about Integer.MAX_VALUE elements.

+5

Even with a huge amount of memory, you are unlikely to gain any performance. It is only a read anyway, and the OS disk cache will keep it close to optimal. Just use a disk-based input stream.

0

You can keep the data in memory in compressed form by writing it through a GZIPOutputStream into a ByteArrayOutputStream:

    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    try (OutputStream gz = new GZIPOutputStream(baos)) {
        // ... write the XML data to gz ...
    }
    byte[] bytes = baos.toByteArray(); // < 100 MB?
    ByteArrayInputStream bais = new ByteArrayInputStream(bytes);

And then later wrap the input stream in a GZIPInputStream.

Decompression costs a little speed, but XML usually compresses very well, so this should work nicely.
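A complete round trip for the compressed-in-memory approach might look like the following minimal sketch (the class and method names are hypothetical; the java.util.zip and java.nio.file calls are standard JDK). It assumes the compressed size stays below the roughly 2 GB byte-array limit.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: keeps a large file in memory gzip-compressed. */
public class CompressedInMemory {
    /** Compresses the whole file into a byte[]; the result must fit in one array (< ~2 GB). */
    static byte[] compressFile(Path file) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        // Closing the GZIPOutputStream writes the gzip trailer into baos
        try (OutputStream gz = new GZIPOutputStream(baos, 64 * 1024)) {
            Files.copy(file, gz);
        }
        return baos.toByteArray();
    }

    /** Returns a stream that decompresses on the fly as the consumer (e.g. StAX) reads it. */
    static InputStream openCompressed(byte[] compressed) throws IOException {
        return new GZIPInputStream(new ByteArrayInputStream(compressed), 64 * 1024);
    }
}
```

The stream returned by openCompressed can be passed straight to XMLInputFactory.createXMLStreamReader in place of the ByteArrayInputStream from the question.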

-1

Source: https://habr.com/ru/post/976102/

