Best way to read structured binaries using Java

I need to read a binary file in a legacy format with Java.

In a nutshell, the file has a header consisting of several integers, bytes and fixed-length character arrays, followed by a list of entries that also consist of integers and characters.

In any other language, I would create a struct (C / C++) or a record (Pascal / Delphi) that is a byte-for-byte representation of the header and of an entry. Then I would read sizeof(header) bytes into a header variable and do the same for the records.

Something like this (Delphi):

    type
      THeader = record
        Version: Integer;
        Type: Byte;
        BeginOfData: Integer;
        ID: array[0..15] of Char;
      end;

    ...

    procedure ReadData(S: TStream);
    var
      Header: THeader;
    begin
      S.ReadBuffer(Header, SizeOf(THeader));
      ...
    end;

What is the best way to do something similar with Java? Should I read each individual value on my own or is there any other way to do this "block reading"?

+47
java file
Nov 10 '08 at 14:11
12 answers

As far as I know, Java forces you to read the file field by field from a stream of bytes; there is no block read straight into a structure. If you were deserializing Java objects, that would be a different story.

The other answers use the DataInputStream class with the file, but you can also use a shortcut, the RandomAccessFile class:

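    import java.io.RandomAccessFile;

    RandomAccessFile in = new RandomAccessFile("filename", "r");
    int version = in.readInt();
    byte type = in.readByte();
    int beginOfData = in.readInt();
    byte[] tempId = new byte[16];    // the array must be allocated before reading into it
    in.readFully(tempId);            // reads exactly 16 bytes or throws EOFException
    String id = new String(tempId);  // decodes with the platform default charset
    in.close();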

Note that you can wrap the resulting values in a class if this makes the process easier.
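
A minimal sketch of such a class, assuming the header is stored packed (25 bytes, no padding), with big-endian integers and single-byte characters in the ID. The class name LegacyHeader and the ISO-8859-1 charset are assumptions for illustration, not something the question specifies:

```java
import java.io.DataInput;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical value class for the header described in the question.
final class LegacyHeader {
    final int version;
    final byte type;
    final int beginOfData;
    final String id;

    private LegacyHeader(int version, byte type, int beginOfData, String id) {
        this.version = version;
        this.type = type;
        this.beginOfData = beginOfData;
        this.id = id;
    }

    // Reads the fields in file order from any DataInput
    // (both RandomAccessFile and DataInputStream implement it).
    static LegacyHeader read(DataInput in) throws IOException {
        int version = in.readInt();        // 4 bytes, big-endian
        byte type = in.readByte();         // 1 byte
        int beginOfData = in.readInt();    // 4 bytes, big-endian
        byte[] rawId = new byte[16];
        in.readFully(rawId);               // exactly 16 bytes or EOFException
        // Assumed charset: one byte per character.
        String id = new String(rawId, StandardCharsets.ISO_8859_1);
        return new LegacyHeader(version, type, beginOfData, id);
    }
}
```

With that in place, reading the header becomes LegacyHeader.read(new RandomAccessFile("filename", "r")), and the entries can be handled by a second class written the same way.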

+34
Nov 10 '08 at 14:41

You can use the DataInputStream class as follows:

    import java.io.*;

    DataInputStream in = new DataInputStream(
            new BufferedInputStream(new FileInputStream("filename")));
    int x = in.readInt();
    double y = in.readDouble();
    // etc.

Once you get these values, you can do with them as you please. See the java.io.DataInputStream class in the API for more information.

+19
Nov 10 '08 at 14:31

If you use Preon, then all you need to do is the following:

    public class Header {
        @BoundNumber int version;
        @BoundNumber byte type;
        @BoundNumber int beginOfData;
        @BoundString(size="16") String id;  // ID is array[0..15], i.e. 16 characters
    }

After that, you create the codec using one line:

 Codec<Header> codec = Codecs.create(Header.class); 

And you use Codec as follows:

 Header header = Codecs.decode(codec, file); 
+18
Aug 12 '09 at 14:52

I may have misunderstood you, but it seems to me that you are creating in-memory structures that you hope will be a byte-for-byte accurate representation of what you want to read from disk, then copying everything into memory and manipulating it from there?

If this is true, you are playing a very dangerous game. At least in C, the standard does not specify things like padding or alignment of struct members. Not to mention things like big / little endianness or a parity bit ... So even if your code works, it is very non-portable and risky: you depend on the compiler vendor not changing the layout in future versions.

It is better to write a state machine that validates the structure as it is read (byte by byte) from disk and fills in the in-memory structure only if it is really OK. You may lose a few milliseconds (fewer than it seems, since modern OSes do a lot of disk caching), but you gain platform and compiler independence. In addition, your code will be easy to port to another language.
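
A rough sketch of that idea in Java, decoding each field explicitly and checking it before use. The 25-byte packed layout, little-endian byte order, and the "supported version is 1" rule are all assumptions made for the example; the real format may differ:

```java
// Hypothetical validator for the header described in the question.
final class HeaderCheck {
    static final int HEADER_SIZE = 25;       // assumed: 4 + 1 + 4 + 16 bytes, no padding
    static final int SUPPORTED_VERSION = 1;  // hypothetical value

    // Assembles a 32-bit little-endian integer from four bytes,
    // independent of the platform the code runs on.
    static int readIntLE(byte[] buf, int offset) {
        return (buf[offset] & 0xFF)
                | (buf[offset + 1] & 0xFF) << 8
                | (buf[offset + 2] & 0xFF) << 16
                | (buf[offset + 3] & 0xFF) << 24;
    }

    // Validates the raw header bytes and returns BeginOfData if they look sane.
    static int checkHeader(byte[] buf, long fileLength) {
        if (buf.length < HEADER_SIZE) {
            throw new IllegalArgumentException("truncated header");
        }
        int version = readIntLE(buf, 0);
        if (version != SUPPORTED_VERSION) {
            throw new IllegalArgumentException("unsupported version " + version);
        }
        int beginOfData = readIntLE(buf, 5);  // follows Version (4 bytes) and Type (1 byte)
        if (beginOfData < HEADER_SIZE || beginOfData > fileLength) {
            throw new IllegalArgumentException("BeginOfData out of range: " + beginOfData);
        }
        return beginOfData;
    }
}
```

Because every offset and byte order is spelled out in the code, nothing depends on how a compiler happens to lay out a struct.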

Post Edit: To some extent, I sympathize with you. Back in the good old days of DOS / Win3.11, I once wrote a C program to read BMP files, and it used exactly the same technique. Everything was fine until I tried to compile it for Windows - oops!! int was now 32 bits long, not 16! When I tried to compile it on Linux, I discovered that gcc has very different rules for allocating bit fields than Microsoft C (6.0!). I had to resort to macro tricks to make it portable ...

+10
Nov 10 '08 at 15:58

I have used both Javolution and javastruct; both handle the conversion between bytes and objects.

Javolution provides classes representing the C types. All you have to do is write a class that describes the C structure. For example, this struct from a C header file

    struct Date {
        unsigned short year;
        unsigned char month;
        unsigned char day;
    };

should translate to:

    import javolution.io.Struct;

    public static class Date extends Struct {
        public final Unsigned16 year = new Unsigned16();
        public final Unsigned8 month = new Unsigned8();
        public final Unsigned8 day = new Unsigned8();
    }

Then call setByteBuffer to initialize the object:

 Date date = new Date(); date.setByteBuffer(ByteBuffer.wrap(bytes), 0); 

javastruct uses annotations to define the fields of a C structure.

    @StructClass
    public class Foo {
        @StructField(order = 0)
        public byte b;

        @StructField(order = 1)
        public int i;
    }

To initialize an object:

 Foo f2 = new Foo(); JavaStruct.unpack(f2, b); 
+7
Dec 02 2018-11-11T00:

I think FileInputStream allows reading raw bytes. So open the file with FileInputStream and read sizeof(header) bytes. I assume that the header has a fixed format and size. I do not see this stated in the original post, but I suppose it is so, because it will be much more complicated if the header has optional fields and a variable size.

Once you have the bytes, you can have a header class to which you assign the contents of the buffer you have already read. Then parse the records in the same way.
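
A small sketch of the "read exactly N bytes" step. InputStream.read may return fewer bytes than requested, so the helper loops until the block is full. The class and method names are made up for illustration:

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads one fixed-size block from a stream.
final class BlockReader {
    static byte[] readBlock(InputStream in, int size) throws IOException {
        byte[] block = new byte[size];
        int filled = 0;
        while (filled < size) {
            // read() may deliver only part of what was asked for
            int n = in.read(block, filled, size - filled);
            if (n < 0) {
                throw new EOFException("expected " + size + " bytes, got " + filled);
            }
            filled += n;
        }
        return block;
    }
}
```

For the header in the question this would be called as BlockReader.readBlock(new FileInputStream("filename"), 25), assuming a packed 25-byte header, and the resulting array handed to whatever class parses the fields.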

+4
Nov 10 '08 at 14:18

Here is a link on reading bytes using ByteBuffer (Java NIO):

http://exampledepot.com/egs/java.nio/ReadChannel.html

+4
Nov 10 '08 at 16:10

As others have said, DataInputStream and the Buffer classes are probably the low-level APIs you would use to process binary data in Java.

However, you probably want something like Construct (the wiki page also has some good examples: http://en.wikipedia.org/wiki/Construct_(python_library)), but for Java.

I don't know of any Java equivalent, but taking this approach (declaratively specifying the structure in code) is likely to be the right way. With a suitable fluent interface in Java, it would probably look very much like a DSL.

EDIT: A bit of googling turns up this:

http://javolution.org/api/javolution/io/Struct.html

That could be the kind of thing you're looking for. I have no idea whether it works or is any good, but it seems like a reasonable place to start.

+3
Nov 10 '08 at 16:15

I would create an object that wraps a ByteBuffer view of the data and provides getters that read directly from the buffer. This way you avoid copying data from the buffer into primitive fields. Alternatively, you can use MappedByteBuffer to get the byte buffer. If your binary data is complex, you can model it using classes and give each class a sliced version of your buffer.

    import java.nio.ByteBuffer;

    class SomeHeader {
        // offset of the version field; 0 is a placeholder for your format
        private static final int POSITION_OF_VERSION_IN_BUFFER = 0;

        private final ByteBuffer buf;

        SomeHeader(ByteBuffer fileBuffer) {
            // you may need to set limits accordingly before
            // fileBuffer.limit(...)
            this.buf = fileBuffer.slice();
            // you may need to skip the sliced region
            // fileBuffer.position(endPos)
        }

        public short getVersion() {
            return buf.getShort(POSITION_OF_VERSION_IN_BUFFER);
        }
    }

Methods for reading unsigned values from byte buffers are also useful.
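
For instance, a few such helpers might look like this. Java has no unsigned primitive types, so each value is widened and masked. The class and method names are illustrative only:

```java
import java.nio.ByteBuffer;

// Hypothetical helpers for reading unsigned values at an absolute index.
final class Unsigned {
    // 0..255
    static int u8(ByteBuffer buf, int index) {
        return buf.get(index) & 0xFF;
    }

    // 0..65535, using the buffer's current byte order
    static int u16(ByteBuffer buf, int index) {
        return buf.getShort(index) & 0xFFFF;
    }

    // 0..4294967295, returned as long because it does not fit in an int
    static long u32(ByteBuffer buf, int index) {
        return buf.getInt(index) & 0xFFFFFFFFL;
    }
}
```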

HTH

+3
Mar 04 '10 at 11:52

I wrote up a technique for doing similar things in Java, along the lines of the old C-style idiom of reading bit fields. Please note that this is just a start, but it can be expanded.

here

+2
May 05 '09 at 1:06 a.m.

In the past, I used DataInputStream to read data of arbitrary types in a specified order. It does not let you easily deal with big-endian versus little-endian issues, since it always reads multi-byte values as big-endian.

Since version 1.4, the java.nio.Buffer family may be the way to go, although your code may end up somewhat more complex. These classes do have support for handling endianness.
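
A short sketch of that support: ByteBuffer.order switches the byte order used by getInt, getShort and friends. The helper class name is made up for the example:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper showing the effect of the buffer's byte order.
final class EndianDemo {
    // Decodes the first four bytes as an int in the requested byte order.
    static int readInt(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).getInt();
    }
}
```

The same four bytes 01 00 00 00 decode to 1 with ByteOrder.LITTLE_ENDIAN and to 16777216 with ByteOrder.BIG_ENDIAN (the default for a new buffer), so it is worth checking which order the legacy format actually uses.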

+1
Nov 10 '08 at 14:32

Some time ago I found this article about using reflection and parsing to read binary data. In it, the author uses reflection to read binary Java .class files. But if you are reading the data into a class, this may help.

+1
Nov 10 '08 at 15:53