Structure Check for Binary Files

I am exploring ways to formally specify the formats of various binary streams and to use a tool that checks whether a stream conforms to its specification. Something like XSD plus any XML validation tool. Or, failing that, an extremely complicated binary regular expression run through grep (preferably not: it's really hard to read).

Does anyone know of a spec / tool that would be useful?

[Rationale: Every day we receive a lot of binaries generated by third parties, and they often use bad tools that produce invalid files. We want to give them a validator they can run themselves, and we don't want to write a separate tool for each format.]
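To make the request concrete, here is a minimal sketch of the "one generic tool, many formats" idea: each format is described as data (a list of fields with optional checks), and a single interpreter validates any stream against any description. The spec layout, the `check` function and `PNG_HEADER_SPEC` are hypothetical illustrations, not an existing tool; the spec only loosely models the start of a PNG file (8-byte signature, then the IHDR chunk header).

```python
import struct
from typing import Any, Callable, List, Optional, Tuple

# A field: (name, struct format string, optional predicate on the decoded value).
Field = Tuple[str, str, Optional[Callable[[Any], bool]]]

# Hypothetical declarative spec for the start of a PNG file.
PNG_HEADER_SPEC: List[Field] = [
    ("signature", ">8s", lambda v: v == b"\x89PNG\r\n\x1a\n"),
    ("ihdr_length", ">I", lambda v: v == 13),
    ("ihdr_type", ">4s", lambda v: v == b"IHDR"),
    ("width", ">I", lambda v: v > 0),
    ("height", ">I", lambda v: v > 0),
]


def check(spec: List[Field], data: bytes) -> List[str]:
    """Interpret `spec` against `data`; return error messages (empty list = valid)."""
    errors: List[str] = []
    offset = 0
    for name, fmt, predicate in spec:
        size = struct.calcsize(fmt)
        if offset + size > len(data):
            errors.append(f"{name}: truncated at offset {offset}")
            break  # nothing after a truncation can be checked
        (value,) = struct.unpack_from(fmt, data, offset)
        if predicate is not None and not predicate(value):
            errors.append(f"{name}: unexpected value {value!r} at offset {offset}")
        offset += size
    return errors


# Usage: a well-formed header passes, a GIF signature is rejected.
good = b"\x89PNG\r\n\x1a\n" + struct.pack(">I4sII", 13, b"IHDR", 640, 480)
print(check(PNG_HEADER_SPEC, good))       # → []
print(check(PNG_HEADER_SPEC, b"GIF89a"))  # → ['signature: truncated at offset 0']
```

Adding a format means adding another spec list, not another tool; the limits of this flat, fixed-layout approach are exactly what the answers below discuss.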

+4
6 answers

If you think the documentation for Java .class files is a good example of a specification, take a look at Preon. Preon captures that format completely and generates documentation like this.

In fact, there are a couple of other initiatives to capture the "syntax" of binary-encoded files. ASN.1 is useful, but it does not get you very far if you intend to capture, say, Java class files. The same goes for BSDL, Flavor, BFlavor, and a few other initiatives. The problem is that there are a million ways to encode binary data, and many binary compression methods, so I think there will never be anything that captures it all completely unless the language itself is extensible.
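As an illustration of why fixed-layout schema languages struggle with class files: the size of each constant pool entry depends on its tag byte, and CONSTANT_Long and CONSTANT_Double entries occupy two pool slots. The sketch below walks a constant pool using the tag values from the JVM specification; the function name `walk_constant_pool` is hypothetical and this is not how Preon or any other tool implements it.

```python
import struct
from typing import Dict, List, Tuple

# Payload size in bytes (after the 1-byte tag) for fixed-size constant pool entries.
FIXED_SIZES: Dict[int, int] = {
    3: 4, 4: 4,          # Integer, Float
    5: 8, 6: 8,          # Long, Double (each also takes two pool slots)
    7: 2, 8: 2,          # Class, String
    9: 4, 10: 4, 11: 4,  # Fieldref, Methodref, InterfaceMethodref
    12: 4,               # NameAndType
    15: 3, 16: 2,        # MethodHandle, MethodType
    17: 4, 18: 4,        # Dynamic, InvokeDynamic
    19: 2, 20: 2,        # Module, Package
}
UTF8_TAG = 1  # CONSTANT_Utf8: u2 length followed by that many bytes


def walk_constant_pool(data: bytes, offset: int, count: int) -> Tuple[List[int], int]:
    """Walk entries 1..count-1 starting at `offset`.

    Returns the entry tags in order and the offset just past the pool.
    Raises ValueError on an unknown tag or truncated data.
    """
    tags: List[int] = []
    index = 1  # constant pool indices start at 1
    while index < count:
        if offset >= len(data):
            raise ValueError(f"truncated before entry {index}")
        tag = data[offset]
        offset += 1
        if tag == UTF8_TAG:
            if offset + 2 > len(data):
                raise ValueError(f"truncated Utf8 length at entry {index}")
            (length,) = struct.unpack_from(">H", data, offset)
            size = 2 + length
        elif tag in FIXED_SIZES:
            size = FIXED_SIZES[tag]
        else:
            raise ValueError(f"unknown tag {tag} at entry {index}")
        if offset + size > len(data):
            raise ValueError(f"truncated entry {index}")
        offset += size
        tags.append(tag)
        # The layout rule a plain schema must special-case: Long/Double use two slots.
        index += 2 if tag in (5, 6) else 1
    return tags, offset
```

For example, a pool with `constant_pool_count = 4` can hold just two entries, a Utf8 string (slot 1) and a Long (slots 2 and 3).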

Google Protocol Buffers has basically the same problem. It defines its own encoding, something like CORBA CDR, which is fine if you don't need anything more advanced. But Protocol Buffers will not let you capture the Java class file format.
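To see what "defines its own encoding" means in practice: the Protocol Buffers wire format stores integers as base-128 varints (7 payload bits per byte, least significant group first, high bit set on every byte except the last). A `.proto` schema describes messages carried in that encoding; it cannot be pointed at a pre-existing layout such as a class file, whose u2/u4 fields are fixed-width big-endian. The hand-written encoder/decoder below is only a sketch, not the protobuf library's API.

```python
from typing import Tuple


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        byte = value & 0x7F  # low 7 bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode a varint at `offset`; return (value, offset just after it)."""
    result = 0
    shift = 0
    while True:
        if offset >= len(data):
            raise ValueError("truncated varint")
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


print(encode_varint(300).hex())  # → ac02
```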

+3

Try Preon:

  • annotations
  • conditional parts
  • an expression language

Each annotated class is a Codec description from which both an Encoder and a Decoder can be generated.
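Preon itself is a Java library driven by annotations; the following is not its API but a small Python sketch of the same idea: one declarative description, with a length expression and a conditional field, from which both a decoder and an encoder are derived. The record layout, the `Field` type and the `RECORD`, `decode` and `encode` names are all hypothetical.

```python
import struct
from typing import Any, Callable, Dict, List, NamedTuple, Optional

Context = Dict[str, Any]


class Field(NamedTuple):
    name: str
    fmt: Optional[str] = None                         # struct format for fixed-size fields
    length: Optional[Callable[[Context], int]] = None  # expression giving a byte length
    when: Optional[Callable[[Context], bool]] = None  # condition for optional parts


# Hypothetical record: version, length-prefixed name, flags byte only from version 2.
RECORD: List[Field] = [
    Field("version", fmt=">B"),
    Field("name_len", fmt=">H"),
    Field("name", length=lambda c: c["name_len"]),
    Field("flags", fmt=">B", when=lambda c: c["version"] >= 2),
]


def decode(spec: List[Field], data: bytes) -> Context:
    """Decode `data` using `spec` (struct.error is raised if a fixed field is cut off)."""
    ctx: Context = {}
    offset = 0
    for f in spec:
        if f.when is not None and not f.when(ctx):
            continue  # conditional part absent in this record
        if f.fmt is not None:
            (ctx[f.name],) = struct.unpack_from(f.fmt, data, offset)
            offset += struct.calcsize(f.fmt)
        else:
            n = f.length(ctx)  # evaluated against fields decoded so far
            if offset + n > len(data):
                raise ValueError(f"{f.name}: truncated")
            ctx[f.name] = data[offset:offset + n]
            offset += n
    if offset != len(data):
        raise ValueError(f"{len(data) - offset} trailing bytes")
    return ctx


def encode(spec: List[Field], values: Context) -> bytes:
    """Encode `values` with the same spec used for decoding."""
    out = bytearray()
    for f in spec:
        if f.when is not None and not f.when(values):
            continue
        if f.fmt is not None:
            out += struct.pack(f.fmt, values[f.name])
        else:
            raw = values[f.name]
            if len(raw) != f.length(values):
                raise ValueError(f"{f.name}: length mismatch")
            out += raw
    return bytes(out)
```

Because both directions read the same `RECORD`, a round trip `decode(RECORD, encode(RECORD, v)) == v` holds for well-formed values, which is the main appeal of the single-description design.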

+3

This is an interesting question, but I would be very surprised if such a specification language exists. The reason is that the possible metastructures of binary data are virtually endless. Compare this to XML, where the metastructure (tags contain other tags, an element cannot have two attributes with the same name, etc.) is strictly defined. And even with such a structure, writing XML schemas is hard! The only way to cover the endless possibilities of binary file formats is with something that itself allows unlimited variability: a Turing-complete programming language.

This, of course, does not mean that a useful specification language and a processor for it cannot be built for your specific problem domain. I just think you will have a hard time finding a ready-made one. I hope the answers here prove me wrong!
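A common middle ground in the spirit of this answer is a declarative layer plus escape hatches written in a general-purpose language. For example, each PNG chunk carries a CRC-32 computed over its type and data bytes: a cross-field constraint that a pure layout description does not express, but a few lines of code do. The function names `iter_chunks` and `make_chunk` are hypothetical; `zlib.crc32` is the standard-library CRC-32.

```python
import struct
import zlib
from typing import Iterator, Tuple


def make_chunk(ctype: bytes, payload: bytes) -> bytes:
    """Build a PNG-style chunk: u4 length, 4-byte type, data, u4 CRC of type+data."""
    return (struct.pack(">I", len(payload)) + ctype + payload
            + struct.pack(">I", zlib.crc32(ctype + payload)))


def iter_chunks(data: bytes, offset: int = 8) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (type, payload) for each chunk after the 8-byte signature, verifying CRCs."""
    while offset < len(data):
        if offset + 8 > len(data):
            raise ValueError(f"truncated chunk header at offset {offset}")
        length, ctype = struct.unpack_from(">I4s", data, offset)
        start = offset + 8
        end = start + length
        if end + 4 > len(data):
            raise ValueError(f"chunk {ctype!r} truncated")
        payload = data[start:end]
        (stored_crc,) = struct.unpack_from(">I", data, end)
        # The CRC covers the type and data, not the length field.
        if zlib.crc32(ctype + payload) != stored_crc:
            raise ValueError(f"chunk {ctype!r}: CRC mismatch")
        yield ctype, payload
        offset = end + 4
```

Such a check still benefits from a declarative description of the chunk layout; the code only handles the part the description cannot.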

+1

Also check out Google Protocol Buffers:

  • Java / Python / C++ APIs
  • a good DSL

+1

I believe a good example is the Java .class file specification: http://java.sun.com/docs/books/jvms/second_edition/html/ClassFile.doc.html
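That specification is precise down to the byte: a class file starts with a u4 magic number 0xCAFEBABE, then u2 minor_version, u2 major_version and u2 constant_pool_count, all big-endian. A minimal sketch of checking just that fixed-size header follows; the names `ClassFileHeader` and `read_class_header` are hypothetical, and the lower bound on `major_version` is an assumed plausibility check rather than a rule quoted from the spec.

```python
import struct
from typing import NamedTuple


class ClassFileHeader(NamedTuple):
    minor_version: int
    major_version: int
    constant_pool_count: int


def read_class_header(data: bytes) -> ClassFileHeader:
    """Check and decode the first 10 bytes of a Java .class file."""
    if len(data) < 10:
        raise ValueError("too short for a class file header")
    magic, minor, major, cp_count = struct.unpack_from(">IHHH", data, 0)
    if magic != 0xCAFEBABE:
        raise ValueError(f"bad magic 0x{magic:08X}")
    if major < 45:  # assumed plausibility check: 45 is the oldest class file version
        raise ValueError(f"implausible major_version {major}")
    if cp_count < 1:  # the count is the number of entries plus one
        raise ValueError("constant_pool_count must be at least 1")
    return ClassFileHeader(minor, major, cp_count)


# Usage: a header for major version 52 (Java 8) with 20 as the pool count.
print(read_class_header(struct.pack(">IHHH", 0xCAFEBABE, 0, 52, 20)))
```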

0

Abstract Syntax Notation One: ASN.1. See also the NCBI ASN.1 page: http://www.ncbi.nlm.nih.gov/Sitemap/Summary/asn1.html
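ASN.1 schemas are normally paired with an encoding rule set such as BER or DER, in which everything is tag-length-value (TLV). That uniformity allows a generic structural check independent of the schema: walk the TLVs and confirm that every length fits exactly inside its parent. The sketch below is a simplification, not a full BER parser: it handles only single-byte tags (tag number below 31) and definite lengths, and the names `read_tlv` and `check_tlv` are hypothetical.

```python
from typing import List, Optional, Tuple


def read_tlv(data: bytes, offset: int, limit: int) -> Tuple[int, int, int]:
    """Read one TLV header at `offset`; return (tag, content_start, content_end)."""
    if offset + 2 > limit:
        raise ValueError(f"truncated TLV at offset {offset}")
    tag = data[offset]
    if tag & 0x1F == 0x1F:
        raise ValueError("multi-byte tags are not supported in this sketch")
    first = data[offset + 1]
    pos = offset + 2
    if first < 0x80:  # short form: the byte is the length
        length = first
    elif first == 0x80:
        raise ValueError("indefinite length is not supported in this sketch")
    else:  # long form: low 7 bits give the number of length bytes
        n = first & 0x7F
        if pos + n > limit:
            raise ValueError(f"truncated length at offset {offset}")
        length = int.from_bytes(data[pos:pos + n], "big")
        pos += n
    end = pos + length
    if end > limit:
        raise ValueError(f"content overruns its container at offset {offset}")
    return tag, pos, end


def check_tlv(data: bytes, start: int = 0, end: Optional[int] = None) -> List[int]:
    """Check that TLVs exactly tile data[start:end]; return tags in pre-order."""
    limit = len(data) if end is None else end
    tags: List[int] = []
    offset = start
    while offset < limit:
        tag, cstart, cend = read_tlv(data, offset, limit)
        tags.append(tag)
        if tag & 0x20:  # constructed: the contents are themselves TLVs
            tags.extend(check_tlv(data, cstart, cend))
        offset = cend
    return tags


# Usage: SEQUENCE { INTEGER 5, BOOLEAN TRUE } in DER.
print(check_tlv(bytes([0x30, 0x06, 0x02, 0x01, 0x05, 0x01, 0x01, 0xFF])))  # → [48, 2, 1]
```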

0

Source: https://habr.com/ru/post/1286124/

