Testing a C module that reads a binary format

I am writing a C library that reads a binary file format. I do not control the binary format; it is produced by its own data-collection program and is relatively complex. Since this is one of my first forays into C programming and parsing binary files, I am having some trouble figuring out how to structure the code for testing and portability.

For testing purposes, I thought the easiest approach was to build the library to read from an arbitrary stream of bytes. So I ended up introducing a stream data type that encapsulates the underlying stream type (memstream, filestream, etc.). The interface has functions such as stream_read_uint8, so client code does not need to know anything about where the bytes come from. My tests run against memstream, and filestream is just a wrapper around FILE*, fread, etc.
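For concreteness, here is a minimal sketch of the design described above. It assumes a struct holding a read callback plus backend state. Only stream_read_uint8 comes from the question; Stream, MemState, memstream_open and filestream_open are hypothetical names, not the asker's actual code:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical "base class": a read callback plus backend-specific state. */
typedef struct Stream {
    /* Copies up to n bytes into buf; returns the number actually copied. */
    size_t (*read)(struct Stream *s, void *buf, size_t n);
    void *state;
} Stream;

/* Memory backend state: a caller-owned byte buffer and a read position. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
} MemState;

static size_t mem_read(Stream *s, void *buf, size_t n)
{
    MemState *m = s->state;
    size_t avail = m->len - m->pos;

    if (n > avail)
        n = avail;                       /* short read at end of buffer */
    if (n > 0)
        memcpy(buf, m->data + m->pos, n);
    m->pos += n;
    return n;
}

/* File backend: a thin wrapper around fread(). */
static size_t file_read(Stream *s, void *buf, size_t n)
{
    return fread(buf, 1, n, (FILE *)s->state);
}

/* ms and data must outlive s. */
void memstream_open(Stream *s, MemState *ms, const void *data, size_t len)
{
    ms->data = data;
    ms->len = len;
    ms->pos = 0;
    s->read = mem_read;
    s->state = ms;
}

void filestream_open(Stream *s, FILE *fp)
{
    s->read = file_read;
    s->state = fp;
}

/* Client-facing reader: returns 0 on success, -1 at end of stream. */
int stream_read_uint8(Stream *s, uint8_t *out)
{
    unsigned char b;

    if (s->read(s, &b, 1) != 1)
        return -1;
    *out = b;
    return 0;
}
```

Parsing code only ever receives a Stream *. Tests can build one with memstream_open() over a literal byte array, while production code calls filestream_open() on a file opened with fopen().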

From an OOP point of view, I think this is a reasonable design. However, I get the feeling that I am imposing the wrong paradigm on the language and ending up with overly abstract, overly complex code as a result.

So my question is: is there a simpler, more idiomatic way to read a binary format in plain C while keeping automated tests?

Note: I understand that FILE* is essentially an abstract stream interface. But the memory-stream implementation (fmemopen) is non-standard, and I want to stick to standard C for portability.

1 answer

What you describe is low-level I/O functionality. Since fmemopen() is not 100% portable (outside Linux, I suspect support is patchy), you need to write something portable of your own, close enough to the native interface that you use your surrogate functions only when necessary and the native functions whenever possible. Of course, you should also be able to force your surrogate functions to be used even in your native environment, so that you can test that code.
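A minimal sketch of such a surrogate follows. It assumes a hypothetical MEMFILE type whose mem_fread() mirrors the signature and short-count return of fread(). One simplification differs from real fread(): a trailing partial item is left unconsumed rather than partially read. MEMFILE, mem_init, mem_fread, mem_feof and mem_getc are all illustrative names:

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical surrogate for a memory-backed FILE. */
typedef struct {
    const unsigned char *buf;
    size_t len;
    size_t pos;
    int eof;            /* sticky, set once a read comes up short */
} MEMFILE;

void mem_init(MEMFILE *mf, const void *buf, size_t len)
{
    mf->buf = buf;
    mf->len = len;
    mf->pos = 0;
    mf->eof = 0;
}

/* Same shape as fread(): returns the number of complete items read.
 * Unlike fread(), a trailing partial item is left unconsumed. */
size_t mem_fread(void *ptr, size_t size, size_t nmemb, MEMFILE *mf)
{
    size_t avail, items;

    if (size == 0 || nmemb == 0)
        return 0;
    avail = (mf->len - mf->pos) / size;
    items = nmemb < avail ? nmemb : avail;
    if (items < nmemb)
        mf->eof = 1;
    if (items > 0)
        memcpy(ptr, mf->buf + mf->pos, items * size);
    mf->pos += items * size;
    return items;
}

int mem_feof(const MEMFILE *mf)
{
    return mf->eof;
}

/* fgetc() look-alike built on mem_fread(). */
int mem_getc(MEMFILE *mf)
{
    unsigned char c;

    return mem_fread(&c, 1, 1, mf) == 1 ? c : EOF;
}
```

Where a native fmemopen() exists, a build-time macro could map the wrapper names onto fread() and fgetc(). A test build would then define a second (hypothetical) macro that forces the surrogate everywhere, so both paths get exercised.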

This code can be tested with known data to make sure that you pick up every byte in the input stream and can return it faithfully. If the raw data uses a specific byte order, you can check that your functions for "larger" types (hypothetically, stream_read_uint2(), stream_read_uint4(), stream_read_string(), etc.) all behave appropriately. For this phase, you don't really need actual data; you can manufacture data to suit yourself and your testing.
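As an illustration of testing those readers with manufactured bytes, here is a sketch. It assumes the format is big-endian and that a string is a 2-byte length followed by that many bytes; both assumptions are invented. To stay short, it reads from a hypothetical Cursor over a byte buffer instead of a full stream type. The function names follow the answer's hypothetical ones:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical byte cursor standing in for the stream type. */
typedef struct {
    const unsigned char *p;
    size_t left;
} Cursor;

/* 2-byte unsigned, big-endian (an assumption about the format). */
int stream_read_uint2(Cursor *c, uint16_t *out)
{
    if (c->left < 2)
        return -1;
    *out = (uint16_t)((c->p[0] << 8) | c->p[1]);
    c->p += 2;
    c->left -= 2;
    return 0;
}

/* 4-byte unsigned, big-endian. */
int stream_read_uint4(Cursor *c, uint32_t *out)
{
    if (c->left < 4)
        return -1;
    *out = ((uint32_t)c->p[0] << 24) | ((uint32_t)c->p[1] << 16) |
           ((uint32_t)c->p[2] << 8) | (uint32_t)c->p[3];
    c->p += 4;
    c->left -= 4;
    return 0;
}

/* Assumed layout: 2-byte length, then the bytes; NUL-terminates into buf.
 * On failure the cursor is left where it was. */
int stream_read_string(Cursor *c, char *buf, size_t bufsize)
{
    Cursor save = *c;
    uint16_t n;

    if (stream_read_uint2(c, &n) != 0 || c->left < n || (size_t)n + 1 > bufsize) {
        *c = save;
        return -1;
    }
    memcpy(buf, c->p, n);
    buf[n] = '\0';
    c->p += n;
    c->left -= n;
    return 0;
}
```

Test input can then be a hand-written array such as { 0x12, 0x34, 0xDE, 0xAD, 0xBE, 0xEF, 0x00, 0x02, 'h', 'i' }, where every expected value is obvious by inspection.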

Once you have that in place, you will also need code that reads the data using the larger types, and you need to make sure that this higher-level code really does interpret the binary data accurately and invoke the appropriate actions. For this, you finally need examples of what the format provides; before this phase, you could probably get away with data you manufactured yourself. But once you are reading actual files, you need samples of them to work on, or you will have to manufacture them from your understanding of the format and test as best you can. How easy that is depends on how well documented the binary format is.
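A sketch of that higher-level phase, still driven by manufactured data, is below. The header layout is entirely invented for illustration: magic "XYZF", a 2-byte big-endian version limited to 1..3, and a 4-byte big-endian record count. The real layout has to come from the format's documentation or from sample files:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical file header; the real layout comes from the format's docs. */
typedef struct {
    char magic[5];          /* 4 magic bytes + NUL */
    uint16_t version;
    uint32_t nrecords;
} Header;

enum { HDR_OK = 0, HDR_SHORT = -1, HDR_BADMAGIC = -2, HDR_BADVERSION = -3 };

static uint16_t be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

static uint32_t be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Interprets the first 10 bytes of buf; rejects anything that doesn't fit. */
int parse_header(const unsigned char *buf, size_t len, Header *h)
{
    if (len < 10)
        return HDR_SHORT;
    if (memcmp(buf, "XYZF", 4) != 0)
        return HDR_BADMAGIC;
    memcpy(h->magic, buf, 4);
    h->magic[4] = '\0';
    h->version = be16(buf + 4);
    if (h->version < 1 || h->version > 3)   /* assumed supported range */
        return HDR_BADVERSION;
    h->nrecords = be32(buf + 6);
    return HDR_OK;
}
```

Tests at this level should cover the rejection paths (bad magic, truncated input) as well as the happy path, because real files will eventually hit them.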


One of the key testing and debugging tools is a family of canonical dump functions that present the data to you. The scheme I use is:

 extern void dump_XyzType(FILE *fp, const char *tag, const XyzType *data); 

The stream (fp) is self-evident; it is usually stderr, but making it an argument lets you send the data to any open file. The tag is included in the printed information; it should be unique enough to identify the call site. The final argument is a pointer to the data, which the function analyzes and prints. You should take the opportunity to apply every validity check you can think of, to help with troubleshooting.
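Here is a minimal sketch of that scheme for a hypothetical XyzType, assumed here to be a simple calendar date. The validity checks are split into a hypothetical check_XyzType() that reports each problem and returns a count, so tests can assert on it while the dump function stays void as in the declaration above:

```c
#include <stdio.h>

/* Hypothetical stand-in for XyzType: a simple calendar date. */
typedef struct {
    unsigned year, month, day;
} XyzType;

/* Reports each problem on fp; returns the number of problems found. */
int check_XyzType(FILE *fp, const XyzType *data)
{
    int problems = 0;

    if (data->month < 1 || data->month > 12) {
        fprintf(fp, "  ** invalid month %u\n", data->month);
        problems++;
    }
    if (data->day < 1 || data->day > 31) {
        fprintf(fp, "  ** invalid day %u\n", data->day);
        problems++;
    }
    return problems;
}

/* Canonical dump: fp is the destination, tag identifies the call site. */
void dump_XyzType(FILE *fp, const char *tag, const XyzType *data)
{
    fprintf(fp, "XyzType: %s -- address %p\n", tag, (const void *)data);
    if (data == NULL) {
        fprintf(fp, "  (null pointer)\n");
        return;
    }
    fprintf(fp, "  year = %u, month = %u, day = %u\n",
            data->year, data->month, data->day);
    check_XyzType(fp, data);
}
```

A call such as dump_XyzType(stderr, "after header parse", &value) prints the raw fields and flags anything implausible right next to them.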

You could extend the interface with the extra parameters const char *file, int line, const char *func and arrange to pass __FILE__, __LINE__ and __func__ in the calls. I've never needed this, but if I did, I would use a macro:

 #define DUMP_XyzType(fp, tag, data) \
     dump_XyzType(fp, tag, data, __FILE__, __LINE__, __func__)
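Put together, the extended version might look like the sketch below. The XyzType contents are hypothetical, and stripping the directory from __FILE__ with strrchr() is an optional extra that assumes '/' as the path separator:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical data type for the example. */
typedef struct {
    int value;
} XyzType;

/* Extended dump: also reports where it was called from. */
void dump_XyzType(FILE *fp, const char *tag, const XyzType *data,
                  const char *file, int line, const char *func)
{
    const char *base = strrchr(file, '/');   /* strip any directory part */

    base = (base != NULL) ? base + 1 : file;
    fprintf(fp, "XyzType: %s -- %s:%d in %s()\n", tag, base, line, func);
    if (data != NULL)
        fprintf(fp, "  value = %d\n", data->value);
    else
        fprintf(fp, "  (null pointer)\n");
}

/* Callers write DUMP_XyzType(fp, tag, data); the macro adds the location. */
#define DUMP_XyzType(fp, tag, data) \
    dump_XyzType(fp, tag, data, __FILE__, __LINE__, __func__)
```

Because the macro expands at each call site, __LINE__ and __func__ refer to the caller rather than to dump_XyzType() itself.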

As an example, I am dealing with a DATETIME type, so I have a function

 extern void dump_datetime(FILE *fp, const char *tag, const ifx_dtime_t *dp); 

One of the tests I was working on this week could be persuaded to dump a datetime value, and it gave:

 DATETIME: Input value -- address 0x7FFF2F27CAF0
 Qualifier: 3594 -- type DATETIME YEAR TO SECOND
 DECIMAL: +20120913212219 -- address 0x7FFF2F27CAF2
 E: +7, S = 1 (+), N = 7, M = 20 12 09 13 21 22 19

You may or may not be able to spot the value 2012-09-13 21:22:19 in there. Interestingly, this function itself calls another function in the family, dump_decimal(), to print the decimal value. One of these years, I'll update the qualifier printing to include the hex version, which is much easier to read: 3594 is 0x0E0A, which those in the know read as 14 digits (E), starting at YEAR (the second 0) and ending at SECOND (A). None of that is obvious from the decimal version. Of course, the information is also in the string DATETIME YEAR TO SECOND. (The decimal format is somewhat inscrutable to an outsider, but quite clear to an insider: there is an exponent (E), a sign (S), the number of (centesimal) digits (N = 7), and the actual digits (M = ...). Yes, the name "decimal" is strictly a misnomer, because it uses a base-100, or centesimal, representation.)
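The hex reading of the qualifier is just bit-field arithmetic. The sketch below assumes the layout described above: digit count in bits 8 to 15, start unit in bits 4 to 7, end unit in bits 0 to 3, with YEAR = 0 and SECOND = 10. It also shows how the base-100 digits combine into the integer. The helper names are hypothetical, not the actual library API:

```c
#include <stdio.h>

/* Hypothetical decoded form of a DATETIME qualifier. */
typedef struct {
    unsigned ndigits;   /* bits 8..15: total number of digits */
    unsigned start;     /* bits 4..7: first unit (YEAR = 0) */
    unsigned end;       /* bits 0..3: last unit (SECOND = 10) */
} Qualifier;

Qualifier decode_qualifier(unsigned q)
{
    Qualifier r;

    r.ndigits = (q >> 8) & 0xFF;
    r.start = (q >> 4) & 0x0F;
    r.end = q & 0x0F;
    return r;
}

/* Prints the qualifier in both decimal and the easier-to-read hex form. */
void print_qualifier(FILE *fp, unsigned q)
{
    Qualifier r = decode_qualifier(q);

    fprintf(fp, "Qualifier: %u (0x%04X): %u digits, start %u, end %u\n",
            q, q, r.ndigits, r.start, r.end);
}

/* Combines base-100 digits (each 0..99) into an integer. This ignores the
 * sign and exponent, so it is only right when the exponent equals the
 * digit count, as in the dump above (E = +7, N = 7). */
unsigned long long centesimal_value(const unsigned char *m, int n)
{
    unsigned long long v = 0;
    int i;

    for (i = 0; i < n; i++)
        v = v * 100 + m[i];
    return v;
}
```

For example, print_qualifier(stdout, 3594) prints "Qualifier: 3594 (0x0E0A): 14 digits, start 0, end 10". Passing the digits 20 12 09 13 21 22 19 to centesimal_value() yields 20120913212219.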

The test doesn't produce that level of detail by default; I had to run it with a fairly high debug level (set with a command-line option). I'd regard that as another valuable feature.
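Such a debug level can be as simple as a file-scope variable set from the command line. The sketch below assumes a hypothetical "-d N" option parsed by hand, since getopt() is POSIX rather than standard C. The names debug_level, parse_debug_option and debug_enabled are illustrative:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical global debug level; 0 means quiet. */
static int debug_level = 0;

/* Consumes leading "-d N" pairs (an assumed option syntax) and sets
 * debug_level. Returns the index of the first remaining argument. */
int parse_debug_option(int argc, char **argv)
{
    int i = 1;

    while (i + 1 < argc && strcmp(argv[i], "-d") == 0) {
        debug_level = atoi(argv[i + 1]);
        i += 2;
    }
    return i;
}

/* Detailed dumps are guarded with: if (debug_enabled(3)) dump_...(); */
int debug_enabled(int level)
{
    return debug_level >= level;
}
```

The dump calls then stay in the test code permanently and cost nothing in the default quiet run.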

The quietest way of running the tests produces:

 test.bigintcvasc.......PASS (phases: 4 of 4 run, 4 pass, 0 fail)(tests: 92 run, 89 pass, 3 fail, 3 expected failures)
 test.deccvasc..........PASS (phases: 4 of 4 run, 4 pass, 0 fail)(tests: 60 run, 60 pass, 0 fail)
 test.decround..........PASS (phases: 1 of 1 run, 1 pass, 0 fail)(tests: 89 run, 89 pass, 0 fail)
 test.dtcvasc...........PASS (phases: 25 of 25 run, 25 pass, 0 fail)(tests: 97 run, 97 pass, 0 fail)
 test.interval..........PASS (phases: 15 of 15 run, 15 pass, 0 fail)(tests: 178 run, 178 pass, 0 fail)
 test.intofmtasc........PASS (phases: 2 of 2 run, 2 pass, 0 fail)(tests: 12 run, 8 pass, 4 fail, 4 expected failures)
 test.rdtaddinv.........PASS (phases: 3 of 3 run, 3 pass, 0 fail)(tests: 69 run, 69 pass, 0 fail)
 test.rdtimestr.........PASS (phases: 1 of 1 run, 1 pass, 0 fail)(tests: 16 run, 16 pass, 0 fail)
 test.rdtsub............PASS (phases: 1 of 1 run, 1 pass, 0 fail)(tests: 19 run, 15 pass, 4 fail, 4 expected failures)

Each program identifies itself, its status (PASS or FAIL), and summary statistics. I have been hunting and fixing bugs other than the ones I found incidentally, hence the “expected failures”. That should be a temporary state of affairs; it lets me legitimately claim that all the tests pass. If I wanted more detail, I could run any of the tests with any of its phases (subsets of tests that are somewhat related to each other, though the “somewhat” is actually arbitrary) and see the full results. As it stands, the whole suite takes less than a second to run.
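A summary line like the ones above needs only a small tally. The sketch below is hypothetical and omits the phase counts for brevity. It treats a program as passing when every failure was an expected one, and pads the name with dots to a fixed width of 23 characters:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical per-program tally mirroring the summary lines above. */
typedef struct {
    int run, pass, fail, xfail;   /* xfail: failures that were expected */
} Tally;

/* Records one result; expected_to_fail marks a known, not-yet-fixed bug. */
void tally_result(Tally *t, int ok, int expected_to_fail)
{
    t->run++;
    if (ok) {
        t->pass++;
    } else {
        t->fail++;
        if (expected_to_fail)
            t->xfail++;
    }
}

/* The program passes if every failure was an expected one. */
int tally_passed(const Tally *t)
{
    return t->fail == t->xfail;
}

/* Formats "name....PASS (tests: ...)", padding the name with dots. */
int format_summary(char *buf, size_t size, const char *name, const Tally *t)
{
    char padded[24];
    size_t n = strlen(name);
    const char *status = tally_passed(t) ? "PASS" : "FAIL";

    if (n > 23)
        n = 23;
    memcpy(padded, name, n);
    memset(padded + n, '.', 23 - n);
    padded[23] = '\0';
    if (t->xfail > 0)
        return snprintf(buf, size,
                        "%s%s (tests: %d run, %d pass, %d fail, %d expected failures)",
                        padded, status, t->run, t->pass, t->fail, t->xfail);
    return snprintf(buf, size, "%s%s (tests: %d run, %d pass, %d fail)",
                    padded, status, t->run, t->pass, t->fail);
}
```

Recording 8 passes and 4 expected failures under the name test.intofmtasc reproduces the tests part of the corresponding line above. Any unexpected failure flips the status to FAIL.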

I find this useful where there are repetitive computations, but I did have to compute or verify the correct answer for each of these tests at some point.


Source: https://habr.com/ru/post/1400672/

