Inconsistent fscanf() behavior across compilers (consumption of trailing null character)

I wrote a complete application in C99 and tested it thoroughly on two GNU/Linux-based systems. I was surprised when compiling it with Visual Studio on Windows caused the application to misbehave. At first I could not tell what was wrong, but after stepping through it with the VC debugger I found that the fscanf() function declared in stdio.h was not behaving as I expected.

The following code is enough to demonstrate the problem:

    #include <stdio.h>

    int main()
    {
        unsigned num1, num2, num3;
        FILE *file = fopen("file.bin", "rb");

        fscanf(file, "%u", &num1);
        fgetc(file); // consume and discard \0
        fscanf(file, "%u", &num2);
        fgetc(file); // ditto
        fscanf(file, "%u", &num3);
        fgetc(file); // ditto
        fclose(file);

        printf("%d, %d, %d\n", num1, num2, num3);
        return 0;
    }

Suppose file.bin contains exactly 512\0256\0128\0:

    $ hexdump -C file.bin
    00000000  35 31 32 00 32 35 36 00  31 32 38 00              |512.256.128.|

When compiled with GCC 4.8.4 on an Ubuntu machine, the resulting program reads the numbers as expected and prints 512, 256, 128 to stdout.
Compiling with MinGW 4.8.1 on Windows gives the same expected result.

However, the result is quite different when the code is compiled with Visual Studio Community 2015. The output becomes:

 512, 56, 28 

As you can see, the trailing null characters have already been consumed by fscanf(), so fgetc() reads and discards characters that are part of the data.

Commenting out the fgetc() lines makes the code work with VC, but breaks it with GCC (and possibly with other compilers).

What is going on here, and how do I turn this into portable C code? Did I hit undefined behavior? Please note that I am assuming the C99 standard.

2 answers

TL;DR: you have been bitten by MSVC's non-conformance, a long-standing problem that MS has never shown much interest in solving. If you must support MSVC in addition to conforming C implementations, then one way to do so is to use conditional compilation directives to suppress the fgetc() calls when the program is compiled with MSVC.


I am inclined to agree with the comments that reading binary data with the formatted I/O functions is a dubious plan. Even more dubious, however, is the combination of

compile it using Visual Studio on Windows

and

assuming the C99 standard.

As far as I know, no version of MSVC conforms to C99. Very recent versions may do better with C2011, in part because C2011 makes optional some features that were mandatory in C99.

Whichever version of MSVC you are using, however, I think it fails to conform to the standard (both C99 and C2011) in this area. Here is the relevant text from C99, section 7.19.6.2:

A conversion specification is executed in the following steps:

[...]

An input item is read from the stream [...]. An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence. The first character, if any, after the input item remains unread.

The standard is very clear that the first character that does not match the input sequence remains unread, so the only ways MSVC's behavior could be considered conforming are if the \0 characters could be interpreted as part of (and terminating) a matching input sequence, or if fgetc() were permitted to skip \0 characters. I see no justification for the latter, especially given that the stream was opened in binary mode, so let's consider the former.

For the u conversion specifier, a matching input sequence is defined as one that

Matches an optionally signed decimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 10 for the base argument.

The "subject sequence of the strtoul function" is defined in the specification of that function:

First, they decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as specified by the isspace function), a subject sequence resembling an integer represented in some radix determined by the value of base, and a final string of one or more unrecognized characters, including the terminating null character of the input string.

Note in particular that the terminating null character is explicitly attributed to the final string of unrecognized characters. It is not part of the subject sequence, and therefore it should not be matched by fscanf() when it converts input according to the u specifier.


The MSVC fscanf implementation seems to "consume" the NUL that follows 512:

 fscanf(file, "%u", &num1); 

According to the fscanf documentation, this should not happen (emphasis mine):

For every conversion specifier other than n, the longest sequence of input characters which does not exceed any specified field width and which either is exactly what the conversion specifier expects or is a prefix of a sequence it would expect, is what's consumed from the stream. The first character, if any, after this consumed sequence remains unread.

Note that this is different from the situation where you want to skip trailing whitespace characters, as in the following statement:

 fscanf(file, "%u ", &num1); // notice "%u " 

The spec says this skipping only applies to characters that isspace identifies as whitespace, which, as noted, is not the case here (i.e. isspace('\0') yields 0).

A hacky, regex-like workaround that works with both MSVC and GCC might be to replace each fgetc call with:

 fscanf(file, "%*1[^0-9+-]"); // skip at most one non-%u character 

or, more portably, replacing the implementation-defined character range 0-9 with literal digits:

 fscanf(file, "%*1[^0123456789+-]"); // skip at most one non-%u character 

Source: https://habr.com/ru/post/1264637/

