Are the contents of a C buffer guaranteed to be preserved past the null terminator?

In the many cases where a buffer is passed to one of the standard library's string functions, is it guaranteed that the buffer will not be modified past the null terminator? For example:

 char buffer[17] = "abcdefghijklmnop";
 sscanf("123", "%16s", buffer);

"123\0efghijklmnop" buffer now require the equal "123\0efghijklmnop" ?

Another example:

 char buffer[10];
 fgets(buffer, 10, fp);

If the line read is only 3 characters long, can you be sure that the 6th character is the same as it was before the fgets call?

+42
c standards c-standard-library
Feb 25 '15 at 6:24
7 answers

Each individual byte in the buffer is an object. Unless some part of the description of the sscanf or fgets function mentions modifying those bytes, or even implies that their values may change, e.g. by stating that their values become unspecified, the general rule applies: (emphasis mine)

6.2.4 Storage durations of objects

2 [...] An object exists, has a constant address, and retains its last-stored value throughout its lifetime. [...]

This is the same principle that guarantees that

 #include <stdio.h>

 int a = 1;

 int main() {
     printf("%d\n", a);
     printf("%d\n", a);
 }

attempts to print 1 twice. Even though a is global and printf could access global variables, the description of printf does not mention modifying a, so a does not change.

Neither the description of fgets nor that of sscanf mentions modifying the buffer beyond the bytes that are actually supposed to be written (except in the case of a read error), so those bytes do not change.
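
A minimal sketch of the question's first example under this reading; the second assert encodes the interpretation argued above, not separate wording from the standard:

 #include <assert.h>
 #include <stdio.h>
 #include <string.h>

 int main(void) {
     char buffer[17] = "abcdefghijklmnop";

     if (sscanf("123", "%16s", buffer) != 1)
         return 1;

     /* Bytes actually written by sscanf: "123" plus its terminator. */
     assert(strcmp(buffer, "123") == 0);

     /* Under the rule quoted above, the bytes past the terminator keep
        their last stored values from the initializer. */
     assert(memcmp(buffer + 4, "efghijklmnop", 13) == 0);

     puts("tail bytes preserved");
 }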

+23
Feb 25 '15 at 10:47

The C99 draft standard does not explicitly state what should happen in these cases, but by considering several variations you can show that it must work a certain way in order to meet the specification in all cases.

The standard says:

%s - Matches a sequence of non-white-space characters.252)

If no l length modifier is present, the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence and a terminating null character, which will be added automatically.

Here are a couple of examples that show why it has to work as you suggest in order to comply with the standard.

Example A:

 char buffer[4] = "abcd";
 char buffer2[10];  // Note: this could happen to be placed at what would be buffer+4
 sscanf("123 4", "%s %s", buffer, buffer2);
 // Result is buffer  = "123\0"
 //           buffer2 = "4\0"

Example B:

 char buffer[17] = "abcdefghijklmnop";
 char* buffer2 = &buffer[4];
 sscanf("123 4", "%s %s", buffer, buffer2);
 // Result is buffer = "123\04\0"

Note that the sscanf interface does not provide enough information for it to know whether the two cases are different. So, for Example B to work correctly, it must not mess with the bytes after the null character in Example A, because it has to work in both cases according to that part of the specification.

So, implicitly, it must work as you stated, because of the spec.

Similar arguments can be made for other functions, but I think you can see the idea from this example.

Note: providing a size limit in the format, such as "%16s", may change the behavior. By the specification it would be functionally acceptable for sscanf to zero the buffer out to that limit before writing data into it. In practice, most implementations opt for performance, which means they leave the remainder alone.

When the intent of the specification is this kind of zeroing, it is usually stated explicitly. strncpy is an example: if the string is shorter than the specified maximum buffer length, it fills the rest of the space with null characters. The fact that this same "string" function can produce a string that is not null-terminated also makes it one of the functions people most commonly roll their own replacement for.
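
As a contrast, here is a small sketch of strncpy's documented padding behavior, including the well-known pitfall that it writes no terminator when the source fills the buffer:

 #include <assert.h>
 #include <string.h>

 int main(void) {
     char dst[8];

     /* Source shorter than n: strncpy pads the remainder with '\0'. */
     memset(dst, 'x', sizeof dst);
     strncpy(dst, "ab", sizeof dst);
     assert(dst[2] == '\0' && dst[7] == '\0');

     /* Source with n or more characters: no terminator is written. */
     memset(dst, 'x', sizeof dst);
     strncpy(dst, "abcdefgh", sizeof dst);  /* copies 8 chars, no '\0' */
     assert(dst[7] == 'h');                 /* dst is not a C string now */
 }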

As for fgets, a similar situation could arise. The only wrinkle is that the specification explicitly states that if nothing is read, the buffer remains untouched. A functionally valid implementation could side-step that by checking whether there is at least one byte to read before zeroing out the buffer.

+31
Feb 25 '15 at 7:06

The standard is somewhat ambiguous on this, but I think a reasonable reading of it is that the answer is: yes, it is not allowed to write more bytes to the buffer than it read plus the null terminator. On the other hand, a stricter reading/interpretation of the text could be that the answer is no, there is no guarantee. Here is what a publicly available draft says about fgets.

char *fgets(char * restrict s, int n, FILE * restrict stream);

The fgets function reads at most one less than the number of characters specified by n from the stream pointed to by stream into the array pointed to by s. No additional characters are read after a new-line character (which is retained) or after end-of-file. A null character is written immediately after the last character read into the array.

The fgets function returns s if successful. If end-of-file is encountered and no characters have been read into the array, the contents of the array remain unchanged and a null pointer is returned. If a read error occurs during the operation, the array contents are indeterminate and a null pointer is returned.

There is a guarantee about how much it may read from the input, i.e. it stops reading at a new-line or end-of-file and reads no more than n-1 bytes. Although nothing is said explicitly about how much it is allowed to write to the buffer, it is common knowledge that the n parameter of fgets is meant to prevent buffer overflows. It is a little odd that the standard uses wording which, strictly speaking, does not necessarily imply that fgets cannot write more than n bytes to the buffer, if you want to be pedantic about its terminology. But note that the same "read" terminology is used for both limits: the n limit and the EOF/new-line limit. So if you interpret the "read" related to n as a limit on writes to the buffer, then, for consistency, you can/should interpret the other "read" the same way, i.e. it does not write more than it reads when the line is shorter than the buffer.

On the other hand, if you distinguish between the verb phrase "read into" (= "write") and plain "read", then you cannot read the committee's text that way. You are guaranteed that it will not "read into" (= "write to") the array more than n bytes, but if the input line ends before a new-line or EOF, you are only guaranteed that the rest (of the input) will not be "read"; whether that also implies it will not be "read into" (= "written into") the buffer is unclear under this stricter reading. The key issue is the elided "into the array", so the question is whether the insertion I have indicated in brackets in the following modified quotation is the intended interpretation:

No additional characters are read [into the array] after a new-line character (which is retained) or after end-of-file.

Honestly, a single post-condition stated as a formula (and it would be rather short in this case) would be much more useful than the wording I quoted...

I cannot be bothered to try to analyze their wording for the *scanf family, because I suspect it would be even more convoluted, given all the other things going on in those functions; their entry for fscanf is about five pages long... But I suspect the same logic applies.
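
For what it's worth, here is a quick empirical check of what one implementation does with the tail bytes after a short fgets read; it only observes behavior and does not settle the stricter reading above:

 #include <stdio.h>
 #include <string.h>

 int main(void) {
     char buffer[10];
     memset(buffer, 'x', sizeof buffer);   /* known sentinel values */

     FILE *fp = tmpfile();
     if (!fp)
         return 1;
     fputs("123\n", fp);
     rewind(fp);

     if (fgets(buffer, (int)sizeof buffer, fp) == NULL)
         return 1;

     /* "123\n" plus the terminator occupies buffer[0..4]. */
     printf("read: %s", buffer);
     /* Typical implementations leave the sentinel here untouched. */
     printf("buffer[6] == '%c'\n", buffer[6]);

     fclose(fp);
 }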

+8
Feb 25 '15 at 9:38

Is it guaranteed that the buffer will not be modified past the null terminator?

No, there is no guarantee.

Is buffer now required to equal "123\0efghijklmnop"?

Yes. But this is only because you used the right parameters for your string-related functions. If you mess up the buffer length, the conversion specifiers to sscanf, etc., the program will still compile, but it will most likely fail at runtime.

If the line read is only 3 characters long, can you be sure that the 6th character is the same as before fgets?

Yes. Once fgets() sees that the input line is only 3 characters long, it stores those characters in the supplied buffer and does not bother touching the rest of the provided space at all.

+4
Feb 25 '15 at 6:51

Is buffer now required to equal "123\0efghijklmnop"?

Here buffer holds only the string "123", guaranteed to be terminated by a NUL.

Yes, the memory allocated for the buffer array is not deallocated; however, you have to keep in mind the restriction that your buffer can only hold 16 char elements (plus the terminator) that are meaningful to read at any moment. Beyond that it depends on whether you write just one char or the maximum the buffer can hold.

For example:

 char buffer[4096] = "abc";

is actually doing something like

 memcpy(buffer, "abc", sizeof("abc"));
 memset(&buffer[sizeof("abc")], 0, sizeof(buffer) - sizeof("abc"));

The standard states that if a char array is partially initialized, the remaining elements are initialized as well (to zero), out to the array's boundary.
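
A minimal check of that initialization rule (C11 6.7.9, C99 6.7.8: remaining aggregate elements are initialized as if they had static storage duration, i.e. to zero):

 #include <assert.h>
 #include <string.h>

 int main(void) {
     char buffer[4096] = "abc";

     /* "abc" and its terminator occupy buffer[0..3]; every remaining
        element is zero because the array was only partially initialized. */
     assert(strcmp(buffer, "abc") == 0);
     assert(buffer[4] == 0 && buffer[100] == 0 && buffer[4095] == 0);
 }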

+1
Feb 25 '15 at 6:28

There is no guarantee from the standard; that is why it is recommended to call sscanf and fgets with a size limit that matches the buffer, as you do in your question (and why using fgets is considered preferable to gets).

However, some standard functions rely on the null terminator to do their work, e.g. strlen (but I suppose you are asking about functions that modify the string).

EDIT:

In your example

 fgets(buffer, 10, fp); 

the characters after the 10th are guaranteed to be untouched (the buffer's existing contents and length are not considered by fgets).

EDIT2:

Also, when using fgets, remember that the '\n' will be stored in the buffer, e.g.

  "123\n\0fghijklmnop" 

instead of the expected

  "123\0efghijklmnop" 
0
Feb 25 '15 at 6:36

It depends on the function in use (and to a lesser extent its implementation). With %s, sscanf starts writing when it encounters the first non-whitespace character and keeps writing until the first whitespace character, where it adds the terminating 0 and returns. But a function like strncpy (famously) zeroes out the rest of the buffer.

However, the C standard does not say much about this aspect of how these functions behave.

0
Feb 27 '15 at 5:34


