What values can I use in a C string?

Question


I came across the following code:

    char buf[100];
    char buf2[100];
    strcpy(buf, "áéíóúç");
    sprintf(buf2, "%s", buf);

And I was wondering if it is right or not. I tested it on Windows and Linux, and it really worked, but will it work on all OS / platforms of different languages?

Both strcpy and sprintf expect the C string to end with a null character, but can the contents of the C string be anything (except for the null character)?

Is it possible to do something like:

    strcpy(buf, "\x0a\x09\x08\x07\x06\x05\x04\x03\x02\x01\x00");
    sprintf(buf2, "%s", buf);

?

+4

c

Renan greinert Feb 14 '12 at 18:57


5 answers

It is a fair question, but the answer is simple:

String functions stop only at the null character, since a C string is by definition a sequence of bytes terminated by a zero byte. So your example is fine.

+2

MByD Feb 14 '12 at 18:59


char is the smallest addressable unit in the machine. On virtually every platform you will use today it has 8 bits, i.e. one byte. You can put anything representable as an 8-bit integer into it.

When working with character sets, there are encodings that use 16 bits (or otherwise more than one byte) per character. In that case you have a problem if you are not aware of it and your buffer is too small to hold the data.

Suggested reading: http://www.joelonsoftware.com/articles/Unicode.html

+2

Brian roach Feb 14 '12 at 19:02


Yes. *

*) Note, however, that the second example will end up one character short: the \0 character marks the end of the string, and as such will not be printed.

+1

Blindy Feb 14 '12 at 19:02


Almost always, your code will work.

However, I see two possible minor issues:

Some older C compilers may not accept source code containing characters outside ASCII (or maybe EBCDIC on strange mainframes), so accented characters may not be welcome even in strings and comments.
Even on a recent Linux system, you could compile a source file encoded in UTF-8, but your executable might then be run under a different encoding (for example, ISO 8859-1) and locale.

In practice, these points are negligible today, since recent GCC compilers accept UTF-8 and most Linux systems use UTF-8. I would not worry about them in practice.

Perhaps studying internationalization, gettext, et al. may be useful.

+1

Basile starynkevitch Feb 14 '12 at 19:04


Matti Virkkunen · Accepted Answer · 2012-02-14T18:59:52+0000

A char array is just an array of bytes, and all non-wide string functions work on this assumption. The only byte that has special meaning is the null byte.

The C standard, as far as I remember, does not care much about character encodings (or text in general), so your program will fail on a platform where the expected output character encoding does not match the one used in your code.
