What values can I use in a C string?

I came across the following code:

    char buf[100];
    char buf2[100];
    strcpy(buf, "áéíóúç");
    sprintf(buf2, "%s", buf);

And I was wondering whether it is correct. I tested it on Windows and Linux and it really worked, but will it work on every OS/platform, in every language?

Both strcpy and sprintf expect the C string to end with a null character, but can the contents of the C string be anything (except for the null character)?

Is it possible to do something like this?

    strcpy(buf, "\x0a\x09\x08\x07\x06\x05\x04\x03\x02\x01\x00");
    sprintf(buf2, "%s", buf);

5 answers

A char array is just an array of bytes, and all non-wide string functions work on this assumption. The only byte that has a special meaning is the null byte.

The C standard, as far as I remember, says little about character encodings (or text in general), so your program can produce garbled output on a platform whose expected output encoding does not match the encoding used in your code.

It is a reasonable question, but:

String functions stop only at the null character ('\0'), because a C string is by definition a sequence of bytes terminated by a zero byte. So your example is fine.

char is the smallest addressable unit of memory. In C a char is by definition one byte, and on everything you are likely to use today that byte is 8 bits (CHAR_BIT == 8). You can store any value that fits in 8 bits in it.

Some character encodings use 16 bits per character (wide characters). If you are not aware of that, you will run into problems, for example when your buffer turns out to be too small to hold the data.

Suggested reading: http://www.joelonsoftware.com/articles/Unicode.html

Yes.*

* Note, however, that in the second example the output will be one character shorter than the literal: the \0 character marks the end of the string and therefore will not be printed.

Almost always, your code will work.

However, I see two possible minor issues:

  • Some older C compilers may not accept source code containing characters outside ASCII (or EBCDIC, on unusual mainframes), so accented characters may be rejected even in string literals and comments.
  • Even on a recent Linux system, you might compile source encoded in UTF-8 but then run the executable under a different encoding (for example, ISO-8859-1) and locale.

In practice, these issues are minor today, since recent GCC versions accept UTF-8 source and most Linux systems use UTF-8 locales. I would not worry about them in practice.

Studying internationalization (i18n), gettext and related tools may also be useful.

Source: https://habr.com/ru/post/1396470/