How can a character be represented by a bit pattern containing three octal digits?

From chapter 2 (section 2.3, "Constants") of K&R's The C Programming Language:

Certain characters can be represented in character and string constants by escape sequences like \n (newline); these sequences look like two characters, but represent only one. In addition, an arbitrary byte-sized bit pattern can be specified by

 '\ooo'

where ooo is one to three octal digits (0 ... 7) or

 '\xhh'

where hh is one or more hexadecimal digits (0 ... 9, a ... f, A ... F). So we could write

 #define VTAB '\013' /* ASCII vertical tab */
 #define BELL '\007' /* ASCII bell character */

or, in hexadecimal,

 #define VTAB '\xb' /* ASCII vertical tab */
 #define BELL '\x7' /* ASCII bell character */

The part that bothers me is the following wording (emphasis mine): where ooo is one to three octal digits (0 ... 7). With three octal digits, the bit pattern needs 9 bits (3 per digit), which exceeds the byte length needed for a character. Surely I am missing something. What is it?

3 answers

\ooo (three octal digits) does indeed let you specify 9-bit values, from 0 to 111111111 (binary), i.e. 511. Whether such a value is allowed depends on the size of char.

Assignments such as those below generate a warning in many environments, because char is 8 bits there. On those platforms the largest octal escape allowed is \377. But char need not be 8 bits, so the claim that 9 bits "exceeds the byte length required for the characters" does not hold in general.

 char *s = "\777"; // warning: "Octal sequence out of range"
 char c = '\777'; // warning
 int i = '\777'; // warning
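
The range check behind that warning can be sketched in a few lines. This is only an illustration, not how any particular compiler implements it; the helper names octal_escape_value and octal_escape_fits are made up for this example. It takes the digits that follow the backslash, computes their value, and compares it with UCHAR_MAX from <limits.h>.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helper: value of the one to three octal digits that
 * follow the backslash, or -1 if the input is not a valid digit string. */
int octal_escape_value(const char *digits)
{
    int value = 0;
    size_t n = 0;

    for (; digits[n] != '\0'; n++) {
        /* reject a fourth digit and anything outside 0..7 */
        if (n == 3 || digits[n] < '0' || digits[n] > '7')
            return -1;
        value = value * 8 + (digits[n] - '0');
    }
    return n == 0 ? -1 : value;
}

/* Hypothetical helper: 1 if the escape fits in unsigned char on this
 * implementation, 0 otherwise. */
int octal_escape_fits(const char *digits)
{
    int value = octal_escape_value(digits);
    return value >= 0 && value <= UCHAR_MAX;
}
```

With an 8-bit char, octal_escape_fits("377") gives 1 and octal_escape_fits("777") gives 0, since 777 octal is 511. On an implementation with a 9-bit char, both would fit.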

The constant with three octal digits, '\141', is the same as 'a' in a typical environment where ASCII is used. In another character set, 'a' may have a different value. So if you need the exact bit pattern 01100001 regardless of the character set, you can write '\141' instead of 'a'. '\x61' does the same thing. In some contexts the octal form may be preferred.
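
A small sketch of that equivalence, with function names invented for the example. The first three results hold on any conforming implementation; only the last one depends on the execution character set.

```c
/* Both spellings denote the same numeric value, 97 (binary 01100001). */
int octal_value(void) { return '\141'; }
int hex_value(void)   { return '\x61'; }

/* "\141" is one character plus the terminating null, so its size is 2:
 * the escape looks like several characters but represents only one. */
unsigned long escaped_string_size(void)
{
    return (unsigned long)sizeof "\141";
}

/* 1 when the execution character set encodes 'a' as 97
 * (true for ASCII, not true for EBCDIC, for instance). */
int a_is_97(void) { return 'a' == '\141'; }
```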

C11 6.4.4.4 paragraph 9: "The value of an octal or hexadecimal escape sequence shall be in the range of representable values for the corresponding type"; with no prefix, that type is unsigned char.


With eight-bit bytes, the first of three octal digits can only go up to 3 (two bits), not 7 (three bits). With ASCII (7-bit values), the first digit can only be zero or one.
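
The arithmetic can be written down directly. In a three-digit octal escape the last two digits cover the low 6 bits, so the first digit holds whatever bits remain. The function below is a hypothetical helper for illustration and assumes a character width between 1 and 9 bits.

```c
/* Largest allowed leading digit of a three-digit octal escape for a
 * character type that is char_bits wide (assumed to be 1..9). */
unsigned max_leading_octal_digit(unsigned char_bits)
{
    unsigned max_value = (1u << char_bits) - 1u; /* e.g. 255 for 8 bits */
    return max_value >> 6;                       /* drop the low two digits */
}
```

For 7 bits this gives 1, for 8 bits it gives 3, and for 9 bits it gives 7.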

If K & R says otherwise, their description is incomplete or incorrect.


As far as I remember, K&R does not define the range of character codes. It used to be the ASCII range, 0 ... 127. Today it is often the 8-bit range, 0 ... 255, but it can be wider. In any case, the implementation-defined limits of the char type also limit which escape sequences are valid.

For example, if the range is 0 ... 127, then \177 is the largest octal escape allowed.
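
To see which escape corresponds to a given upper limit, you can print the limit in octal. A minimal sketch, where largest_octal_escape is a name invented for this example:

```c
#include <stdio.h>

/* Writes the escape for max_code, e.g. \177 for 127, into buf.
 * Returns what snprintf returns: the number of characters needed,
 * not counting the terminating null. */
int largest_octal_escape(unsigned max_code, char *buf, size_t size)
{
    return snprintf(buf, size, "\\%o", max_code);
}
```

Passing 127 yields the text \177 and passing 255 yields \377. On a real implementation the limit to pass would be UCHAR_MAX from <limits.h>.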


Source: https://habr.com/ru/post/1496205/

