Character constant EOF

From The C Programming Language:

 int c;
 while ((c = getchar()) != EOF)
     putchar(c);

"... The solution is that getchar returns a distinguishing value when there is no more input, a value that cannot be confused with any real character. This value is called EOF, for "end of file". We must declare c to be a type large enough to store any value returned by getchar. We cannot use char, since c must be large enough to hold EOF in addition to any possible char."

I checked stdio.h and printed the value of EOF on my system, and it is -1. On my system char is signed, although I understand that this depends on the system. So EOF can fit in a char on my system. I rewrote the small program above, declaring c as char, and the program works as intended. There is also an ASCII character table here, which appears to have a blank character at code 255, and that character appears to act as EOF.

So, why does it appear that ASCII has the character (255) assigned to EOF? This seems to contradict what is said in The C Programming Language.

+6
5 answers

So, why does it appear that ASCII has the character (255) assigned to EOF?

It doesn't. More precisely, that character is not an "EOF" character.

The trick is that getchar() always returns a non-negative value if it has something to read. It returns EOF (which is defined as -1 on your implementation) only if it encounters the end of the file or a read error.

The fact that char is:

  • 8 bits wide,
  • signed, and
  • stored in two's complement representation

is just a quirk of your implementation (although a very common one today). So, if you use char to store the return value of getchar(), reading the input may end prematurely: the character with code 255 will be mistaken for -1, a.k.a. EOF, which is a bug. This is exactly what happened to you. Your second approach did not work; on the contrary, it was completely broken.
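The premature stop can be reproduced with a small sketch. It assumes a typical implementation (8-bit char, two's complement, EOF equal to -1); the helper names make_sample, count_with_int and count_with_signed_char are made up for this illustration, and signed char is spelled out so the behaviour does not depend on whether plain char is signed.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: a temporary file holding the bytes 'A', 0xFF, 'B'. */
FILE *make_sample(void)
{
    FILE *f = tmpfile();
    assert(f != NULL);
    fputc('A', f);
    fputc(0xFF, f);
    fputc('B', f);
    rewind(f);
    return f;
}

/* Correct: the result of fgetc stays in an int, so 255 and EOF differ. */
int count_with_int(void)
{
    FILE *f = make_sample();
    int c;
    int n = 0;
    while ((c = fgetc(f)) != EOF)
        n++;
    fclose(f);
    return n;
}

/* Broken: the result is narrowed to signed char before the comparison,
   so on a typical implementation byte 0xFF becomes -1 and looks like EOF. */
int count_with_signed_char(void)
{
    FILE *f = make_sample();
    signed char c;
    int n = 0;
    while ((c = (signed char)fgetc(f)) != EOF)
        n++;
    fclose(f);
    return n;
}
```

Under those assumptions count_with_int() sees all three bytes, while count_with_signed_char() stops at the 0xFF byte after counting only one.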

+3

When getchar() reads byte 255, it returns 255. When getchar() detects that there is no more input, it returns -1.

If you store the result in a char, you cannot distinguish between the two. But when you store it in an int, you can. (This holds regardless of whether char is signed or unsigned.)

Only once you know that the result is a valid character (that is, not EOF) can you convert it to char and get an ordinary C character.
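A minimal sketch of that order of operations: read into an int, compare with EOF, and only then narrow to char. The function name read_text and its parameters are hypothetical, chosen just for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Reads up to cap - 1 bytes from in into buf and NUL-terminates buf.
   Returns the number of bytes stored. */
size_t read_text(FILE *in, char *buf, size_t cap)
{
    size_t n = 0;
    int c; /* int, so that EOF and byte 255 stay distinct */

    assert(in != NULL && buf != NULL && cap > 0);
    while (n + 1 < cap && (c = fgetc(in)) != EOF)
        buf[n++] = (char)c; /* c is known to be a real character here */
    buf[n] = '\0';
    return n;
}
```

Because the comparison happens before the conversion, a byte with value 255 in the input is stored like any other byte instead of ending the loop.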

+5

According to the getchar() manual page, it always returns an int value:

 #include <stdio.h>
 ...
 int getchar(void);
 ...
 RETURN VALUE
     fgetc(), getc() and getchar() return the character read as an unsigned char cast to an int or EOF on end of file or error.

Thus, storing the result in a char instead of an int truncates it (with a 32-bit int, the int -1 (0xffffffff) becomes the char -1 (0xff)), so EOF can no longer be told apart from the byte 0xff, and this may cause errors.
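Since the manual says EOF is returned "on end of file or error", a loop that stops on EOF can ask the stream which of the two happened. The following is only a sketch; drain is a hypothetical helper name, while feof and ferror are the standard stdio functions.

```c
#include <assert.h>
#include <stdio.h>

/* Consumes the rest of the stream, stores the number of bytes read in
   *count, and returns 0 for a genuine end of file or -1 for a read error. */
int drain(FILE *in, long *count)
{
    int c; /* int, as required to hold every unsigned char value plus EOF */

    assert(in != NULL && count != NULL);
    *count = 0;
    while ((c = fgetc(in)) != EOF)
        (*count)++;
    return ferror(in) ? -1 : 0;
}
```

After drain returns 0, feof(in) is non-zero, which confirms that the loop ended because the input ran out.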

+3

To understand how this works, imagine the mindset of the person who wrote getchar. You need to read a file. You start by creating a procedure, for example:

 unsigned char get_me_a_byte(file) ... // returns 0..255

Now you want to read all bytes from the file:

 unsigned char c;
 while (c = get_me_a_byte(file)) // same as: while ((c = get_me_a_byte(file)) != 0)
 {
     ... do sth
 }

The problem is that this stops when a zero byte is encountered, but you want to stop when everything has been read. Now you get smarter: you know that a file can be thought of as a sequence of bytes. What if get_me_a_byte returned a 16-bit or 32-bit type? Then you could use some value that a byte cannot hold as the end-of-file marker.

Bingo!

So your solution could be either of:

 int get_me_a_byte_U(file) ... // returning bytes as 0..255
 int get_me_a_byte_S(file) ... // returning bytes as -128..127

Now you can do:

 int c;
 while ((c = get_me_a_byte_U(file)) != UUU) ....

where UUU can be any int outside 0..255: anything from 256 to INT_MAX on your platform, or any negative value.
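As a rough sketch of the unsigned variant, here is a toy version that reads from an in-memory byte array instead of a real file. The struct byte_source type, the count_bytes helper, and the choice of 256 for UUU are all assumptions made for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define UUU 256 /* hypothetical end marker: any int outside 0..255 works */

/* A toy "file": a byte array with a read position (not a real FILE). */
struct byte_source {
    const unsigned char *data;
    size_t len;
    size_t pos;
};

/* Returns the next byte as 0..255, or UUU when nothing is left. */
int get_me_a_byte_U(struct byte_source *src)
{
    assert(src != NULL);
    if (src->pos >= src->len)
        return UUU;
    return src->data[src->pos++];
}

/* Counts bytes up to the end marker; a zero byte does not stop the loop. */
size_t count_bytes(struct byte_source *src)
{
    size_t n = 0;
    int c;
    while ((c = get_me_a_byte_U(src)) != UUU)
        n++;
    return n;
}
```

Because the return type is wider than a byte, the bytes 0 and 255 are both ordinary data here, and only the out-of-range value ends the loop.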

Similarly:

 int c;
 while ((c = get_me_a_byte_S(file)) != SSS) ....

where SSS can be anything in INT_MIN..-129 or 128..INT_MAX.

Now, if you chose the first method, the question arises: what value should UUU (your EOF) have?

(-1) is a good choice for EOF because, whatever the bit width of the variable you assign it to, it stays (-1). By "stays -1" I mean that its bit pattern is always all ones (in two's complement).

 char c = -1;   // c = 11111111b / 0xFF / 255 (assuming your char is signed 8-bit)
 short s = -1;  // s = 1111111111111111b / 0xFFFF / 65535
 int i = -1;    // i = 11111111111111111111111111111111b / 0xFFFFFFFF / 4294967295

This should now be obvious.
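The "all ones at every width" claim can be checked with a short sketch. The function names are invented for this example; the only facts relied on are that converting -1 to an unsigned type yields that type's maximum value, and that UCHAR_MAX, USHRT_MAX and UINT_MAX come from limits.h.

```c
#include <assert.h>
#include <limits.h>

/* Each helper stores -1 in a signed type of a different width and
   returns the same value reinterpreted at the matching unsigned width. */
unsigned int char_pattern(void)  { signed char c = -1; return (unsigned char)c; }
unsigned int short_pattern(void) { short s = -1;       return (unsigned short)s; }
unsigned int int_pattern(void)   { int i = -1;         return (unsigned int)i; }

/* Asserts that every width yields the maximum value, i.e. all bits set. */
void check_all_ones(void)
{
    assert(char_pattern() == UCHAR_MAX);
    assert(short_pattern() == USHRT_MAX);
    assert(int_pattern() == UINT_MAX);
}
```

On a platform with 8-bit char, 16-bit short and 32-bit int these are the values 255, 65535 and 4294967295 from the comments above.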

+2

There is no contradiction.

  • EOF is NOT a character; it is just a condition detected while reading a file.
  • Code 255 sometimes corresponds to the non-breaking space, a.k.a. the HTML entity &nbsp;

As noted in the comments, ASCII encodes only 128 characters, so what you find beyond that range depends on the encoding.

From the table you linked to, I would just say:

255 is a non-printable character

+1

Source: https://habr.com/ru/post/957148/

