Why do I need multiple EOF characters (CTRL + Z)?

As a bit of background, I'm fairly new to the C programming language, so I have been working through some of the exercises in the second edition of Kernighan and Ritchie's tutorial. I understand that I could handle certain problems more thoroughly with the standard library, but I try to keep my repertoire of commands in sync with the book as much as possible.

In case it matters, I compile my source on Windows XP with the Tiny C Compiler (TCC) and run the binaries in the XP console (cmd.exe).

Problem: handling of End-of-File (EOF) characters. I have put together a small test case to illustrate the issue. The program seems to handle the EOF character only partially. I will try to demonstrate the problem with examples of input and output.

    #include <stdio.h>

    int main()
    {
        int character, count;

        character = 0;
        character = getchar();
        for (count = 0; character != EOF; ++count)
        {
            character = getchar();
        }
        printf("Count: %d", count);
        return 0;
    }

Input example 1: abcd^Z[enter] (where ^Z / CTRL + Z represents the EOF character and [enter] represents the Enter key)

Output example 1: Count: 4 (the program waits for more input, or ends properly on ^C / ^Z[enter])

Input example 2: abcd^Zefgh

Output example 2: Count: 4 (the program waits for more input, or ends properly on ^C / ^Z[enter])

As both examples show, the character count is not printed until the ^C / ^Z[enter] sequence is entered. Until then, the program waits for (and does process) more input. However, as example 2 shows, when the program encounters the initial ^Z it stops processing that input line, and then either waits for more input or prints the correct count if the ^C / ^Z[enter] sequence is entered.

I can't understand why the program only partially handles the EOF character. It seems to me that if it cuts off the rest of example 2, it should also break out of the loop entirely. Any ideas why, upon recognizing an EOF character, the program does not immediately print the current count and exit?

+4
5 answers

This answer is unix-ish, but I think a similar phenomenon is happening on Windows. The underlying form of an EOF is a zero-length read. On interactive input devices (terminals) there is a special mechanism for putting an EOF into the input stream, but if there is already input waiting to be read, the EOF is consumed along with that input (resulting in a non-zero-length read) and is therefore never noticed by the application. Only when the EOF occurs with no prior input buffered can it be noticed and acted upon by the application.

If you have access to a Linux system (or another *nix), write a similar test program and run it under strace. Watch the underlying read calls that take place, and the reason for this otherwise unintuitive behavior will make sense.

+6

This dates back to the stone age of computing. At least to CP/M, and possibly further back to the early DEC operating systems. CP/M did not store the size of a file; it only tracked the number of disk sectors, 128 bytes each. That is not a problem for binary files, since the program simply stops reading once it has enough. But it is a problem for text files.

So, by convention, the end of a text file was marked with the code 0x1a, Control + Z. Saddled with a legacy of text files that were larger than the amount of text in them, this convention had to be carried over into each successive generation of C runtime (CRT) implementations. Windows itself doesn't care about it; it is purely a CRT implementation detail. That is why typing Ctrl + Z at the console does nothing special. Once you press Enter, the CRT in cmd.exe gets involved again to invoke the legacy behavior and declare EOF.

+1

I don't know about TCC specifically, but in quite a few (most?) cases you need to enter ^Z more or less by itself for it to be recognized as EOF (i.e., you need the sequence [enter]^Z[enter]).

0

EOF is not generated automatically by Windows when you type ^Z; it is just a convention carried over from DOS. The runtime of your C compiler has to recognize it and set the EOF flag, and I would guess that Tiny C's does not.

^C, on the other hand, is recognized by the Windows command environment. It does not necessarily mean EOF; I think it is more of an abort signal.

0

I would guess that standard input is line buffered (it is on Unix). DOS had getch() and getche() functions that sit below stdio, so they bypass stdio buffering. I don't know how to disable input buffering on Windows, but on Unix it is done by setting the terminal to non-canonical mode.

0

Source: https://habr.com/ru/post/1347981/

