Are there any cases where fseek / ftell might give the wrong file size?

In C or C++, you can use the following to get the file size:

    const unsigned long long at_beg = (unsigned long long) ftell(filePtr);
    fseek(filePtr, 0, SEEK_END);
    const unsigned long long at_end = (unsigned long long) ftell(filePtr);
    const unsigned long long length_in_bytes = at_end - at_beg;
    fprintf(stdout, "file size: %llu\n", length_in_bytes);

Are there development environments, compilers, or operating systems that can return the wrong file size from this code because of padding or other situation-specific details? Were there any changes to the C or C++ specification around 1999 that would cause this code to stop working in certain cases?

For this question, please assume I am adding large file support by compiling with the flags -D_FILE_OFFSET_BITS=64 -D_LARGEFILE64_SOURCE=1. Thanks.

+4
4 answers

It will not work with files like /proc/cpuinfo, /dev/stdin or /dev/tty, or with streams obtained from popen.

It will also not work reliably if another process is writing to the file at the same time.

Using the POSIX stat function is probably more efficient and more reliable. Of course, this function may not be available on non-POSIX systems.
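As a rough illustration of the stat approach, here is a minimal sketch that assumes a POSIX system and an already open FILE *. The helper name file_size_fstat is made up for this example, and it uses fstat on the stream's descriptor rather than stat on a path.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno(), fstat() and S_ISREG */
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical helper: returns the size in bytes of the regular file
 * behind fp, or -1 on error or if fp is not a regular file
 * (for example a pipe or a terminal). */
static long long file_size_fstat(FILE *fp)
{
    struct stat st;

    assert(fp != NULL);
    if (fstat(fileno(fp), &st) != 0)
        return -1;                  /* fstat failed; errno says why */
    if (!S_ISREG(st.st_mode))
        return -1;                  /* only report sizes of regular files */
    return (long long)st.st_size;
}
```

fstat reports the size as the file system sees it, so data still sitting in the stdio buffer of a stream opened for writing is not counted until it is flushed.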

+6

The fseek and ftell functions are both defined by the ISO C language standard.

The quotes below are from the latest public draft of the 2011 C standard, but the 1990, 1999 and 2011 ISO C standards are very similar in this area, if not identical.

7.21.9.4:

The ftell function obtains the current value of the file position indicator for the stream pointed to by stream . For a binary stream, the value is the number of characters from the beginning of the file. For a text stream, its file position indicator contains unspecified information, usable by the fseek function for returning the file position indicator for the stream to its position at the time of the ftell call; the difference between two such return values is not necessarily a meaningful measure of the number of characters written or read.

7.21.9.2:

The fseek function sets the file position indicator for the stream pointed to by stream . If a read or write error occurs, the error indicator for the stream is set and fseek fails.

For a binary stream, the new position, measured in characters from the beginning of the file, is obtained by adding offset to the position specified by whence . The specified position is the beginning of the file if whence is SEEK_SET , the current value of the file position indicator if SEEK_CUR , or end-of-file if SEEK_END . A binary stream need not meaningfully support fseek calls with a whence value of SEEK_END .

For a text stream, either offset shall be zero, or offset shall be a value returned by an earlier successful call to the ftell function on a stream associated with the same file and whence shall be SEEK_SET .

Violating any of the "shall" clauses makes the behavior of your program undefined.

So, if the file was opened in binary mode, ftell gives you the number of characters from the beginning of the file, but an fseek relative to the end of the file ( SEEK_END ) is not necessarily meaningful. This accommodates systems that store binary files in whole blocks and do not keep track of how much was written to the last block.

If the file was opened in text mode, you can seek to the beginning or end of the file with an offset of 0, or you can seek to a position returned by an earlier ftell call; fseek with any other arguments has undefined behavior. This accommodates systems in which the number of characters you can read from a text file does not necessarily correspond to the number of bytes in the file. For example, on Windows, reading a CR-LF pair ( "\r\n" ) yields only one character but advances 2 bytes in the file.

In practice, on Unix-like systems, text and binary modes behave the same, and the fseek / ftell method will work. I suspect it will work on Windows too (I think ftell will give a byte offset, which may not be the same as the number of times you could call getchar() in text mode).

Note that ftell() returns a result of type long . On systems where long is 32 bits, this method cannot work for files of 2 gigabytes or more.
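To get around the long limit on POSIX systems, one option is the fseeko / ftello pair, which uses off_t instead of long. The following is only a sketch under that assumption: the helper name file_size_ftello is invented for this example, and the _FILE_OFFSET_BITS define mirrors the -D_FILE_OFFSET_BITS=64 flag from the question.

```c
#define _POSIX_C_SOURCE 200809L   /* for fseeko() and ftello() */
#define _FILE_OFFSET_BITS 64      /* same effect as -D_FILE_OFFSET_BITS=64 */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: returns the size in bytes of the seekable stream
 * fp, or -1 on failure. The original position is restored on success. */
static off_t file_size_ftello(FILE *fp)
{
    off_t saved, end;

    assert(fp != NULL);
    saved = ftello(fp);             /* remember where we are */
    if (saved < 0)
        return -1;                  /* e.g. the stream is not seekable */
    if (fseeko(fp, 0, SEEK_END) != 0)
        return -1;
    end = ftello(fp);               /* offset of end-of-file in bytes */
    if (fseeko(fp, saved, SEEK_SET) != 0)
        return -1;                  /* could not restore the position */
    return end;
}
```

Unlike the snippet in the question, this checks every return value and puts the file position back, so it can be called in the middle of reading a file.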

You may be better off using a system-specific method to get the file size, such as stat() on Unix-like systems, since the fseek / ftell method is system-specific anyway.

On the other hand, fseek and ftell will probably work as you expect on most systems you are likely to encounter. I am sure there are systems on which this will not work; sorry, but I do not have specifics.

If working on Linux and Windows is good enough and you are not dealing with large files, then the fseek / ftell method is probably fine. Otherwise, you should use a system-specific method to determine the file size.

And keep in mind that anything that tells you the size of a file can only tell you its size at that moment. The file size may change before you access it.
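Because of that, when the goal is to read the whole file, one option is not to rely on a size measured earlier and simply to read until end-of-file, counting as you go. A minimal sketch in standard C follows; the helper name count_remaining_bytes is made up for this example.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: reads fp from its current position to end-of-file
 * and returns the number of bytes read, or -1 on a read error.
 * It does not need fseek, so it also works on non-seekable streams. */
static long long count_remaining_bytes(FILE *fp)
{
    char buf[4096];
    long long total = 0;
    size_t n;

    assert(fp != NULL);
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += (long long)n;      /* count what was actually read */
    return ferror(fp) ? -1 : total;
}
```

The stream should be opened in binary mode if the count is meant to match the byte size of the file on systems where text mode translates line endings.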

+3

1) Superficially, your code looks "OK" - I do not see any problems with it.

2) No - there is no "C or C++ specification" that would affect fseek. There is a POSIX specification:

3) If you want "file size", my first choice would probably be "stat()". Here is the POSIX specification:

4) If something "goes wrong" with your method, then my first guess would be "large file support".

For example, many operating systems had parallel fseek() and fseek64() APIs.

Hope this helps .. PSM

+1

POSIX defines the return value from ftell as "measured in bytes from the beginning of the file." Your at_beg will always be zero (if it is a newly opened file).

So, assuming that:

  • the file is seekable
  • there are no concurrency issues
  • the file size is representable in the data type used by your chosen fseek / ftell variant.

then your code should work on any POSIX-compatible system.
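The first assumption can be checked at run time. The sketch below assumes a POSIX system and uses a made-up helper name, stream_is_seekable; it asks lseek for the current offset, which POSIX specifies to fail with ESPIPE on pipes, FIFOs and sockets.

```c
#define _POSIX_C_SOURCE 200809L   /* for fileno() and lseek() */
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the descriptor behind fp supports
 * seeking, 0 otherwise. Asking for the current offset with SEEK_CUR
 * does not move the file position. */
static int stream_is_seekable(FILE *fp)
{
    assert(fp != NULL);
    return lseek(fileno(fp), 0, SEEK_CUR) != (off_t)-1;
}
```

A caller could fall back to reading the stream to the end when this returns 0, instead of trusting fseek / ftell.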

+1

Source: https://habr.com/ru/post/1394686/

