The real question is: what do you intend to use the size for?
Your first problem is that there are at least four definitions of “file size”:
The end-of-file offset, which is the number of bytes you must skip to get from the beginning of the file to its end.
In other words, this is the number of bytes logically in the file (from the point of view of a program reading it).
The "valid data length", which is equal to the offset of the first byte that is not actually stored on disk.
This is always less than or equal to the "end of file", and is a multiple of the cluster size.
For example, a 1 GB file may have a valid data length of 1 MB. If you ask Windows to read the first 8 MB, it will read the first 1 MB and pretend that the rest of the data is there, returning it as zeros.
The "allocated size" of the file. This is always greater than or equal to the "end of file".
This is the number of clusters allocated by the OS for the file, multiplied by the cluster size.
Unlike the case where the "end of file" is greater than the "valid data length", the excess bytes are not considered part of the file's data, so the OS will not fill your buffer with zeros if you try to read from the allocated region beyond the end of the file.
The "compressed size" of the file, which is only valid for compressed (and sparse?) files.
It is equal to the cluster size multiplied by the number of clusters on the volume that are actually allocated to this file.
For uncompressed and non-sparse files there is no notion of "compressed size"; you would use the "allocated size" instead.
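Unix-like systems expose a rough analogue of the logical-versus-allocated distinction through os.stat: st_size is the end-of-file offset, and st_blocks counts the 512-byte units the file system reports as actually allocated. The sketch below is only an illustration under stated assumptions: it assumes Linux and a file system that supports sparse files (ext4, XFS, tmpfs, etc.), and the helper name sizes is hypothetical. NTFS reports its valid data length, allocated size and compressed size through different Windows APIs, and st_blocks is not available on Windows at all.

```python
import os
import tempfile

def sizes(path):
    """Return (logical_size, allocated_size) for path (hypothetical helper).

    logical_size is the end-of-file offset (st_size). allocated_size is
    st_blocks * 512; on Linux st_blocks is counted in 512-byte units
    regardless of the file system's block size.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sparse.bin")
    with open(path, "wb") as f:
        f.write(b"abc")
        # Extend the file to 1 MiB without writing the data in between;
        # on file systems with sparse-file support this leaves a "hole".
        f.truncate(1024 * 1024)
    logical, allocated = sizes(path)
    with open(path, "rb") as f:
        f.seek(3)
        tail = f.read(16)  # bytes that were never written
    print(logical, allocated, tail == bytes(16))
```

The never-written region reads back as zeros, much like reads past the valid data length on NTFS, although the underlying mechanism is different. On a sparse-capable file system, allocated is typically far smaller than logical, which is exactly why "how big is this file?" has more than one answer.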
Your second problem is that a “file” such as C:\Foo can actually have multiple data streams.
That name only refers to the default stream. The file may also have alternate streams, such as C:\Foo:Bar, whose size is not even shown in Explorer!
Your third problem is that a “file” can have multiple names (“hard links”).
For example, C:\Windows\notepad.exe and C:\Windows\System32\notepad.exe are two names for the same file. Any of the names can be used to open any of the file's streams.
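Because one file can have several names, anything that adds up sizes has to decide whether to count that file once or once per name. One way to count it once is to key each file by its (device, inode) pair rather than by its path. This is a minimal sketch assuming POSIX hard links created with os.link and a file system where st_dev and st_ino uniquely identify a file; total_unique_size is a hypothetical helper, not part of any library.

```python
import os
import tempfile

def total_unique_size(paths):
    """Sum st_size over paths, counting each underlying file only once."""
    seen = set()
    total = 0
    for p in paths:
        st = os.stat(p)
        key = (st.st_dev, st.st_ino)  # identifies the file, not the name
        if key not in seen:
            seen.add(key)
            total += st.st_size
    return total

with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a.txt")
    b = os.path.join(d, "b.txt")
    with open(a, "wb") as f:
        f.write(b"x" * 100)
    os.link(a, b)  # b is a second name for the same file
    nlink = os.stat(a).st_nlink
    naive = os.stat(a).st_size + os.stat(b).st_size  # counts the file twice
    unique = total_unique_size([a, b])
    print(nlink, naive, unique)  # 2 200 100
```

Neither answer is "right": the naive sum describes how much data you would copy if you copied every name separately, while the deduplicated sum is closer to how much data is actually stored.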
Your fourth problem is that a “file” (or directory) might not actually be a file (or directory):
It might be a soft link (a "symbolic link" or "reparse point") to another file (or directory).
That other file might not even be on the same drive. It might point to something on the network, or it might even be recursive! Should the size be infinite if it is recursive?
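Any size calculation therefore needs a policy for links. The sketch below contrasts two such policies: not following links at all (using os.lstat, which describes the link itself), or following them and treating a loop or a dangling target as "no size". It assumes POSIX symbolic links; on Windows, creating symlinks may require extra privileges. The helper link_aware_size and its follow_links parameter are hypothetical.

```python
import os
import tempfile

def link_aware_size(path, follow_links=False):
    """Return the size of path, or None if following it loops or dangles."""
    if not follow_links:
        # lstat describes the link itself, never its target.
        return os.lstat(path).st_size
    try:
        return os.stat(path).st_size
    except OSError:
        # ELOOP for recursive links, ENOENT for dangling ones.
        return None

with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "a")
    b = os.path.join(d, "b")
    os.symlink(b, a)  # a -> b
    os.symlink(a, b)  # b -> a, closing the loop
    followed = link_aware_size(a, follow_links=True)
    unfollowed = link_aware_size(a)
    print(followed, unfollowed)
```

Following the loop does not produce an "infinite" size; the OS simply refuses to resolve it. Without following, Linux reports the link's own size as the length of the target path it stores, which is yet another number you could reasonably call "the size".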
Your fifth problem is that there are “filter” drivers that make certain files or directories look like actual files or directories, even though they are not. For example, Microsoft WIM image files (which are compressed) can be “mounted” on a folder using the ImageX tool, and they do not look like reparse points or links. They look just like directories, except that they are not really directories, and the notion of "size" does not really make sense for them.
Your sixth problem is that every file requires metadata.
For example, having 10 names for the same file requires more metadata, which takes up space. If the file names are short, having 10 names might be as cheap as having 1 name; if they are long, having multiple names can use more disk space for the metadata. (Same story with multiple streams, etc.)
Do you count these too?
Mehrdad Sep 05 '13 at 3:04 am