Does filecmp.cmp() ignore differing os.stat() signatures?

The Python 2 docs for filecmp.cmp() say:

Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.

This seems to imply that two files which are identical except for their os.stat() signature will be considered unequal, but that does not appear to be the case, as the following code snippet shows:

    import filecmp
    import os
    import shutil
    import time

    with open('test_file_1', 'w') as f:
        f.write('file contents')
    shutil.copy('test_file_1', 'test_file_2')
    time.sleep(5)  # pause to get a different time-stamp
    os.utime('test_file_2', None)  # change copied file time-stamp

    print 'test_file_1:', os.stat('test_file_1')
    print 'test_file_2:', os.stat('test_file_2')
    print 'filecmp.cmp():', filecmp.cmp('test_file_1', 'test_file_2')

Output:

    test_file_1: nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=13L, st_atime=1320719522L, st_mtime=1320720444L, st_ctime=1320719522L)
    test_file_2: nt.stat_result(st_mode=33206, st_ino=0L, st_dev=0, st_nlink=0, st_uid=0, st_gid=0, st_size=13L, st_atime=1320720504L, st_mtime=1320720504L, st_ctime=1320719539L)
    filecmp.cmp(): True

As you can see, the two files' time stamps (st_atime, st_mtime, and st_ctime) are clearly not the same, yet filecmp.cmp() reports that the two files are equal. Am I misunderstanding something, or is there a bug in either the implementation of filecmp.cmp() or its documentation?

Update

The Python 3 documentation has since been rephrased and currently says the following, which IMHO is an improvement only in the sense that it better implies that files with different time stamps can still be considered equal even when shallow is True.

If shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.

FWIW I think it would be better to just say something like this:

If shallow is true, the contents of the files are compared only if their os.stat() signatures are unequal.

2 answers

You're misunderstanding the documentation. The second sentence says:

Unless shallow is given and is false, files with identical os.stat() signatures are taken to be equal.

Files with identical os.stat() signatures are taken to be equal, but the logical inverse is not true: files with unequal os.stat() signatures are not necessarily taken to be unequal. Rather, they may be unequal, in which case the actual file contents are compared. Since the contents of the two files are identical, filecmp.cmp() returns True.
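
The decision order described above can be sketched roughly as follows. This is a simplified model rather than the library's actual source: the names `signature` and `cmp_sketch` are made up for illustration, caching is left out, and the signature is assumed to be the (file type, size, modification time) triple.

```python
import os
import stat


def signature(path):
    """Return the (file type, size, mtime) triple used as the signature."""
    st = os.stat(path)
    return (stat.S_IFMT(st.st_mode), st.st_size, st.st_mtime)


def cmp_sketch(f1, f2, shallow=True):
    """Simplified model of the filecmp.cmp() decision order (no caching)."""
    s1, s2 = signature(f1), signature(f2)
    # Only regular files can compare equal.
    if s1[0] != stat.S_IFREG or s2[0] != stat.S_IFREG:
        return False
    # Identical signatures short-circuit to "equal" in shallow mode.
    if shallow and s1 == s2:
        return True
    # Files of different sizes cannot have equal contents.
    if s1[1] != s2[1]:
        return False
    # Unequal signatures (or shallow=False): compare the contents.
    with open(f1, "rb") as a, open(f2, "rb") as b:
        while True:
            chunk_a = a.read(8192)
            chunk_b = b.read(8192)
            if chunk_a != chunk_b:
                return False
            if not chunk_a:  # both files ended at the same point
                return True
```

Under this model, a differing time stamp only removes the shortcut. The result then depends on the file contents alone.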

According to the third sentence, once it has determined that the files are equal, it caches that result and won't bother re-reading the file contents if you ask it to compare the same files again, as long as the os.stat() information for those files does not change.
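
The caching can be observed with a small experiment. The sketch below assumes Python 3.4 or later (for `filecmp.clear_cache()`) and relies on CPython keying its cache on the two paths plus their signatures, which is an implementation detail rather than a documented guarantee. The helper name `cache_demo` and the file names are made up for illustration.

```python
import filecmp
import os
import shutil
import tempfile


def cache_demo():
    """Return (first, stale, fresh) results of three content comparisons."""
    filecmp.clear_cache()
    workdir = tempfile.mkdtemp()
    try:
        a = os.path.join(workdir, "a.txt")
        b = os.path.join(workdir, "b.txt")
        for path in (a, b):
            with open(path, "w") as f:
                f.write("file contents")
        os.utime(a, (1000, 1000))
        os.utime(b, (2000, 2000))

        # Contents are read and the outcome is cached.
        first = filecmp.cmp(a, b, shallow=False)

        # Change b's contents but restore the same size and mtime,
        # so its signature looks unchanged.
        with open(b, "w") as f:
            f.write("FILE CONTENTS")
        os.utime(b, (2000, 2000))
        stale = filecmp.cmp(a, b, shallow=False)  # served from the cache

        # After clearing the cache, the contents are read again.
        filecmp.clear_cache()
        fresh = filecmp.cmp(a, b, shallow=False)
        return first, stale, fresh
    finally:
        shutil.rmtree(workdir)


print(cache_demo())
```

On CPython this should print `(True, True, False)`: the second call reuses the cached outcome because the signature of neither file appears to have changed.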


It seems that "rolling your own" is indeed what is required to get the desired result. It would be nice if the documentation were clear enough for a casual reader to reach that conclusion.

Here is the function I'm using right now:

    import os

    def cmp_stat_weak(a, b):
        # Weak check: compares only size and modification time, never contents.
        sa = os.stat(a)
        sb = os.stat(b)
        return (sa.st_size == sb.st_size and
                sa.st_mtime == sb.st_mtime)
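
To illustrate how a check like this differs from filecmp.cmp(), here is a small sketch. `cmp_strict` and `demo` are hypothetical additions that are not part of the answer, and the time stamps are set explicitly with os.utime() so the result does not depend on file-system timing.

```python
import filecmp
import os
import shutil
import tempfile


def cmp_stat_weak(a, b):
    """Size and mtime only, as in the function above."""
    sa, sb = os.stat(a), os.stat(b)
    return sa.st_size == sb.st_size and sa.st_mtime == sb.st_mtime


def cmp_strict(a, b):
    """Hypothetical variant: metadata AND contents must both match."""
    return cmp_stat_weak(a, b) and filecmp.cmp(a, b, shallow=False)


def demo():
    """Return (weak, strict, filecmp) results before and after a 'touch'."""
    workdir = tempfile.mkdtemp()
    try:
        original = os.path.join(workdir, "test_file_1")
        copy = os.path.join(workdir, "test_file_2")
        with open(original, "w") as f:
            f.write("file contents")
        shutil.copy(original, copy)

        def results():
            return (cmp_stat_weak(original, copy),
                    cmp_strict(original, copy),
                    filecmp.cmp(original, copy))

        # Same contents, same time stamps.
        os.utime(original, (1000, 1000))
        os.utime(copy, (1000, 1000))
        same = results()

        # Same contents, different time stamp on the copy.
        os.utime(copy, (2000, 2000))
        touched = results()
        return same, touched
    finally:
        shutil.rmtree(workdir)


print(demo())
```

Once the copy's time stamp changes, both stat-based checks report a mismatch, while filecmp.cmp() still returns True because the contents are identical.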

Source: https://habr.com/ru/post/1380116/

