For reference, here is a shorter version:
    def sha1OfFile(filepath):
        import hashlib
        with open(filepath, 'rb') as f:
            return hashlib.sha1(f.read()).hexdigest()
On second thought: although I have never seen it happen, I think f.read() could potentially return less than the full file, or, for a file many gigabytes in size, f.read() could run out of memory. For everyone's edification, let's consider how to fix that. A first fix:
    def sha1OfFile(filepath):
        import hashlib
        sha = hashlib.sha1()
        with open(filepath, 'rb') as f:
            for line in f:
                sha.update(line)
        return sha.hexdigest()
However, there is no guarantee that '\n' appears in the file at all, and the for loop only hands us blocks of the file that end in '\n': if the file contains no newlines, the first "line" is the entire file, and we are back to the problem we started with. Sadly, I don't see a similarly Pythonic way to iterate over blocks of the file that are only as big as necessary, which, I think, means we are stuck with a while True: ... break loop and a magic number for the block size:
    def sha1OfFile(filepath):
        import hashlib
        sha = hashlib.sha1()
        with open(filepath, 'rb') as f:
            while True:
                block = f.read(2**20)  # magic number: one-megabyte blocks
                if not block:  # f.read() returns b'' at end of file
                    break
                sha.update(block)
        return sha.hexdigest()
Of course, who says we can store one-megabyte strings? We probably can, but what if we are on a teeny tiny embedded computer?
I wish I could think of a cleaner way that is guaranteed not to run out of memory on enormous files, has no magic numbers, and performs as well as the original simple Pythonic solution.
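For what it's worth, here is a sketch of one small tidying (the magic block size remains, so this is not the clean solution I was hoping for): the two-argument form of iter(callable, sentinel) keeps calling the callable until it returns the sentinel, so it can replace the while True: ... break loop with a plain for loop:

    import hashlib

    def sha1OfFile(filepath, blocksize=2**20):
        sha = hashlib.sha1()
        with open(filepath, 'rb') as f:
            # iter() calls f.read(blocksize) repeatedly until it returns
            # the sentinel b'', which f.read() produces at end of file.
            for block in iter(lambda: f.read(blocksize), b''):
                sha.update(block)
        return sha.hexdigest()

The blocksize parameter here is just my own naming; it only moves the magic number into a default argument rather than eliminating it.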
Ben, Oct 31 '13 at 16:14