A large chunksize (≈ 2**40) leads to a MemoryError, i.e., there is no limit other than the available RAM. bufsize, on the other hand, is limited to 2**31-1 on my machine:
```python
import hashlib
from functools import partial

def md5(filename, chunksize=2**15, bufsize=-1):
    m = hashlib.md5()
    with open(filename, 'rb', bufsize) as f:
        # read chunksize-byte pieces until f.read() returns b'' (EOF)
        for chunk in iter(partial(f.read, chunksize), b''):
            m.update(chunk)
    return m
```
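The function returns the hashlib object itself, so call .hexdigest() on the result. A minimal usage sketch ('data.bin' is an assumed example filename):

```python
# Hypothetical usage: md5() returns the hashlib.md5 object,
# so .hexdigest() gives the printable digest.
digest = md5('data.bin')  # 'data.bin' is an assumed example file
print(digest.hexdigest())
```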
A large chunksize can be as slow as a very small one. Measure it.
I find that a chunksize of 2**15 is the fastest for the ≈10 MB files I tested.
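A minimal timeit sketch for such a measurement (the filename and the candidate chunk sizes here are assumptions; substitute your own files):

```python
import timeit

filename = 'bigfile.bin'  # assumed test file; replace with your own
for exp in (10, 15, 20, 25):
    # best of 3 single runs to reduce timing noise
    t = min(timeit.repeat(lambda: md5(filename, chunksize=2**exp),
                          repeat=3, number=1))
    print(f'chunksize=2**{exp}: {t:.3f} s')
```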
— jfs