Is there a hash function that is resistant to small text changes? I am looking for the opposite of a cryptographic hash where small changes in the source lead to huge changes in the result.
Something like a perceptual hash for text. Is there such a thing?
Edited: by "small changes in the text" I mean changes in punctuation, correction of spelling / grammar errors, etc. The text itself is an article similar to a Wikipedia entry (but it can be much smaller, for example, 2 or 3 paragraphs).
Bonus points if someone can point to a Python implementation.
source
share