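Use the first m characters of each hash, where m is the smallest prefix length at which all the hashes are still distinct. (For well-mixed hashes, m tends to be O(log N), where N is the number of objects: an alphabet of b symbols gives b^m possible prefixes, and collisions become unlikely once b^m is much larger than N^2.) Here is some example code: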
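set.seed(1)
# Build 100 random 8-letter strings to stand in for hashes
v <- do.call(paste0, replicate(n = 8, sample(LETTERS, size = 100, replace = TRUE), simplify = FALSE))
# TRUE if every element of v is still distinct after truncating to m characters
unique_in_first_m_chars <- function(v, m) {
  length(unique(substring(v, 1, m))) == length(v)
}
unique_in_first_m_chars(v, 4)
# [1] TRUE
unique_in_first_m_chars(v, 3)
# [1] FALSE
unique_in_first_m_chars(v, 2)
# [1] FALSE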
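
Rather than trying values of m by hand, the search for the smallest working m can be automated. The following is only a sketch: `min_unique_prefix_length` is a hypothetical helper name, not part of base R, and it assumes `v` is a non-empty character vector of non-empty strings.

```r
# Hypothetical helper: returns the smallest prefix length m at which all
# strings in v are distinct, or NA if v itself contains duplicates
# (in which case no prefix length can work).
min_unique_prefix_length <- function(v) {
  if (anyDuplicated(v) > 0) return(NA_integer_)
  for (m in seq_len(max(nchar(v)))) {
    # anyDuplicated() returns 0 when there are no repeated prefixes
    if (anyDuplicated(substring(v, 1, m)) == 0) return(m)
  }
  NA_integer_
}

hashes <- c("a1b2c3", "a1f9e0", "b7c8d9")  # made-up example values
m <- min_unique_prefix_length(hashes)
m                        # 3
substring(hashes, 1, m)  # "a1b" "a1f" "b7c"
```

The linear scan over m is cheap because m is expected to be small. Note that adding new objects later can introduce a collision at the current m, so the prefix length may need to be recomputed.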