Base62 String Hash

I want to do something like fingerprint = Digest::SHA256.base64digest(str), but in base62 instead of base64. How can I efficiently compute a base62-encoded hash for an arbitrary string?

1 answer

Base 64 is widely used for encoding binary data because 6 bits map to exactly one character, and there are enough printable ASCII characters to represent every possible pattern. In other words, the 64 available characters cover all 64 distinct 6-bit patterns, from decimal 0 to decimal 63.
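As a rough illustration of that one-to-one mapping, here is a minimal sketch that encodes by slicing the input into 6-bit groups. The names B64 and six_bit_encode are made up for this example, and it only handles inputs whose length is a multiple of 3 bytes, so it never has to emit "=" padding.

```ruby
require 'base64'

# Standard base64 alphabet (RFC 4648): A-Z, a-z, 0-9, "+", "/".
# B64 and six_bit_encode are hypothetical names used only in this sketch.
B64 = [*'A'..'Z', *'a'..'z', *'0'..'9', '+', '/'].join.freeze

# Assumes the input length is a multiple of 3 bytes (no "=" padding needed).
def six_bit_encode(bytes)
  bits = bytes.unpack1('B*')                  # bit string, most significant bit first
  bits.scan(/.{6}/).map { |chunk| B64[chunk.to_i(2)] }.join
end

six_bit_encode('foobar') == Base64.strict_encode64('foobar')
#=> true
```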

Encoding binary data as base 62 is problematic because an alphabet of size 62 does not line up with a whole number of bits. You could simply split the binary data from the digest algorithm into 5-bit chunks and map each chunk to a character. However, this means that the characters above "v" would never be used, so you would effectively end up with a base 32 encoding.
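The 5-bit approach can be sketched as follows. The alphabet ordering (digits first, then lowercase letters) and the names ALPHABET32 and chunk5_encode are assumptions made for illustration; the point is only that nothing beyond "v" ever appears in the output.

```ruby
require 'digest'

# Hypothetical alphabet: "0".."9" followed by "a".."v", 32 symbols in total.
ALPHABET32 = [*'0'..'9', *'a'..'v'].join.freeze

# Split the bytes into 5-bit groups and map each group to one character.
def chunk5_encode(bytes)
  bits = bytes.unpack1('B*')                  # bit string, most significant bit first
  bits += '0' * ((-bits.length) % 5)          # zero-pad to a multiple of 5 bits
  bits.scan(/.{5}/).map { |chunk| ALPHABET32[chunk.to_i(2)] }.join
end

fingerprint32 = chunk5_encode(Digest::SHA256.digest('foo'))
fingerprint32.length            #=> 52 (256 bits / 5, rounded up)
fingerprint32.count('w-zA-Z')   #=> 0, the remaining 30 symbols are never used
```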

In terms of efficiency, base 62 will never come close to base 64. Base 64 encoding is dead simple: take 6 bits, map them to a character, and repeat until the end. It is that simple because 64 is a power of 2. With base 62, however, you have to convert the data to an integer and carry the remainder from one step to the next, because the bit patterns do not fit evenly.
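To make the extra work concrete, here is a minimal sketch of base 62 encoding by repeated division, without any gem. The alphabet ordering (digits, lowercase, uppercase) and the names ALPHABET62 and base62_from_bytes are assumptions for this example; other libraries may order the symbols differently and so produce different strings.

```ruby
require 'digest'

# Assumed symbol order: digits, lowercase, uppercase (hypothetical choice).
ALPHABET62 = [*'0'..'9', *'a'..'z', *'A'..'Z'].join.freeze

def base62_from_bytes(bytes)
  n = bytes.unpack1('H*').to_i(16)    # whole input as one big-endian integer
  return ALPHABET62[0] if n.zero?

  out = +''
  while n > 0
    n, rem = n.divmod(62)             # remainder picks a digit, quotient carries on
    out << ALPHABET62[rem]
  end
  out.reverse                         # digits were produced least significant first
end

fingerprint62 = base62_from_bytes(Digest::SHA256.digest('foo'))
fingerprint62.length <= 43            # a 256-bit value needs at most 43 base 62 digits
```

Every digit depends on the whole remaining integer, so the cost grows with arbitrary-precision division rather than with a fixed number of bit shifts per character.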

-

If you need a URL-safe fingerprint, consider one of these alternatives:

require 'digest'
require 'base64'

# sample string
str = 'foo'

# original base 64 method for comparison
Digest::SHA256.base64digest(str)
#=> "LCa0a2j/xo/5m0U8HTBBNBNCLXBkg7+g+YpeiGJm564="

# url safe variant (no slash or plus characters)
Base64.urlsafe_encode64(Digest::SHA256.digest(str))
#=> "LCa0a2j_xo_5m0U8HTBBNBNCLXBkg7-g-YpeiGJm564="

# hexadecimal (base 16)
Digest::SHA256.hexdigest(str)
#=> "2c26b46b68ffc68ff99b453c1d30413413422d706483bfa0f98a5e886266e7ae"

# or base 32
# gem install base32
require 'base32'
Base32.encode(Digest::SHA256.digest(str))
#=> "FQTLI23I77DI76M3IU6B2MCBGQJUELLQMSB37IHZRJPIQYTG46XA===="

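# or with direct url encoding
# not pretty, but url safe!
# (URI::encode was removed in Ruby 3.0; encode_www_form_component also escapes "," and "&")
require 'uri'
URI.encode_www_form_component(Digest::SHA256.digest(str))
#=> "%2C%26%B4kh%FF%C6%8F%F9%9BE%3C%1D0A4%13B-pd%83%BF%A0%F9%8A%5E%88bf%E7%AE"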

# or url escaped base 64
# not pretty, but url safe!
require 'cgi'
CGI::escape(Digest::SHA256.base64digest(str))
#=> "LCa0a2j%2Fxo%2F5m0U8HTBBNBNCLXBkg7%2Bg%2BYpeiGJm564%3D"

-

Update: and here is actual base62 ;-)

# gem install base62
require 'digest'
require 'base62'

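# Interpret the bytes as a little-endian unsigned integer.
def pack_int(str)
  str.unpack('C*').each_with_index.reduce(0) { |r, (x, i)| r + (x << 8 * i) }
end

# Inverse of pack_int.
def unpack_int(int, size = nil)
  # Integer#bit_length is exact; Math.log2 goes through a float and
  # miscounts the bytes for 0 and for exact powers of 256.
  size ||= (int.bit_length + 7) / 8
  size.times.map { |i| (int >> 8 * i) & 255 }.pack('C*')
end

def base62_encode(str)
  Base62.encode(pack_int(str))
end

# Pass size (32 for SHA-256) to restore trailing zero bytes, which the
# integer form cannot represent.
def base62_decode(encoded, size = nil)
  unpack_int(Base62.decode(encoded), size)
end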

str = "foo"

# encode
digest = Digest::SHA256.digest(str)
fingerprint = base62_encode(digest)
#=> "fTSIMrZT3fDTvW7XDBq1b7nhWa24Zl55EVpsaO3TBBE"

# decode
recovered_digest = base62_decode(fingerprint)
#=> ",&\xB4kh\xFF\xC6\x8F\xF9\x9BE<\x1D0A4\x13B-pd\x83\xBF\xA0\xF9\x8A^\x88bf\xE7\xAE"

digest == recovered_digest
#=> true

Source: https://habr.com/ru/post/1542341/

