Compress 1s and 0s series to the shortest possible ascii string

How could you convert series 1 and 0 to the shortest possible form, consisting of safe ascii characters in a URL?

eg.

 s = '00100101000101111010101' compress(s) 

Result:

Ysi8aaU

And obviously:

decompress(compress(s)) == s

(I ask this question solely out of curiosity)

+6
source share
3 answers

In this solution, I came up with (+ too many comments):

 # A set of 64 characters, which allows a maximum chunk length of 6 .. because # int('111111', 2) == 63 (plus zero) charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_' def encode(bin_string): # Split the string of 1s and 0s into lengths of 6. chunks = [bin_string[i:i+6] for i in range(0, len(bin_string), 6)] # Store the length of the last chunk so that we can add that as the last bit # of data so that we know how much to pad the last chunk when decoding. last_chunk_length = len(chunks[-1]) # Convert each chunk from binary into a decimal decimals = [int(chunk, 2) for chunk in chunks] # Add the length of our last chunk to our list of decimals. decimals.append(last_chunk_length) # Produce an ascii string by using each decimal as an index of our charset. ascii_string = ''.join([charset[i] for i in decimals]) return ascii_string def decode(ascii_string): # Convert each character to a decimal using its index in the charset. decimals = [charset.index(char) for char in ascii_string] # Take last decimal which is the final chunk length, and the second to last # decimal which is the final chunk, and keep them for later to be padded # appropriately and appended. last_chunk_length, last_decimal = decimals.pop(-1), decimals.pop(-1) # Take each decimal, convert it to a binary string (removing the 0b from the # beginning, and pad it to 6 digits long. bin_string = ''.join([bin(decimal)[2:].zfill(6) for decimal in decimals]) # Add the last decimal converted to binary padded to the appropriate length bin_string += bin(last_decimal)[2:].zfill(last_chunk_length) return bin_string 

So:

  >>> bin_string = '00011100001010101010101000101001110'
 >>> encode (bin_string)
 'hcQOPgd'
 >>> decode (encode (bin_string))
 '000111000010101010101000101001110'

And here it is in CoffeeScript:

 class Urlify constructor: -> @charset = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-_' encode: (bits) -> chunks = (bits[i...i+6] for i in [0...bits.length] by 6) last_chunk_length = chunks[chunks.length-1].length decimals = (parseInt(chunk, 2) for chunk in chunks) decimals.push(last_chunk_length) encoded = (@charset[i] for i in decimals).join('') return encoded decode: (encoded) -> decimals = (@charset.indexOf(char) for char in encoded) [last_chunk_length, last_decimal] = [decimals.pop(), decimals.pop()] decoded = (('00000'+d.toString(2)).slice(-6) for d in decimals).join('') last_chunk = ('00000'+last_decimal.toString(2)).slice(-last_chunk_length) decoded += last_chunk return decoded 
+7
source

As one comment said, using base64 is likely to be a way. However, you do not want to embed binary without conversion.

Two options: first converting to int, and then packaging:

 import base64 s = '0110110' n = int(s, 2) result = base64.urlsafe_b64encode(str(n)).rstrip('=') 

Another option is to use a structural module to package the value in binary format and use it. (Code below from http://www.fuyun.org/2009/10/how-to-convert-an-integer-to-base64-in-python/ )

 import base64 import struct def encode(n): data = struct.pack('<Q', n).rstrip('\x00') if len(data)==0: data = '\x00' s = base64.urlsafe_b64encode(data).rstrip('=') return s def decode(s): data = base64.urlsafe_b64decode(s + '==') n = struct.unpack('<Q', data + '\x00'* (8-len(data)) ) return n[0] 
+5
source

I would convert 8 of these 0 and 1 to bytes using a lookup table and then encode these bytes with base64.

+1
source

Source: https://habr.com/ru/post/887647/


All Articles