How to compute a double-precision score from the first 8 bytes of a string in Python?

I'm trying to derive a double-precision floating-point value from a UTF-8 encoded string in Python. The idea is to take the first 8 bytes of the string and build a float from them, so that strings ordered by their score are lexicographically sorted by their first 8 bytes (or possibly their first 63 bits, after forcing everything to be positive to avoid sign errors).

For instance:

 get_score(u'aaaaaaa') < get_score(u'aaaaaaab') < get_score(u'zzzzzzzz') 

I tried to compute the score as an integer using left shifts and XOR, but I'm not sure how to translate that value into a float. I'm also not sure whether there is a better way to do this.

How can I calculate the score for a string so that the condition above is met?

Edit: The string is UTF-8 encoded (per @Bakuriu's comment).

+6
2 answers

A float will not give you 64 bits of precision: a double has only a 53-bit significand. Use integers instead.
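To illustrate the limit with a quick check, assuming Python 3 (the variable names are only for illustration):

```python
# A double has a 53-bit significand, so not every 64-bit integer is representable.
big = 2 ** 53

# Up to 2**53 every integer converts to a distinct float ...
assert float(big - 1) != float(big)

# ... but beyond that, neighbouring integers can collapse to the same float.
assert float(big) == float(big + 1)

# Two 8-byte strings that differ only in the last byte can get the same float score.
a = int.from_bytes(b'zzzzzzza', 'big')
b = int.from_bytes(b'zzzzzzzb', 'big')
collide = float(a) == float(b)
```

Here `a < b` holds for the integers, while the two floats compare equal, so the ordering is lost.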

    import struct

    def get_score(s):
        # Python 2: left-pad with NULs to 8 characters, read as big-endian uint64.
        return struct.unpack('>Q', (u'\0\0\0\0\0\0\0\0' + s[:8])[-8:])[0]

In Python 3:

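    import struct

    def get_score(s):
        # Left-pad with NULs to 8 characters; 'strict' raises on non-ASCII input.
        padded = ('\0\0\0\0\0\0\0\0' + s[:8])[-8:]
        return struct.unpack('>Q', padded.encode('ascii', 'strict'))[0]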
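Note that this answer pads on the left and assumes ASCII input, so strings of different lengths are not compared as prefixes (for example, `'b'` scores lower than `'aa'`). As a hedged alternative for Python 3, assuming the goal is ordering by the raw UTF-8 bytes, one could right-pad with zero bytes instead. The name `get_score_bytes` is hypothetical:

```python
def get_score_bytes(s):
    """Score a string by its first 8 UTF-8 bytes, as an unsigned 64-bit integer."""
    raw = s.encode('utf-8')[:8]            # may cut a multi-byte character in half
    padded = raw.ljust(8, b'\x00')         # right-pad so short strings act as prefixes
    return int.from_bytes(padded, 'big')   # big-endian: byte order equals numeric order
```

With this variant, `get_score_bytes('aa') < get_score_bytes('b')`, and the example from the question also holds.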

EDIT:

For floats, using only 6 characters:

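    import struct

    def get_score(s):
        # 2 fixed prefix bytes + 6 string bytes, read as a big-endian double.
        padded = (u'\0\0\0\0\0\0\0\0' + s[:6])[-6:]
        return struct.unpack('>d', (u'\0\1' + padded).encode('ascii', 'strict'))[0]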
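The reason for 6 characters: 6 bytes are 48 bits, which fit into the 52-bit fraction field of a double, so no bits are lost. A roughly equivalent sketch, assuming Python 3 and right-padding (the helper name `get_float_score` is made up here), converts the 48-bit integer directly, which is exact because the value is below 2**53:

```python
def get_float_score(s):
    """Score a string by its first 6 UTF-8 bytes, returned as a float."""
    raw = s.encode('utf-8')[:6].ljust(6, b'\x00')  # 6 bytes = 48 bits
    return float(int.from_bytes(raw, 'big'))       # exact: value < 2**48 < 2**53
```

Unlike the `struct` version, this yields ordinary integer-valued floats, at the cost of going through an intermediate integer.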
+3

You will need to define the entire alphabet and perform the conversion manually, since conversions for bases greater than 36 are not built in; all you have to do is define the complete alphabet. If it were an ASCII string, for example, you would convert the input string to a base-256 long, using the entire ASCII table as the alphabet.
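A minimal sketch of such a manual conversion, assuming the alphabet is given as a string whose character order defines the digit values (`to_number` and `BYTE_ALPHABET` are illustrative names, not from any library):

```python
def to_number(s, alphabet):
    """Interpret s as a number written in base len(alphabet)."""
    base = len(alphabet)
    digit = {ch: i for i, ch in enumerate(alphabet)}  # character -> digit value
    n = 0
    for ch in s:
        n = n * base + digit[ch]  # KeyError for characters outside the alphabet
    return n

# All 256 single-byte code points, in order, as a base-256 alphabet.
BYTE_ALPHABET = ''.join(chr(i) for i in range(256))
```

For instance, `to_number('ab', BYTE_ALPHABET)` gives `97 * 256 + 98`. For the results to sort lexicographically, the inputs would still need to be padded or truncated to the same length first.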

You can find an example of complete functions for doing this here: string to base 62 number

Also, you do not need to worry about negative numbers in this case: a string made of the first character in the alphabet encodes to the lowest possible number in the representation, which is the negative value with the largest absolute value (in your case -2 ** 63). That is the correct value, and it lets you compare scores with < and >.
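One way to read this, as a hedged sketch: if a signed 64-bit value is needed (an assumption here, for example to fit a signed storage type), subtracting 2**63 from the unsigned score maps the smallest input to -2**63 and keeps the ordering intact. `to_signed` is a hypothetical helper:

```python
import struct

def to_signed(unsigned_score):
    """Shift an unsigned 64-bit score into the signed 64-bit range, preserving order."""
    signed = unsigned_score - 2 ** 63
    struct.pack('>q', signed)  # raises struct.error if the value does not fit in 64 bits
    return signed
```

Because the same constant is subtracted from every score, comparisons between signed scores give the same result as comparisons between the unsigned ones.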

Hope this helps!

+1

Source: https://habr.com/ru/post/956585/
