How to compute a double-precision float score from the first 8 bytes of a string in Python?

Question

I'm trying to get a double-precision floating-point score from a UTF-8 encoded string object in Python. The idea is to take the first 8 bytes of the string and create a float, so that strings ordered by their score are sorted lexicographically according to their first 8 bytes (or possibly their first 63 bits, after forcing them all to be positive to avoid sign errors).

For instance:

 get_score(u'aaaaaaa') < get_score(u'aaaaaaab') < get_score(u'zzzzzzzz')

I tried to compute the score as an integer using left bit shifts and XOR, but I'm not sure how to translate that value into a float. I'm also not sure whether there is a better way to do this.

How should the score for a string be calculated so that the condition above is met?

Edit: The string object is UTF-8 encoded (per @Bakuriu's comment).

+6

python sorting double floating-point unicode

Juan Carlos Coto Oct 23 '13 at 18:51

2 answers

You will need to define the entire alphabet and perform the conversion manually, since conversions to bases greater than 36 are not built in; all you need to do is define the complete alphabet. If it were an ASCII string, for example, you would convert the input string to a base-256 long, using the entire ASCII table as the alphabet.
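The base-256 conversion described above can be sketched in a few lines. This is a minimal illustration, assuming Python 3 and assuming the string is right-padded with zero bytes to a fixed width so that shorter strings still compare lexicographically; the name `base256_score` and the `width` parameter are hypothetical and do not come from the answer.

```python
def base256_score(s, width=8):
    """Interpret the first `width` UTF-8 bytes of s as a base-256 integer."""
    # Assumption: right-pad with zero bytes so that a short string sorts
    # before any longer string sharing the same prefix.
    data = s.encode("utf-8")[:width].ljust(width, b"\0")
    score = 0
    for byte in data:  # iterating over bytes yields ints in Python 3
        score = score * 256 + byte
    return score


if __name__ == "__main__":
    words = ["zzzzzzzz", "aaaaaaab", "b", "aaaaaaa", "aa"]
    print(sorted(words, key=base256_score))
```

The loop produces the same value as the built-in `int.from_bytes(data, "big")` on Python 3; it is spelled out here only to make the "alphabet of 256 symbols" idea visible.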

There is an example of complete functions for doing this here: string to base 62 number

Also, you do not need to worry about negative numbers in this case, since encoding a string that starts with the first character in the alphabet yields the lowest possible number in the representation, which is the negative value with the largest absolute value, in your case -2 ** 63. That is the correct value, and it lets you use < and > against it.

Hope this helps!

+1

Sergio Ayestarán Oct 23 '13 at 20:11

Ignacio Vazquez-Abrams · Accepted Answer · 2013-10-23T18:55:52+0000

float will not give you 64 bits of precision. Use integers instead.

 import struct

 def get_score(s):
     # Left-pad to 8 characters, then read as a big-endian unsigned 64-bit integer.
     return struct.unpack('>Q', (u'\0\0\0\0\0\0\0\0' + s[:8])[-8:])[0]

In Python 3:

 import struct

 def get_score(s):
     # 'strict' is the error handler that raises on non-ASCII input.
     return struct.unpack('>Q', ('\0\0\0\0\0\0\0\0' + s[:8])[-8:].encode('ascii', 'strict'))[0]
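The precision limit is easy to check. A double has a 53-bit significand, so two 64-bit scores that differ only in their low bits can collapse to the same float. The sketch below assumes Python 3 and ASCII input; `int_score` is a hypothetical helper used only for this illustration.

```python
import struct


def int_score(s):
    """Unsigned 64-bit big-endian integer built from the first 8 ASCII bytes."""
    data = s.encode("ascii")[:8].rjust(8, b"\0")  # left-pad, as in the answer
    return struct.unpack(">Q", data)[0]


a = int_score("zzzzzzza")
b = int_score("zzzzzzzb")

print(a < b)                 # the integers are distinct and ordered
print(float(a) == float(b))  # but both round to the same double
```

Both values lie between 2**62 and 2**63, where adjacent doubles are 1024 apart, so a difference of 1 in the last byte is lost on conversion.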

EDIT:

For floats with 6 characters:

 def get_score(s):
     # The leading \0\1 keeps the sign and exponent bits at zero; the 6 characters fill the fraction.
     return struct.unpack('>d', (u'\0\1' + (u'\0\0\0\0\0\0\0\0' + s[:6])[-6:]).encode('ascii', 'strict'))[0]
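For reference, here is a hedged Python 3 variant of the same idea. It is not the answer's code: it encodes to UTF-8 before slicing, so the slice is taken on bytes, and it right-pads rather than left-pads, on the assumption that plain lexicographic order is wanted (with left-padding, 'b' scores lower than 'aa'). The name `float_score` is hypothetical.

```python
import struct


def float_score(s):
    """Map the first 6 UTF-8 bytes of s to a positive double.

    The two leading bytes 0x00 0x01 keep the sign bit and the exponent
    field at zero, so the 6 payload bytes land entirely in the fraction
    field and the resulting doubles order the same way as the payload.
    """
    payload = s.encode("utf-8")[:6].ljust(6, b"\0")  # assumption: right-pad
    return struct.unpack(">d", b"\x00\x01" + payload)[0]


if __name__ == "__main__":
    words = ["zzzzzz", "aaaaab", "b", "aaaaaa", "aa"]
    print(sorted(words, key=float_score))
```

The resulting floats are extremely small subnormal numbers, which is harmless when they are used only as sort keys but makes them unsuitable for arithmetic.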
