How to convert Java long strings while maintaining a natural order

I'm currently looking at a simple programming problem that might be interesting for optimization - at least for those who think that programming is an art :) So, here it is:

What is the best way to imagine long lines while preserving their natural order?

In addition, the String representation must match ^[A-Za-z0-9]+$ . (I'm not too strict here, but avoid using control characters or anything that can cause headaches with encodings, is illegal in XML, has line breaks or similar characters that will undoubtedly cause problems)

Here is a JUnit test case:

 @Test public void longConversion() { final long[] longs = { Long.MIN_VALUE, Long.MAX_VALUE, -5664572164553633853L, -8089688774612278460L, 7275969614015446693L, 6698053890185294393L, 734107703014507538L, -350843201400906614L, -4760869192643699168L, -2113787362183747885L, -5933876587372268970L, -7214749093842310327L, }; // keep it reproducible //Collections.shuffle(Arrays.asList(longs)); final String[] strings = new String[longs.length]; for (int i = 0; i < longs.length; i++) { strings[i] = Converter.convertLong(longs[i]); } // Note: Comparator is not an option Arrays.sort(longs); Arrays.sort(strings); final Pattern allowed = Pattern.compile("^[A-Za-z0-9]+$"); for (int i = 0; i < longs.length; i++) { assertTrue("string: " + strings[i], allowed.matcher(strings[i]).matches()); assertEquals("string: " + strings[i], longs[i], Converter.parseLong(strings[i])); } } 

and here are the methods I'm looking for

 public static class Converter { public static String convertLong(final long value) { // TODO } public static long parseLong(final String value) { // TODO } } 

I already have some ideas on how to approach this problem. However, although I could get some good (creative) community suggestions.

It would also be nice if this conversion were

  • as short as possible
  • easy to implement in other languages

EDIT: I am very glad to see that two very respected programmers are facing the same problem as me: using "-" for negative numbers cannot work, because "-" does not cancel the sort order

  • -0001
  • -0002
  • 0000
  • 0001
  • 0002
+4
source share
4 answers

Ok, take two:

 class Converter { public static String convertLong(final long value) { return String.format("%016x", value - Long.MIN_VALUE); } public static long parseLong(final String value) { String first = value.substring(0, 8); String second = value.substring(8); long temp = (Long.parseLong(first, 16) << 32) | Long.parseLong(second, 16); return temp + Long.MIN_VALUE; } } 

That explains a little. First, let me demonstrate that it is reversible, and the resulting transformations must demonstrate order:

 for (long aLong : longs) { String out = Converter.convertLong(aLong); System.out.printf("%20d %16s %20d\n", aLong, out, Converter.parseLong(out)); } 

Conclusion:

 -9223372036854775808 0000000000000000 -9223372036854775808 9223372036854775807 ffffffffffffffff 9223372036854775807 -5664572164553633853 316365a0e7370fc3 -5664572164553633853 -8089688774612278460 0fbba6eba5c52344 -8089688774612278460 7275969614015446693 e4f96fd06fed3ea5 7275969614015446693 6698053890185294393 dcf444867aeaf239 6698053890185294393 734107703014507538 8a301311010ec412 734107703014507538 -350843201400906614 7b218df798a35c8a -350843201400906614 -4760869192643699168 3dedfeb1865f1e20 -4760869192643699168 -2113787362183747885 62aa5197ea53e6d3 -2113787362183747885 -5933876587372268970 2da6a2aeccab3256 -5933876587372268970 -7214749093842310327 1be00fecadf52b49 -7214749093842310327 

As you can see, Long.MIN_VALUE and Long.MAX_VALUE (the first two lines) are correct, and the other values ​​mostly fall in the line.

What does it do?

Assuming signed byte values:

  • -128 => 0x80
  • -1 => 0xFF
  • 0 => 0x00
  • 1 => 0x01
  • 127 => 0x7F

Now, if you add 0x80 to the values ​​you get:

  • -128 => 0x00
  • -1 => 0x7F
  • 0 => 0x80
  • 1 => 0x81
  • 127 => 0xFF

which is the correct order (with overflow).

Basically the above is done with 64-bit signed longs instead of 8-bit signed bytes.

Going back is a little more circular. You might think you can use:

 return Long.parseLong(value, 16); 

but you cannot. Go to 16 f to this function (-1) and it will throw an exception. This seems to be seen as an unsigned hex value that long cannot accommodate. So instead, I split it in half and took apart each part, combining them together, shifting the first half by 32 bits.

+13
source

EDIT: Okay, so just adding a negative sign for negative numbers doesn’t work ... but you can convert the value to effectively unsigned so that Long.MIN_VALUE is mapped to "0000000000000000" and Long. MAX_VALUE corresponds to "FFFFFFFFFFFFFFFFF". It’s harder to read, but get the right results.

Basically, you just need to add 2 ^ 63 to the value before converting it to hexadecimal - but it can be a minor pain in Java because it has no unsigned lengths ... it could be easiest to do using BigInteger :

 private static final BigInteger OFFSET = BigInteger.valueOf(Long.MIN_VALUE) .negate(); public static String convertLong(long value) { BigInteger afterOffset = BigInteger.valueOf(value).add(OFFSET); return String.format("%016x", afterOffset); } public static long parseLong(String text) { BigInteger beforeOffset = new BigInteger(text, 16); return beforeOffset.subtract(OFFSET).longValue(); } 

This would not be terribly effective, admittedly, but it works with all of your test cases.

+2
source

If you don't need a print string, you can encode a long one in four characters after you shift the value to Long.MIN_VALUE (-0x80000000) to emulate an unsigned long:

 public static String convertLong(long value) { value += Long.MIN_VALUE; return "" + (char)(value>>48) + (char)(value>>32) + (char)(value>>16) + (char)value; } public static long parseLong(String value) { return ( (((long)value.charAt(0))<<48) + (((long)value.charAt(1))<<32) + (((long)value.charAt(2))<<16) + (long)value.charAt(3)) + Long.MIN_VALUE; } 

The use of surrogate pairs is not a problem, since the natural order of the string is determined by the UTF-16 values ​​in its characters, and not by the UCS-2 code values.

0
source

The RFC2550 has an April 1 RFC-1 methodology dedicated to the Y10K issue with 4-digit dates that can be applied for this purpose. Essentially, every time an integer string representation grows to require a different digit, a different character or another (printable) character is added to maintain the desired sort order. Negative rules are more mysterious, giving way to lines that are harder to read at a glance ... but still easy to apply in code.

Well, for positive numbers, they are still readable.

Cm:

http://www.faqs.org/rfcs/rfc2550.html

0
source

Source: https://habr.com/ru/post/1300858/


All Articles