I am looking for a way to deserialize a String from a byte[] in Java while producing as little garbage as possible. Since I write my own serializer and deserializer, I have complete freedom to implement any solution on the server side (i.e., when serializing data) and on the client side (i.e., when deserializing data).
I was able to serialize a String efficiently without producing any garbage by iterating over the String's characters (String.charAt(i)) and converting each char (a 16-bit value) into two 8-bit values. There is a nice discussion about this here. An alternative is to use reflection to access the String's underlying char[] directly, but that is beyond the scope of this question.
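A minimal sketch of that serialization loop, assuming the caller supplies a reusable buffer already sized at two bytes per char and records the string length separately. The class and method names are hypothetical. Each char is split into its high and low byte, so the loop itself allocates nothing.

```java
// Hypothetical helper; names are illustrative only.
final class CharSerializer {

    // Writes each 16-bit char of s as two bytes (high byte first) into dst,
    // starting at offset. Returns the offset just past the last written byte.
    // Only charAt() and primitive operations are used, so no objects are allocated.
    static int writeChars(String s, byte[] dst, int offset) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            dst[offset++] = (byte) (c >>> 8); // high 8 bits
            dst[offset++] = (byte) c;         // low 8 bits
        }
        return offset;
    }
}
```

This byte order is the same as big-endian UTF-16 code units; surrogate pairs are simply written as two chars.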
However, it seems impossible to deserialize a String from a byte[] without allocating the char[] twice, which seems, well, strange.
Procedure:
- Create a char[]
- Iterate through the byte[] and fill in the char[]
- Create the String with the String(char[]) constructor
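The three steps above can be sketched as follows, assuming the same two-bytes-per-char layout and a char count read from elsewhere in the stream. The names are hypothetical; the comments mark where the two allocations happen.

```java
// Hypothetical helper; names are illustrative only.
final class CharDeserializer {

    // Reads charCount chars stored as (high byte, low byte) pairs starting at offset.
    static String readChars(byte[] src, int offset, int charCount) {
        // Allocation 1: a temporary char[] filled from the bytes.
        char[] chars = new char[charCount];
        for (int i = 0; i < charCount; i++) {
            int hi = src[offset + 2 * i] & 0xFF;     // mask to avoid sign extension
            int lo = src[offset + 2 * i + 1] & 0xFF;
            chars[i] = (char) ((hi << 8) | lo);
        }
        // Allocation 2: String(char[]) copies the array into its own storage,
        // so 'chars' becomes garbage as soon as this method returns.
        return new String(chars);
    }
}
```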
Because of Java's String immutability guarantees, the String constructor copies the char[], which doubles the GC overhead. I could always work around this (allocate the String via Unsafe and set its char[] field via reflection), but I wanted to ask whether this has any consequences other than breaking every convention around String immutability.
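A small check of the copying behavior: mutating the source array after construction does not change the String, which is only possible because the constructor keeps its own copy. Note that on JDK 9 and later (compact strings), String stores a byte[] plus a coder flag internally rather than a char[], so a reflection workaround that targets a char[] field only matches the JDK 8 and earlier layout. The class name is hypothetical.

```java
// Hypothetical demo class.
final class CopyDemo {

    // Builds a String from a char[], then overwrites the array.
    // The returned String is unaffected because the constructor copied the data.
    static String buildThenMutate() {
        char[] chars = {'a', 'b', 'c'};
        String s = new String(chars);
        chars[0] = 'X'; // mutate the source array after construction
        return s;
    }
}
```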
Of course, the wisest answer is: "Stop doing this and trust the GC; the original char[] will be extremely short-lived, and G1 will get rid of it instantly." That actually makes sense if the char[] is smaller than half the G1 region size. If it is larger, the char[] is allocated directly as a humongous object (i.e., it is automatically placed in dedicated regions outside the regular young-generation regions). Such objects are very hard for G1 to collect efficiently. That is why every allocation matters.
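To put a rough number on that threshold, here is an estimate of when a char[] becomes humongous. It assumes a 64-bit HotSpot JVM with compressed class pointers (16-byte array header, 8-byte alignment) and uses the rule from Oracle's G1 tuning guide that objects of at least half a region are humongous. The region size itself depends on the heap size and -XX:G1HeapRegionSize. The names are hypothetical.

```java
// Hypothetical estimator; header size and alignment are assumptions (see comments).
final class HumongousCheck {

    // Assumed char[] header: 12-byte object header + 4-byte length field.
    static final long ARRAY_HEADER_BYTES = 16;

    // Approximate heap footprint of a char[] of the given length,
    // rounded up to the default 8-byte object alignment.
    static long charArrayBytes(long length) {
        long raw = ARRAY_HEADER_BYTES + 2 * length;
        return (raw + 7) & ~7L;
    }

    // Treats an object as humongous when it is at least half a region.
    static boolean isHumongous(long length, long regionBytes) {
        return charArrayBytes(length) >= regionBytes / 2;
    }
}
```

With 1 MiB regions this puts the cutoff at roughly 262,000 chars; larger regions raise it proportionally.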
Any ideas on how to solve this problem?
Thank you very much.