Zero-garbage big string deserialization in Java, Humongous object issue

I am looking for a way to deserialize a String from a byte[] in Java with as little garbage as possible. Since I write my own serializer and deserializer, I have complete freedom to implement any solution on the server side (i.e., when serializing data) and on the client side (i.e., when deserializing data).

I was able to serialize a String efficiently without producing any garbage by iterating over the String's characters (String.charAt(i)) and converting each char (a 16-bit value) into two 8-bit values. There is a nice discussion about this here. An alternative is to use Reflection to access the String's underlying char[] directly, but that is beyond the scope of this question.
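A minimal sketch of the charAt()-based serialization described above, assuming a big-endian layout (high byte first) and a caller-supplied byte[] that a real serializer would allocate once and reuse. The class and method names (ZeroGarbageStringWriter, writeChars) are hypothetical. The copy loop itself allocates no objects; it only uses charAt(), shifts and casts.

```java
final class ZeroGarbageStringWriter {

    /**
     * Writes each UTF-16 code unit of s into dst starting at offset,
     * as two bytes (high byte first). Returns the offset just past the
     * last byte written. The copy loop allocates no objects.
     */
    static int writeChars(String s, byte[] dst, int offset) {
        int n = s.length();
        // Check capacity up front (long arithmetic avoids int overflow for huge strings).
        if (offset < 0 || dst.length - offset < 2L * n) {
            throw new IllegalArgumentException("destination buffer too small");
        }
        for (int i = 0; i < n; i++) {
            char c = s.charAt(i);
            dst[offset++] = (byte) (c >>> 8); // high 8 bits
            dst[offset++] = (byte) c;         // low 8 bits
        }
        return offset;
    }

    public static void main(String[] args) {
        // In a real serializer this buffer would be allocated once and reused.
        byte[] buf = new byte[64];
        int end = writeChars("Hi", buf, 0);
        System.out.println(end);                                                  // 4
        System.out.println(buf[0] + " " + buf[1] + " " + buf[2] + " " + buf[3]); // 0 72 0 105
    }
}
```

Writing raw UTF-16 code units (rather than encoding to UTF-8) keeps the per-char work to two shifts and makes the decoded length known in advance: it is simply half the byte count.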

However, it seems impossible to deserialize a byte[] without creating a char[] twice, which seems, well, strange.

Procedure:

  • Create char[]
  • Iterate through byte[] and fill in char[]
  • Create string with constructor String(char[])
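The three steps above can be sketched as follows, assuming the same two-bytes-per-char, high-byte-first layout. NaiveStringReader and readChars are hypothetical names; the comments mark where the two arrays are allocated.

```java
final class NaiveStringReader {

    /**
     * Decodes charCount chars (two bytes each, high byte first) from src,
     * starting at offset, using the three-step procedure.
     */
    static String readChars(byte[] src, int offset, int charCount) {
        // Step 1: allocate a temporary char[] (first allocation).
        char[] tmp = new char[charCount];
        // Step 2: fill it from the byte[].
        for (int i = 0; i < charCount; i++) {
            int hi = src[offset + 2 * i] & 0xFF;
            int lo = src[offset + 2 * i + 1] & 0xFF;
            tmp[i] = (char) ((hi << 8) | lo);
        }
        // Step 3: String(char[]) copies tmp into the String's own internal array
        // (second allocation: a char[] on JDK 8, a byte[] on JDK 9+ with compact
        // strings). tmp becomes unreachable garbage once this method returns.
        return new String(tmp);
    }

    public static void main(String[] args) {
        byte[] wire = {0, 72, 0, 105};
        System.out.println(readChars(wire, 0, 2)); // Hi
    }
}
```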

Because of String's immutability guarantees, the String(char[]) constructor copies the char[], which doubles the GC overhead. I can always use mechanisms to get around this (Unsafe String allocation + Reflection to set the char[] instance), but I just wanted to ask whether there are any consequences to this other than my breaking every convention of String's immutability.

Of course, the wisest answer to this is: "Stop, just stop doing it and trust the GC; the original char[] will be extremely short-lived, and G1 will get rid of it instantly," which actually makes sense if the char[] is smaller than half the G1 region size. If it is larger, the char[] will be allocated directly as a humongous object (i.e., it is automatically allocated outside G1's regular young-generation regions). Such objects are extremely difficult for G1 to garbage collect efficiently. That is why every allocation matters.
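A rough back-of-the-envelope check for the half-region rule mentioned above. It is only an estimate: it assumes a 16-byte array header and 8-byte object alignment (typical for 64-bit HotSpot with compressed class pointers, but JVM- and flag-dependent), treats "more than half a region" as the humongous threshold, and uses a hypothetical class name, HumongousEstimate. With the three-step procedure on JDK 8, a String past this threshold means two humongous arrays: the temporary char[] and the String's internal copy.

```java
final class HumongousEstimate {

    // ASSUMPTION: 16-byte array header and 8-byte alignment (typical 64-bit HotSpot
    // with compressed class pointers). Real values depend on the JVM and its flags.
    static final long ASSUMED_ARRAY_HEADER_BYTES = 16;
    static final long ASSUMED_ALIGNMENT = 8;

    /** Approximate heap footprint, in bytes, of a char[] with the given length. */
    static long charArrayBytes(int length) {
        long raw = ASSUMED_ARRAY_HEADER_BYTES + 2L * length;
        // Round up to the next multiple of the assumed alignment.
        return (raw + ASSUMED_ALIGNMENT - 1) / ASSUMED_ALIGNMENT * ASSUMED_ALIGNMENT;
    }

    /** True if the estimated array size exceeds half of a G1 region. */
    static boolean isLikelyHumongous(int length, long regionBytes) {
        return charArrayBytes(length) > regionBytes / 2;
    }

    public static void main(String[] args) {
        long region = 1L << 20; // 1 MB region, an assumed example value
        System.out.println(isLikelyHumongous(100_000, region)); // false (about 195 KB)
        System.out.println(isLikelyHumongous(300_000, region)); // true (about 586 KB)
    }
}
```

In other words, with 1 MB regions a string of roughly a quarter of a million chars is already enough to push each char[] copy into humongous territory.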

Any ideas on how to solve this problem?

Thank you very much.

2 answers

I found a solution that is useless if you have an unmanaged environment.

The java.lang.String class has a package-private constructor, String(char[] value, boolean share).

Source:

    /*
     * Package private constructor which shares value array for speed.
     * this constructor is always expected to be called with share==true.
     * a separate constructor is needed because we already have a public
     * String(char[]) constructor that makes a copy of the given char[].
     */
    String(char[] value, boolean share) {
        // assert share : "unshared not supported";
        this.value = value;
    }

It is used widely inside the JDK, for example in Integer.toString(), Long.toString(), String.concat(String), String.replace(char, char), and String.valueOf(char).

The solution (or hack, whatever you want to call it) is to put your class in the java.lang package so that it can access the package-private constructor. It won't play well with the security manager, but that can be circumvented.


"Such objects are extremely difficult for G1 to garbage collect efficiently."

This may not be true, but you will need to evaluate it for your application. JDK bugs 8027959 and 8048179 introduce new mechanisms for collecting large, short-lived (humongous) objects. Judging by the bugs' fix versions, you may need JDK 8u40 or later and 8u60 or later, respectively, to benefit from each of them.

Experimental option of interest:

 -XX:+G1ReclaimDeadHumongousObjectsAtYoungGC 

Tracing:

 -XX:+G1TraceReclaimDeadHumongousObjectsAtYoungGC 

For more tips and questions regarding these features, I would recommend the hotspot-gc-use mailing list.


Source: https://habr.com/ru/post/981336/

