String vs char[]

I have a few slides from IBM, "From Java code to Java heap: understanding the memory usage of your application," which say that when we use String instead of char[]:

The maximum overhead would be 24:1 for a single character!

But I can't understand what the overhead refers to here. Can anybody help?


+46
java string memory-management memory
Nov 20 '13 at 12:21
4 answers

This figure applies to 32-bit JDK 6.

JDK 6

In the pre-Java-7 world, strings were implemented as a pointer to a region of a char[] array:

    // "8 (4)" reads "8 bytes for x64, 4 bytes for x32"
    class String {      // 8 (4) housekeeping + 8 (4) class pointer
        char[] buf;     // 12 (8) bytes + 2 bytes per char -> 24 (16) aligned
        int offset;     // 4 bytes -> three int
        int length;     // 4 bytes -> fields align to
        int hash;       // 4 bytes -> 16 (12) bytes
    }

So, I calculated:

    36 bytes per new String("a") for JDK 6 x32  <-- the overhead from the article
    56 bytes per new String("a") for JDK 6 x64


JDK 7

Just for comparison, in JDK 7+ String is a class that contains only the char[] buffer and the hash field.

    class String {      // 8 (4) + 8 (4) bytes -> 16 (8) aligned
        char[] buf;     // 12 (8) bytes + 2 bytes per char -> 24 (16) aligned
        int hash;       // 4 bytes -> 8 (4) aligned
    }

So this is:

    28 bytes per String for JDK 7 x32
    48 bytes per String for JDK 7 x64
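The totals above can be reproduced by adding up the three components quoted in the layout comments. The following is only a tally sketch: the class and method names are hypothetical, and the byte counts are copied from the comments rather than measured on a real JVM.

```java
// Tallies the per-component byte counts quoted in the layout comments above.
// Names are hypothetical; the numbers are copied from the answer, not measured.
final class StringFootprintTally {
    /** Sum of object header, aligned char[] part, and aligned remaining fields. */
    static int total(int header, int charArrayPart, int otherFields) {
        return header + charArrayPart + otherFields;
    }

    public static void main(String[] args) {
        // JDK 6: header 16 (8), char[] holding one char 24 (16), three ints 16 (12)
        System.out.println("JDK 6 x32: " + total(8, 16, 12));
        System.out.println("JDK 6 x64: " + total(16, 24, 16));
        // JDK 7: header 16 (8), char[] holding one char 24 (16), hash 8 (4)
        System.out.println("JDK 7 x32: " + total(8, 16, 4));
        System.out.println("JDK 7 x64: " + total(16, 24, 8));
    }
}
```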

UPDATE

For the 3.75:1 ratio, see @Andrey's explanation below. This ratio approaches 1 as the length of the string grows.


+37
Nov 20 '13 at 12:52

In the JVM, a char variable is stored in a single 16-bit memory slot, and changes to that variable overwrite the same memory location. This makes creating or updating char variables very fast and cheap, but increases JVM overhead compared with the static allocation used for strings.

The JVM stores Java strings in a variable-size memory region (essentially an array) that is sized to fit the string exactly (plus 1 for a string termination character) when the String object is created or first assigned a value. Thus, an object with the initial value "HELP!" is allocated 96 bits of memory (6 characters, 16 bits each). The value is treated as immutable, which lets the JVM inline references to it, making static string assignments very fast, very compact, and very efficient from the JVM's point of view.


+9
Nov 20 '13 at

I will try to explain the numbers mentioned in the original article.

The article describes object metadata, usually consisting of: class, flags, and locks.

The class and lock are stored in the object header and occupy 8 bytes on a 32-bit virtual machine. I did not find any information about a JVM implementation that keeps flag information in the object header. Perhaps it is stored somewhere outside the object (for example, by the garbage collector to track references to the object, etc.).

So, let's say that the article talks about some x32 AbstractJVM, which uses 12 bytes of memory to store meta-information about the object.

Then for char[] we have:

  • 12 bytes of meta information (8 bytes on x32 JDK 6, 16 bytes on x64 JDK)
  • 4 bytes for array size
  • 2 bytes for each character stored
  • 2 alignment bytes if the number of characters is odd (on x64 JDK: 2 * (4 - (length + 2) % 4) )

For java.lang.String we have:

  • 12 bytes of meta information (8 bytes on x32 JDK6, 16 bytes on x64 JDK6)
  • 16 bytes for string fields (this is for JDK6, 8 bytes for JDK7)
  • memory needed to store char [] as described above

So, let's count how much memory it takes to store "MyString" as a String object:

 12 + 16 + (12 + 4 + 2 * "MyString".length + 2 * ("MyString".length % 2)) = 60 bytes. 

On the other hand, we know that to store only the data (without any information about the data type, length, or anything else) we need:

 2 * "MyString".length = 16 bytes 

So the overhead is 60 / 16 = 3.75.

Similarly, for a single-character string, we get the "maximum overhead":

    12 + 16 + (12 + 4 + 2 * "a".length + 2 * ("a".length % 2)) = 48 bytes
    2 * "a".length = 2 bytes
    48 / 2 = 24
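The same accounting can be written as a small calculator, which also makes it easy to see the ratio shrink as strings get longer. This is a sketch of the arithmetic in this answer only: the constants are the ones assumed above for the hypothetical 32-bit "AbstractJVM", and the class and method names are made up for illustration.

```java
// Sketch of this answer's accounting for the hypothetical 32-bit "AbstractJVM".
// All constants come from the lists above; class and method names are made up.
final class StringOverhead {
    static final int OBJECT_META = 12;   // meta information per object
    static final int STRING_FIELDS = 16; // JDK 6 String fields
    static final int ARRAY_LENGTH = 4;   // slot holding the array size

    /** Bytes for a char[] of the given length, padded when the length is odd. */
    static int charArrayBytes(int length) {
        return OBJECT_META + ARRAY_LENGTH + 2 * length + 2 * (length % 2);
    }

    /** Bytes for a String object plus its backing char[]. */
    static int stringBytes(int length) {
        return OBJECT_META + STRING_FIELDS + charArrayBytes(length);
    }

    /** Total bytes divided by the bytes of raw character data. */
    static double overheadRatio(int length) {
        return (double) stringBytes(length) / (2 * length);
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 8, 100, 10_000}) {
            System.out.printf("length %d: %d bytes, ratio %.2f%n",
                    n, stringBytes(n), overheadRatio(n));
        }
    }
}
```

Under these assumptions, `stringBytes(8)` is 60 and `stringBytes(1)` is 48, matching the figures worked out by hand.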

Following the logic of the article's authors, the true maximum overhead, infinity, is reached when we store an empty string :).

+3
Nov 29 '13 at 11:00

I read this in an old Stack Overflow answer that I can't locate now. In the Oracle JDK, a String contains four instance-level fields:

    A character array
    An integral offset
    An integral character count
    An integral hash value

This means that each String introduces an additional object reference (the String itself) and three integers on top of the character array itself. (The offset and character count are there to allow sharing a character array among String instances created by the String#substring() methods, a design choice that some other Java library developers have avoided.) In addition to the extra storage cost, there is also one more level of indirection on access, not to mention the bounds checking with which String guards its character array.
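To make the role of the offset and count fields concrete, here is a simplified model of that kind of layout. It is a hypothetical sketch, not the real java.lang.String source: `SharedString` and its methods are invented names, and only the array-sharing idea is taken from the description above.

```java
// Hypothetical, simplified model of the layout described above;
// this is not the real java.lang.String implementation.
final class SharedString {
    private final char[] value; // backing array, possibly shared
    private final int offset;   // index of the first character in value
    private final int count;    // number of characters in this string

    SharedString(String s) {
        this(s.toCharArray(), 0, s.length());
    }

    private SharedString(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    /** Returns a view onto the same array: no characters are copied. */
    SharedString substring(int begin, int end) {
        if (begin < 0 || end > count || begin > end) {
            throw new StringIndexOutOfBoundsException();
        }
        return new SharedString(value, offset + begin, end - begin);
    }

    char charAt(int index) {
        if (index < 0 || index >= count) { // bounds check guarding the array
            throw new StringIndexOutOfBoundsException(index);
        }
        return value[offset + index];     // extra indirection through value
    }

    boolean sharesArrayWith(SharedString other) {
        return value == other.value;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedString full = new SharedString("MyString");
        SharedString part = full.substring(2, 8);
        System.out.println(part + " shares array: " + part.sharesArrayWith(full));
    }
}
```

The trade-off in such a design is that a substring is cheap to create, while every instance pays for two extra int fields and a small substring can keep a large backing array alive.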

If you can get away with allocating and using only the underlying character array, there is space to be saved. Of course, this is not idiomatic Java; a comment would be warranted to justify the choice, preferably citing evidence from having profiled the difference.

+1
Nov 20 '13 at 13:37


