What is the optimal initial capacity of a StringBuilder for inputs with highly variable lengths?

Good afternoon, I am using java.lang.StringBuilder to store some characters. I do not know in advance how many characters I am going to store, except that:

  • 60% of the time, it is exactly 7 characters
  • 39% of the time, it is approximately 3,500 characters
  • 1% of the time, it is approximately 20,000 characters

How can I calculate the optimal initial buffer capacity?

I am currently using new java.lang.StringBuilder(4000), but only because I was too lazy to think it through.

1 answer

There are two factors here: time and memory consumption. The time depends mainly on how many times java.lang.AbstractStringBuilder.expandCapacity() is called. Strictly speaking, the cost of each call is linear in the current buffer size, but I will simplify here and just count the calls:

Number of expandCapacity() calls (time)

Default configuration (16 character capacity)

  • In 60% of cases, StringBuilder will expand 0 times
  • In 39% of cases, StringBuilder will expand 8 times
  • In 1% of cases, StringBuilder will expand 11 times

The expected number of expandCapacity() calls is 3.23 (0.39 × 8 + 0.01 × 11).

Starting capacity 4096 characters

  • In 99% of cases, StringBuilder will expand 0 times
  • In 1% of cases, StringBuilder will expand 3 times

The expected number of expandCapacity() calls is 0.03 (0.01 × 3).

As you can see, the second scenario looks much faster, since the StringBuilder rarely has to expand (three expansions per 100 inputs on average). Note, however, that the early expansions are cheaper, because they copy only a small amount of memory. Also, if you append strings to the builder in large chunks, the capacity grows more aggressively and reaches the required size in fewer iterations.
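The expansion counts above can be reproduced with a small simulation. This is only a sketch: it assumes that each expansion grows the buffer to 2 × capacity + 2 and that characters are appended one at a time, so check the growth rule against the JDK version you actually run. The class and method names are made up for illustration.

```java
import java.util.Locale;

class ExpansionCount {
    // Number of growth steps needed to go from initialCapacity to at least
    // finalLength, assuming every step yields 2 * capacity + 2 (assumption).
    static int expansions(int initialCapacity, int finalLength) {
        int capacity = initialCapacity;
        int count = 0;
        while (capacity < finalLength) {
            capacity = capacity * 2 + 2;
            count++;
        }
        return count;
    }

    // Probability-weighted average of the expansion counts.
    static double expected(int initialCapacity, int[] lengths, double[] probabilities) {
        double sum = 0;
        for (int i = 0; i < lengths.length; i++) {
            sum += probabilities[i] * expansions(initialCapacity, lengths[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] lengths = {7, 3500, 20000};          // from the question
        double[] probabilities = {0.60, 0.39, 0.01};
        System.out.println(String.format(Locale.ROOT, "capacity 16:   %.2f",
                expected(16, lengths, probabilities)));   // 3.23
        System.out.println(String.format(Locale.ROOT, "capacity 4096: %.2f",
                expected(4096, lengths, probabilities))); // 0.03
    }
}
```

Trying other starting capacities is just a matter of changing the first argument of expected().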

On the other hand, memory consumption increases:

Memory consumption

Default configuration (16 character capacity)

  • In 60% of cases StringBuilder will occupy 16 characters
  • In 39% of cases, StringBuilder will occupy 4K characters
  • In 1% of cases StringBuilder will occupy 32K characters

Expected average memory consumption: 1935 characters.

Starting capacity 4096 characters

  • In 99% of cases, StringBuilder will occupy 4K characters
  • In 1% of cases StringBuilder will occupy 32K characters

Expected average memory consumption: 4383 characters.
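The two averages are plain probability-weighted sums. A minimal sketch of the arithmetic, using the rounded capacities from the lists above (4K = 4096, 32K = 32768); the class and method names are hypothetical:

```java
import java.util.Locale;

class ExpectedMemory {
    // Probability-weighted average of the final buffer capacities (in characters).
    static double expectedCapacity(double[] probabilities, int[] capacities) {
        double sum = 0;
        for (int i = 0; i < probabilities.length; i++) {
            sum += probabilities[i] * capacities[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double byDefault = expectedCapacity(
                new double[] {0.60, 0.39, 0.01}, new int[] {16, 4096, 32768});
        double with4096 = expectedCapacity(
                new double[] {0.99, 0.01}, new int[] {4096, 32768});
        System.out.println(Math.round(byDefault)); // 1935
        System.out.println(Math.round(with4096));  // 4383
        // Ratio between the two scenarios, a little over 2.
        System.out.println(String.format(Locale.ROOT, "%.2f", with4096 / byDefault));
    }
}
```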


TL;DR

This suggests that increasing the initial capacity to 4K will more than double the average memory consumption, while reducing the expected number of expansions by two orders of magnitude.

The bottom line: try it! It is not hard to write a benchmark that processes millions of strings of varying lengths with different initial capacities. But I believe a large buffer may well be a good choice.
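Such a test could look roughly like the sketch below. It is a naive timing loop, not a rigorous benchmark: JIT warm-up and garbage collection can distort the numbers, so treat the output as a rough indication only (a harness such as JMH would be more trustworthy). The length distribution is taken from the question; the class name, iteration count and seed are arbitrary choices.

```java
import java.util.Random;

class CapacityBenchmark {
    static long sink; // accumulates results so the work is not optimized away

    // Draws a length following the distribution described in the question.
    static int sampleLength(Random random) {
        double r = random.nextDouble();
        if (r < 0.60) return 7;
        if (r < 0.99) return 3500;
        return 20000;
    }

    // Appends `length` characters one at a time and returns the final length.
    static int fill(int initialCapacity, int length) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (int i = 0; i < length; i++) {
            sb.append('x');
        }
        return sb.length();
    }

    // Times `iterations` builders; the fixed seed gives every capacity the same inputs.
    static long timeNanos(int initialCapacity, int iterations, long seed) {
        Random random = new Random(seed);
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink += fill(initialCapacity, sampleLength(random));
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int[] capacities = {16, 4096};
        for (int round = 0; round < 3; round++) { // early rounds act as warm-up
            for (int capacity : capacities) {
                long ms = timeNanos(capacity, 20_000, 42L) / 1_000_000;
                System.out.println("round " + round + ", capacity " + capacity + ": " + ms + " ms");
            }
        }
    }
}
```

Appending one character at a time is the worst case for expansions; if your real code appends larger chunks, adjust fill() to match before drawing conclusions.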


Source: https://habr.com/ru/post/906539/

