What is the optimal initial capacity of a StringBuffer for inputs with highly variable lengths?

Question

Good afternoon, I am using java.lang.StringBuilder to store some characters. I do not know how many characters I am going to store in advance, except that:

60% of the time, it is exactly 7 characters
39% of the time, it is approximately 3,500 characters
1% of the time, it is approximately 20,000 characters

How can we calculate the optimal initial buffer length to be used?

(I am currently using new java.lang.StringBuilder(4000), but that is only because I was too lazy to think about it before.)

java language-agnostic stringbuilder math buffer

Pacerier Jan 21 '12 at 13:38

1 answer

Tomasz Nurkiewicz · Accepted Answer · 2012-01-21T14:00:53+0000

There are two factors here: time and memory consumption. The time depends mainly on the number of times java.lang.AbstractStringBuilder.expandCapacity() is called. Of course, the cost of each call is linear in the current buffer size, but I am simplifying here and just counting the calls:

Number of `expandCapacity()` calls (time)

Default configuration (16 character capacity)

In 60% of cases, StringBuilder will expand 0 times
In 39% of cases, StringBuilder will expand 8 times
In 1% of cases, StringBuilder will expand 11 times

The expected number of expandCapacity() calls is 3.23.

Starting capacity 4096 characters

In 99% of cases, StringBuilder will expand 0 times
In 1% of cases, StringBuilder will expand 3 times

The expected number of expandCapacity() calls is 0.03.
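The counts above can be reproduced with a short simulation. This is only a sketch under two assumptions: that the buffer grows by the rule newCapacity = (oldCapacity + 1) * 2, as AbstractStringBuilder did in JDKs of that era, and that characters are appended one at a time. The class and method names (ExpansionCount, expansions, expected) are made up for illustration.

```java
import java.util.Locale;

// Sketch: count buffer expansions, assuming one-character appends and the
// growth rule newCapacity = (oldCapacity + 1) * 2 (an assumption about the JDK in use).
class ExpansionCount {

    // Number of times the buffer must grow from initialCapacity to hold finalLength chars.
    static int expansions(int initialCapacity, int finalLength) {
        int capacity = initialCapacity;
        int count = 0;
        while (capacity < finalLength) {
            capacity = (capacity + 1) * 2; // assumed growth rule
            count++;
        }
        return count;
    }

    // Expected number of expansions for the distribution in the question:
    // 60% -> 7 chars, 39% -> 3,500 chars, 1% -> 20,000 chars.
    static double expected(int initialCapacity) {
        return 0.60 * expansions(initialCapacity, 7)
             + 0.39 * expansions(initialCapacity, 3_500)
             + 0.01 * expansions(initialCapacity, 20_000);
    }

    public static void main(String[] args) {
        System.out.printf(Locale.ROOT, "default (16): %.2f%n", expected(16));   // 3.23
        System.out.printf(Locale.ROOT, "4096:         %.2f%n", expected(4096)); // 0.03
    }
}
```

If the input is appended in larger chunks, the builder can jump straight to the required size, so the real number of expansions may be lower than this one-character-at-a-time model suggests.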

As you can see, the second scenario looks much faster, since the StringBuilder very rarely has to expand (three expansions for every 100 inputs). Note, however, that the first expansions are less significant (they copy only a small amount of memory); also, if you append strings to the builder in large chunks, it will expand more eagerly, in fewer iterations.

On the other hand, memory consumption increases:

Memory consumption

Default configuration (16 character capacity)

In 60% of cases StringBuilder will occupy 16 characters
In 39% of cases, StringBuilder will occupy 4K characters
In 1% of cases StringBuilder will occupy 32K characters

Expected average memory consumption: 1935 characters.

Starting capacity 4096 characters

In 99% of cases, StringBuilder will occupy 4K characters
In 1% of cases StringBuilder will occupy 32K characters

Expected average memory consumption: 4383 characters.
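The two averages are plain weighted sums of the final capacities. The sketch below reproduces that arithmetic using the same rounded figures as above (4K = 4096 and 32K = 32768 characters); the class and method names are hypothetical.

```java
// Sketch: expected buffer capacity as a probability-weighted average.
// The capacities are the rounded figures from the text, not exact JDK values.
class ExpectedMemory {

    // probabilities and capacities are parallel arrays; probabilities should sum to 1.
    static double expected(double[] probabilities, int[] capacities) {
        double sum = 0.0;
        for (int i = 0; i < probabilities.length; i++) {
            sum += probabilities[i] * capacities[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        double defaultCase = expected(new double[] {0.60, 0.39, 0.01},
                                      new int[] {16, 4096, 32768});
        double largeCase = expected(new double[] {0.99, 0.01},
                                    new int[] {4096, 32768});

        System.out.println(Math.round(defaultCase)); // 1935
        System.out.println(Math.round(largeCase));   // 4383
    }
}
```

These figures count characters of capacity, not bytes; the number of bytes per character depends on the JDK version and string contents.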

TL;DR

This makes me think that increasing the initial buffer to 4K will more than double the expected memory consumption (from about 1935 to about 4383 characters) while cutting the expected number of expansions by two orders of magnitude (from 3.23 to 0.03).

The bottom line is: try it! It is not difficult to write a benchmark that processes millions of strings of different lengths with different initial capacities. But I believe that a bigger buffer might be a good choice.
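A rough version of such a test might look like the sketch below. It is a naive System.nanoTime() loop, not a rigorous benchmark: JIT warm-up, garbage collection and the one-character append pattern all influence the result, so a harness such as JMH would give more trustworthy numbers. All names here (CapacityBenchmark, sampleLength, build, run) are invented for the example.

```java
import java.util.Random;

// Naive timing sketch comparing initial capacities for the question's length distribution.
class CapacityBenchmark {

    // Draws a length: 60% -> 7, 39% -> 3,500, 1% -> 20,000 characters.
    static int sampleLength(Random rnd) {
        int p = rnd.nextInt(100);
        if (p < 60) return 7;
        if (p < 99) return 3_500;
        return 20_000;
    }

    // Builds a string of the given length, one character at a time.
    static String build(int initialCapacity, int length) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        for (int i = 0; i < length; i++) {
            sb.append('x');
        }
        return sb.toString();
    }

    // Returns elapsed nanoseconds for the given number of builds.
    static long run(int initialCapacity, int iterations, long seed) {
        Random rnd = new Random(seed); // same seed -> same sequence of lengths
        long checksum = 0;             // consumed below so the work is not optimized away
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            checksum += build(initialCapacity, sampleLength(rnd)).length();
        }
        long elapsed = System.nanoTime() - start;
        if (checksum == 0) throw new IllegalStateException("no work was done");
        return elapsed;
    }

    public static void main(String[] args) {
        int[] capacities = {16, 4096};
        for (int capacity : capacities) {
            run(capacity, 5_000, 42L); // warm-up pass, result discarded
        }
        for (int capacity : capacities) {
            long nanos = run(capacity, 20_000, 42L);
            System.out.println("capacity " + capacity + ": " + (nanos / 1_000_000) + " ms");
        }
    }
}
```

Raising the iteration count and trying a few more capacities (for example 8, 4000 and 20,000) would show where the trade-off between copying and wasted space lands on a particular machine.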
