Is doubling the capacity of a dynamic array required?

When implementing auto-expanding arrays in C (like C++'s std::vector), a common piece of advice is to double the capacity of the array each time it fills up, in order to limit the number of realloc calls and avoid copying the entire array as much as possible.

E.g. we start by allocating room for 8 elements, insert 8 elements, then reallocate for 16 elements, insert another 8 elements, reallocate for 32, and so on.
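
For reference, a minimal sketch of that doubling strategy in C (the dynarray type and dynarray_push function are just illustrative names, not from any particular library):

    #include <stdlib.h>

    struct dynarray {
        int *items;
        size_t count;
        size_t capacity;
    };

    /* Append one element, doubling the capacity whenever the array is full. */
    static int dynarray_push(struct dynarray *a, int value)
    {
        if (a->count == a->capacity) {
            size_t new_capacity = a->capacity ? a->capacity * 2 : 8;
            int *tmp = realloc(a->items, new_capacity * sizeof *tmp);
            if (!tmp)
                return -1;            /* keep the old buffer on failure */
            a->items = tmp;
            a->capacity = new_capacity;
        }
        a->items[a->count++] = value;
        return 0;
    }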

But realloc should not actually copy the data if it can expand the existing allocation in place. For example, the following code performs only 1 copy (and since the original pointer is NULL, that is not really a copy) on my system, even though it calls realloc 10,000 times:

    #include <stdlib.h>
    #include <stdio.h>

    int main()
    {
        int i;
        int copies = 0;
        void *data = NULL;
        void *ndata;

        for (i = 0; i < 10000; i++) {
            ndata = realloc(data, i * sizeof(int));
            if (data != ndata)
                copies++;
            data = ndata;
        }
        printf("%d\n", copies);
    }

I understand that this example is very clinical - a real-world application would probably have more memory fragmentation and make more copies, but even if I make a bunch of random allocations before the realloc loop, the result is only slightly worse, with 2-4 copies.

So, is the "doubling" method really necessary? Wouldn't it be better to just call realloc every time an element is added to a dynamic array?

+4
3 answers

You should step back from your code for a minute and think abstractly. What is the cost of growing a dynamic container? Programmers and researchers do not think in terms of "this took 2 ms", but rather in terms of asymptotic complexity: what is the cost of growing by one element given that I already have n elements, and how does that change as n increases?

If you only ever grew by a constant (or bounded) amount, you would periodically have to move all the data, so the cost of growing would depend on, and increase with, the size of the container. By contrast, if you grow the container geometrically, i.e. multiply its size by a fixed factor every time it fills up, the expected cost of an insertion is actually independent of the number of elements, i.e. constant.

It is, of course, not always constant, but it is amortized constant, which means that if you keep inserting elements, the average cost per element is constant. You do have to grow and move from time to time, but those events become rarer and rarer as you insert more and more elements.
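
As a rough illustration of the arithmetic (assuming, pessimistically, that every reallocation moves the whole array; the program below is mine, not part of the answer), you can simply count the element moves that n insertions cause under the two strategies:

    #include <stdio.h>

    int main(void)
    {
        const long long n = 1000000;          /* number of insertions */
        long long moved_doubling = 0, moved_fixed = 0;
        long long cap;

        /* Doubling: the reallocation that grows past capacity cap moves cap elements,
           and cap takes only the values 1, 2, 4, ..., so the total stays below 2*n. */
        for (cap = 1; cap < n; cap *= 2)
            moved_doubling += cap;

        /* Fixed increment of 8: the reallocation at capacity cap also moves cap
           elements, but there are about n/8 of them, so the total grows
           quadratically with n. */
        for (cap = 8; cap < n; cap += 8)
            moved_fixed += cap;

        printf("doubling:  %lld element moves (%.2f per insertion)\n",
               moved_doubling, (double)moved_doubling / n);
        printf("+8 growth: %lld element moves (%.2f per insertion)\n",
               moved_fixed, (double)moved_fixed / n);
    }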

I once asked whether it makes sense for C++ allocators to be able to grow in place, the way realloc does. The answers I got suggested that the non-moving behaviour of realloc is actually a bit of a red herring when you think asymptotically. Eventually you will no longer be able to grow in place and will have to move, so for the purpose of studying the asymptotic cost it does not really matter whether the realloc was a no-op or not. (Moreover, non-moving growth seems to upset simple, arena-based allocators that expect all their allocations to be the same size.)

+3

Compared to almost any other type of operation, malloc , calloc and especially realloc are very expensive. I personally timed 10,000,000 reallocs, and it takes a HUGE amount of time.

Even though I was performing other operations at the same time (in both tests), I found that I could literally cut HOURS off the runtime by using max_size *= 2 instead of max_size += 1 .
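
A minimal sketch of that kind of comparison (this harness and its parameters are my assumption of the setup, not the answerer's actual benchmark; absolute timings will vary enormously with the system and the allocator):

    #include <stdlib.h>
    #include <stdio.h>
    #include <time.h>

    /* Fill a buffer with 'count' ints, growing it by the given rule,
       and return the elapsed CPU time in seconds. */
    static double fill(size_t count, int doubling)
    {
        clock_t start = clock();
        size_t max_size = 0;
        int *data = NULL;
        size_t i;

        for (i = 0; i < count; i++) {
            if (i >= max_size) {
                max_size = doubling ? (max_size ? max_size * 2 : 1)
                                    : max_size + 1;
                data = realloc(data, max_size * sizeof *data);
                if (!data) {
                    perror("realloc");
                    exit(EXIT_FAILURE);
                }
            }
            data[i] = (int)i;
        }
        free(data);
        return (double)(clock() - start) / CLOCKS_PER_SEC;
    }

    int main(void)
    {
        size_t n = 10 * 1000 * 1000;   /* the += 1 run does ~10 million reallocs */
        printf("max_size += 1: %.2f s\n", fill(n, 0));
        printf("max_size *= 2: %.2f s\n", fill(n, 1));
    }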

+3

Q: "Is doubling the capacity of a dynamic array required?"
A: No. You can grow only as much as is actually needed, but then you may end up copying the data many times. It is a classic trade-off between memory and processor time. A good growth algorithm takes into account what is known about the program's data needs and does not over-allocate beyond them. An exponential growth factor of 2x is a happy medium.

But now to your claim that "the following code only performs 1 copy".

The number of copies with an expanding memory allocation may not be what the OP thinks. Getting back the same address does not mean that the underlying memory mapping did not have to do significant work. All sorts of activity goes on under the hood.

For memory allocations that grow and shrink dramatically over the lifetime of the code, I like growth and shrink thresholds that are geometrically spaced from each other:

    const size_t Grow[]   = {1, 4, 16, 64, 256, 1024, 4096, ... };
    const size_t Shrink[] = {0, 2,  8, 32, 128,  512, 2048, ... };

Using the Grow thresholds when expanding and the Shrink thresholds when contracting avoids thrashing near a boundary. Sometimes a factor of 1.5 is used instead of 2.
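
One possible way to wire such staggered thresholds into a resize decision (a sketch of the idea only; the pick_level helper and the exact trigger conditions are my assumptions, not code from the answer):

    #include <stddef.h>

    static const size_t Grow[]   = {1, 4, 16, 64, 256, 1024, 4096};
    static const size_t Shrink[] = {0, 2,  8, 32, 128,  512, 2048};
    #define LEVELS (sizeof Grow / sizeof Grow[0])

    /* Capacity is always Grow[level]. Growing is triggered by the Grow
       thresholds, shrinking by the much smaller Shrink thresholds, so a count
       oscillating around one boundary does not reallocate on every step.
       (A real implementation would extend the tables, or fall back to plain
       doubling, once count exceeds the last Grow entry.) */
    static size_t pick_level(size_t level, size_t count)
    {
        while (level + 1 < LEVELS && count > Grow[level])
            level++;                      /* need more room */
        while (level > 0 && count <= Shrink[level - 1])
            level--;                      /* far below capacity: give some back */
        return level;
    }

The point is only the hysteresis: the threshold that triggers a shrink sits well below the threshold that triggered the last grow, so repeatedly adding and removing a few elements near a boundary never causes back-to-back reallocations.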

+2

Source: https://habr.com/ru/post/1203990/

