Efficiency: Arrays vs. Pointers

Accessing memory through pointers is considered more efficient than accessing memory through an array. I study C, and the above is mentioned in K & R. In particular, they say

Any operation that can be achieved by signing the text of the array can also be performed using pointers. The pointer version will generally be faster

I shot the following code using Visual C ++. (Mine is the 686 processor. I turned off all optimizations.)

int a[10], *p = a, temp; void foo() { temp = a[0]; temp = *p; } 

To my surprise, I see that accessing memory through a pointer takes 3 commands for two received by accessing memory through an array. Below is the corresponding code.

 ; 5 : temp = a[0]; mov eax, DWORD PTR _a mov DWORD PTR _temp, eax ; 6 : temp = *p; mov eax, DWORD PTR _p mov ecx, DWORD PTR [eax] mov DWORD PTR _temp, ecx 

Please help me understand. What am I missing here?




As pointed out by many answers and comments, I used the compile-time constant as the index of the array, which makes it possibly easier to access through the array. Below is the assembly code with a variable as an index. Now I have an equal number of instructions to access through a pointer and arrays. My broader questions still persist. Accessing memory through a pointer does not provide itself with more efficiency.

 ; 7 : temp = a[i]; mov eax, DWORD PTR _i mov ecx, DWORD PTR _a[eax*4] mov DWORD PTR _temp, ecx ; 8 : ; 9 : ; 10 : temp = *p; mov eax, DWORD PTR _p mov ecx, DWORD PTR [eax] mov DWORD PTR _temp, ecx 
+46
performance c arrays pointers memory-access
Feb 21 2018-10-21
source share
12 answers

Accessing memory through pointers is considered more efficient than accessing memory through an array.

This could be true in the past when compilers were relatively stupid animals. You just need to take a look at some of the gcc code output in high optimization modes to find out that this is no longer the case. Some of this code is very hard to understand, but once you do, its brightness is obvious.

A decent compiler will generate the same code for accessing pointers and accessing an array, and you probably shouldn't worry about this level of performance. The people who write compilers know much more about their target architectures than mere mortals. Focus on the macro level when optimizing your code (choosing an algorithm, etc.) and trust your tools to do their job.




In fact, I am surprised that the compiler did not optimize the whole

 temp = a[0]; 

since temp overwritten on the very next line with a different value and a is in no way marked volatile .

I remember the urban myth a long time ago about the benchmark for the last VAX Fortran compiler (showing my age here), which was several orders of magnitude superior to its competitors.

It turns out that the compiler found out that the result of calculating the standard was not used anywhere, so he optimized the entire calculation cycle in oblivion. Consequently, a significant improvement in speed.




Update. The reason that optimized code is more efficient in your particular case is because of how you find the location. a will be in a fixed location selected at the time of the connection / download, and the link to it will be fixed at the same time. Thus, a[0] or indeed a[any constant] will be in a fixed place.

And p itself will also be in a fixed location for the same reason. But *p (contents of p ) is a variable and therefore will have an additional search to find the correct memory location.

You will probably find that having another variable x set to 0 (not const ), and using a[x] will also lead to additional calculations.




In one of your comments you indicate:

The execution, as you suggested, led to 3 instructions for accessing memory through arrays (selecting an index, selecting an array element, saving to temp). But I still do not see the effectiveness .: - (

My answer to this is that you most likely will not see the effectiveness of using pointers. Modern compilers are more than the task of finding out that array operations and pointer operations can be turned into the same basic machine code.

In fact, without optimization, pointer code may be less efficient. Consider the following translations:

 int *pa, i, a[10]; for (i = 0; i < 10; i++) a[i] = 100; /* movl $0, -16(%ebp) ; this is i, init to 0 L2: cmpl $9, -16(%ebp) ; from 0 to 9 jg L3 movl -16(%ebp), %eax ; load i into register movl $100, -72(%ebp,%eax,4) ; store 100 based on array/i leal -16(%ebp), %eax ; get address of i incl (%eax) ; increment jmp L2 ; and loop L3: */ for (pa = a; pa < a + 10; pa++) *pa = 100; /* leal -72(%ebp), %eax movl %eax, -12(%ebp) ; this is pa, init to &a[0] L5: leal -72(%ebp), %eax addl $40, %eax cmpl -12(%ebp), %eax ; is pa at &(a[10]) jbe L6 ; yes, stop movl -12(%ebp), %eax ; get pa movl $100, (%eax) ; store 100 leal -12(%ebp), %eax ; get pa addl $4, (%eax) ; add 4 (sizeof int) jmp L5 ; loop around L6: */ 

In this example, you can see that the pointer example is longer and redundant. It loads pa into %eax several times, without changing it or changing %eax between pa and &(a[10]) . The default optimization is mostly missing here.

When you switch to optimization level 2, you get the code:

  xorl %eax, %eax L5: movl $100, %edx movl %edx, -56(%ebp,%eax,4) incl %eax cmpl $9, %eax jle L5 

for the array version and:

  leal -56(%ebp), %eax leal -16(%ebp), %edx jmp L14 L16: movl $100, (%eax) addl $4, %eax L14: cmpl %eax, %edx ja L16 

for the pointer version.

I am not going to analyze the clock cycles here (since it works too much, and I'm mostly lazy), but I will point out one thing. There is not much difference in the code for both versions in terms of assembler instructions, and given the speeds that modern processors work with, you will not notice the difference if you do not perform billions of these operations. I always prefer to write code for readability and only worry about performance if this becomes a problem.

As an aside, this statement you refer to:

5.3 Pointers and arrays: the version of the pointer will generally be faster, but at least to the uninitiated, it’s a little harder to understand right away.

refers to the earliest versions of K & R, including my ancient 1978th, where functions were still written:

 getint(pn) int *pn; { ... } 

Compilers have come a very long way since.

+61
Feb 21 '10 at 12:01
source share

In the first case, the compiler directly knows the address of the array (which is also the address of the first element) and accesses it. In the second case, it knows the address of the pointer and reads the value of the pointer, which points to this memory location. This is actually one additional indirectness, so it is supposedly lower.

+8
Feb 21 '10 at 12:00
source share

Speed ​​is achieved in a cycle, above all. When you use an array, you will use a counter that you increment. To calculate the position, the system multiplies this counter by the size of the array element, and then adds the address of the first element to obtain the address. With pointers, all you have to do to move to the next element is to increase the current pointer with the size of the element to get the next, assuming all the elements are next to each other in memory.

Pointer arithmetic, therefore, takes a little less computation when executing loops. In addition, having pointers to the right element is faster than using an index inside an array.

Modern development is gradually getting rid of many operations with pointers. Processors are getting faster and faster, and arrays are easier to manage than pointers. In addition, arrays tend to reduce the number of errors in the code. The array will allow you to check indexes, making sure that you are not accessing data outside the array.

+7
Feb 21 '10 at 12:19
source share

If you program embedded platforms, you will quickly find out that the pointer method is much faster than using the index.

 struct bar a[10], *p; void foo() { int i; // slow loop for (i = 0; i < 10; ++i) printf( a[i].value); // faster loop for (p = a; p < &a[10]; ++p) printf( p->value); } 

The slow cycle should each time calculate a + (i * sizeof (struct bar)), while the second just needs to add sizeof (struct bar) to p each time. The multiply operation uses more clock cycles than adding to many processors.

You really start to see improvements if you refer to [i] several times within the loop. Some compilers do not cache this address, so it can be recounted several times inside the loop.

Try updating the sample to use the structure and link to multiple elements.

+7
Feb 21 2018-10-21
source share

Pointers naturally express simple induction variables, while indexes essentially require more complex compiler optimizations




In many cases, just using a pronounced subscript requires adding an extra layer to the problem. The cycle that increases the index i can be, although it is a state machine, and the expression a [i] technically requires each time it is used to multiply I by the size of each element and add it to the base address.

To convert this access pattern to using pointers, the compiler must analyze the entire loop and determine what, say, each element is accessed. Then the compiler can replace several instances of multiplying the index by the size of the element by simply incrementing the previous value of the loop. This process combines optimizations called eliminating common subexpression and reducing the strength of variable induction .

When writing with pointers, the entire optimization process is not required, because the programmer usually just dials an array to begin with.

Sometimes the compiler can perform optimizations, and sometimes it cannot. In recent years, the presence of a complex compiler has been more common, so code with a pointer is not always faster .

Because attributes usually need to be contiguous, another advantage for pointers is the creation of additionally distributed composite structures.

+7
Feb 22 2018-10-22T00
source share

As paxdiablo said, any new compiler will make them very similar.

Moreover, I saw situations when the array was faster than pointers. It was on a DSP processor that uses vector operations.

In this case, using arrays was like using restrict pointers. Because, using two arrays, the compiler - implicitly - knows that they do not point to the same location. But if you are dealing with 2 pointers, the compiler might think that they point to the same location and skip the pipe lining.

eg:

 int a[10],b[10],c[10]; int *pa=a, *pb=b, *pc=c; int i; // fill a and b. fill_arrays(a,b); // set c[i] = a[i]+b[i]; for (i = 0; i<10; i++) { c[i] = a[i] + b[i]; } // set *pc++ = *pa++ + *pb++; for (i = 0; i<10; i++) { *pc++ = *pa++ + *pb++; } 

In case 1, the compiler will easily align pipes with the addition of a and b and store the value of c.

In case 2, the compiler will not be connected to the pipeline, because it can overwrite a or b, while maintaining C.

+6
Feb 21 '10 at 12:21
source share

This is a very old question that has been answered, so I do not need to answer! However, I did not notice a simple answer, so I provide it.

ANSWER: indirect access (pointer / array) "can" add one additional instruction to load the (base) address, but all calls after that (elements in the case of an array / members in the case of a pointer to a structure) should be just one instruction, because it just adding an offset to the (base) address that is already loaded. Thus, it will be as good as direct access. Thus, in most cases, access through an array / pointer is equivalent, and access to elements is as good as direct access to a variable.

Ex. if I have an array (or pointer) with 10 elements or a structure with 10 members (access through the pointer to the structure) and I access the element / member, one possible additional instruction is required only once at the beginning. After that, all calls to the element / member should be only one instruction.

+3
Aug 02
source share

You get good answers to your question here, but as you study, it is worth noting that effectiveness at this level is rarely felt.

When you tune the program for maximum performance, you should pay at least as much attention to finding and fixing big problems in the structure of the program. Once they have been fixed, low-level optimization can make more sense.

Here is an example of how this can be done.

+2
Feb 21 2018-10-21
source share

Pointers earlier than arrays. Of course, when the C language was developed, pointers were pretty fast. But these days, optimizers can usually better optimize their work with arrays than with pointers, because arrays are more limited.

The instruction sets of modern processors have also been developed to optimize array access.

So, the bottom line is that today arrays are often faster, especially when used in a loop with index variables.

Of course, you still want to use pointers for things like linked lists, but the old time optimization as the pointer goes through the array, rather than using an index variable, is likely to be a preventive measure.

+2
Feb 21 '10 at 21:37
source share

“The pointer version will generally be faster” means that in most cases it is easier for the compiler to generate more efficient code with a pointer (which just needs to be dereferenced) than having an array and an index (which means that the compiler needs to transfer the address from the beginning of the array). However, with modern processors and compiler optimization, access to the array is typically no slower than access to the pointer.

In particular, in your case, you will need to enable optimization in order to get the same result.

+1
Feb 21 2018-10-21
source share

Since 0 is defined as a constant, a [0] is also a constant, and the compiler knows where it is at compile time. In the "normal" case, the compiler would have to calculate the address of the element from the base + offset (the offset is scaled according to the size of the element).

OTOH, p is a variable, and indirection requires additional movement.

In general, an array index is internally treated as pointer arithmetic, so I'm not sure what I see what K & R was trying to do.

+1
Feb 21 '10 at 12:05
source share

Since most people have already given detailed answers, I will just give an intuitive example. If you use an array and pointer on a larger scale, the efficiency of using the pointer will be more significant. For example, if you want to sort a large, long data set by sorting it into several subsets, then combine them.

long int * testData = calloc(N, sizeof(long int));

For daily 8G ram machines in 2017, we can set N to 400000000, which means that you will use approximately 1.5 GB of memory for this initial dataset. And if you use MPI , you can quickly separate your data using

 MPI_Scatterv(testData, partitionLength, partitionIndex, MPI_LONG, MPI_IN_PLACE, N/number_of_thread, MPI_LONG, 0, MPI_COMM_WORLD); 

You can simply treat paritionLength as a pointer that stores N/number_of_thread as a length for each identical piece and treats partitionIndex as a pointer that stores N / number_of_threads looking at the index differently. Suppose you have a 4-core processor, and you only separate your work in 4 threads. MPI will certainly do the job in the short sense of links. But if you are using an array, this procedure should start the pointer arithmetic on the array in order to first find the section point. This is not as straightforward as a pointer. In addition, when you combine a partitioned dataset, you can use the K-way merge to speed it up. You need temporary space to store four sorted data sets. Here, if you use a pointer, you only need to save 4 addresses. However, if you use an array, it will store 4 whole auxiliary arrays, which is inefficient. Sometimes, if you do not use MPI_Barrier to make sure your program is thread safe, MPI may even complain about the poor execution of your memory. I got a 32G machine for sorting 400,000,000 long values ​​in 8 threads using the array method and the pointer method, I got 11.054980s and 13.182739s respectively. And if I increase the size to 1,000,000,000, my sorting program will not succeed if I use an array. Therefore, many people use pointers for every data structure except scalars in C.

0
May 03 '17 at 10:55 pm
source share



All Articles