Is memset() more efficient than a for loop in C?

Question


Is memset more efficient than a for loop? If I have

    char x[500];
    memset(x, 0, sizeof(x));

or

    char x[500];
    for (int i = 0; i < 500; i++)
        x[i] = 0;

which one is more efficient, and why? Is there any special hardware instruction for block-level initialization?

+30

performance c memset

David Sep 09 '11 at 21:32


7 answers

Well, why don't we take a look at the generated assembly code, built with full optimization under VS 2010?

    char x[500];
    char y[500];
    int i;

    memset(x, 0, sizeof(x) );
    003A1014  push        1F4h
    003A1019  lea         eax,[ebp-1F8h]
    003A101F  push        0
    003A1021  push        eax
    003A1022  call        memset (3A1844h)

And your loop ...

    char x[500];
    char y[500];
    int i;

    for( i = 0; i < 500; ++i )
    {
        x[i] = 0;
    00E81014  push        1F4h
    00E81019  lea         eax,[ebp-1F8h]
    00E8101F  push        0
    00E81021  push        eax
    00E81022  call        memset (0E81844h)
        /* note that this is *replacing* the loop,
           not being called once for each iteration. */
    }

So, with this compiler, the generated code is exactly the same. memset is fast, and the compiler is smart enough to know that you are doing the same thing as calling memset once, so it does it for you.

If the compiler actually left the loop as-is, it would probably be slower, since you can set more than one byte at a time (i.e., you could unroll your loop a bit). memset will be at least as fast as a naive implementation such as the loop. Try it in a debug build and you will notice that the loop is not replaced.

That said, it depends on what the compiler does for you. Looking at the disassembly is always a good way to know exactly what is going on.
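To repeat this comparison with another compiler, something like the following minimal sketch can be compiled to an assembly listing (for example `gcc -O2 -S zero.c`, or `cl /O2 /FA zero.c` with MSVC). The file name `zero.c`, the constant `N`, and both function names are made up for illustration. Whether the loop is turned into a memset call, an inline sequence, or left alone depends on the compiler and its settings.

```c
#include <stddef.h>
#include <string.h>

enum { N = 500 };   /* same size as the array in the question */

/* Variant 1: explicit library call. */
void zero_with_memset(char *x)
{
    memset(x, 0, N);
}

/* Variant 2: hand-written byte loop. */
void zero_with_loop(char *x)
{
    for (size_t i = 0; i < N; i++)
        x[i] = 0;
}
```

Comparing the code emitted for the two functions, with and without optimization, shows what the compiler actually did with each one.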

+30

Ed S. Sep 09 '11 at 21:45


It really depends on the compiler and the library. With older or simpler compilers, memset may be implemented in the library and would not perform better than a custom loop.

For nearly all compilers that are worth using, memset is an intrinsic function, and the compiler will generate optimized inline code for it.

Others have suggested profiling and comparing, but I wouldn't bother. Just use memset. The code is simple and easy to understand. Don't worry about it until your benchmarks tell you that this part of the code is a performance hot spot.

+12

Michael 09 Sep 2018-11-11T00:


The answer is "it depends". memset MAY be more efficient, or it may use a for loop internally. I can't think of a case where memset would be less efficient. In this case, it may turn into a more efficient for loop: your loop iterates 500 times, setting one byte of the array to 0 each time. On a 64-bit machine, you could loop through setting 8 bytes (a long long) at a time, which would be almost 8 times quicker, and then just deal with the remaining 4 bytes (500 % 8) at the end.
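A minimal sketch of that 8-bytes-at-a-time idea, assuming nothing beyond standard C. `fill_words` is a hypothetical helper written for illustration, not a library function and not how any particular memset is implemented.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: set n bytes at dst to the byte value c,
   storing 8 bytes per iteration where possible. */
static void fill_words(void *dst, unsigned char c, size_t n)
{
    unsigned char *p = dst;

    /* Replicate the byte into all 8 byte positions of a 64-bit word. */
    uint64_t word = 0x0101010101010101ULL * c;

    /* Bulk part: one 8-byte store per iteration. A fixed-size memcpy
       sidesteps alignment and aliasing issues; compilers commonly
       turn it into a single store. */
    while (n >= sizeof word) {
        memcpy(p, &word, sizeof word);
        p += sizeof word;
        n -= sizeof word;
    }

    /* Tail: the remaining n % 8 bytes (4 bytes for a 500-byte array). */
    while (n > 0) {
        *p++ = c;
        n--;
    }
}
```

For the array in the question, `fill_words(x, 0, sizeof(x))` performs 62 eight-byte stores followed by 4 single-byte stores, instead of 500 single-byte stores.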

EDIT:

In fact, this is what memset does in glibc:

http://repo.or.cz/w/glibc.git/blob/HEAD:/string/memset.c

As Michael pointed out, in certain cases (where the array length is known at compile time) the C compiler can inline memset, getting rid of the function-call overhead. Glibc also has assembly-optimized versions of memset for most major platforms, such as amd64:

http://repo.or.cz/w/glibc.git/blob/HEAD:/sysdeps/x86_64/memset.S

+8

Bobby Powers Sep 09 2018-11-11T00:


Good compilers will recognize the for loop and replace it with either an optimal inline sequence or a call to memset. They will also replace memset with an optimal inline sequence when the buffer size is small.

In practice, with an optimizing compiler, the generated code (and therefore the performance) will be identical.

+3

Stephen Canon Sep 09 2018-11-11T00:


Agree with the above. It depends. But memset is certainly at least as fast as the for loop. If you are unsure of your environment or too lazy to test, take the safe route and go with memset.

+2

beetree Sep 09 '11 at 21:40


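    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <string.h>

    /* Fill `length` items of `size_of_item` bytes each with a copy of *value. */
    void fill_array(void* array, size_t size_of_item, size_t length, void* value)
    {
        uint8_t* bytes = value;
        uint8_t first_byte = bytes[0];

        /* Single-byte items map directly onto memset. */
        if (size_of_item == 1) {
            memset(array, first_byte, length);
            return;
        }

        // size_of_item > 1 here.
        bool all_bytes_are_identical = true;

        for (size_t byte_index = 1; byte_index < size_of_item; byte_index++) {
            if (bytes[byte_index] != first_byte) {
                all_bytes_are_identical = false;
                break;
            }
        }

        /* Every byte of the item is the same, so memset still works. */
        if (all_bytes_are_identical) {
            memset(array, first_byte, size_of_item * length);
            return;
        }

        /* General case: copy the item into each slot. */
        for (size_t index = 0; index < length; index++) {
            memcpy((uint8_t*)array + size_of_item * index, value, size_of_item);
        }
    }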

memset is more efficient, but it cannot handle non-uniform values (where all_bytes_are_identical is false). So you will want a wrapper around it.

This is my variant. It works on both little-endian and big-endian systems.
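To illustrate why the wrapper needs the all_bytes_are_identical check, here is a small sketch (both function names are made up for this example). memset writes the same byte value into every byte, so it cannot store a multi-byte value whose bytes differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* memset puts the byte 0x01 into every byte of the array, so each
   32-bit element becomes 0x01010101 rather than 1. */
static uint32_t element_after_memset_1(void)
{
    uint32_t a[4];
    memset(a, 1, sizeof a);
    return a[0];
}

/* An element-wise loop stores the intended value. */
static uint32_t element_after_loop_1(void)
{
    uint32_t a[4];
    for (size_t i = 0; i < 4; i++)
        a[i] = 1;
    return a[0];
}
```

Because all four bytes of 0x01010101 are equal, the memset result is the same regardless of byte order. Values such as 0 or 0xFFFFFFFF, whose bytes are all identical, are the ones memset can produce directly.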

-1

puchu Nov 23 '18 at 15:51


Diego Sevilla · Accepted Answer · 2011-09-09 21:37

Most certainly, memset will be much faster than that loop. Note that you handle one character at a time, whereas these functions are so optimized that they set several bytes at a time, even using MMX and SSE instructions when available.

I think the paradigmatic example of these optimizations, which usually go unnoticed, is the GNU C library's strlen function. One would think that it has at least O(n) performance, but it actually has O(n/4) or O(n/8) depending on the architecture (yes, I know, in big-O terms it is the same, but you actually get an eighth of the time). How? Tricky, but nicely: strlen.
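The word-at-a-time idea behind that remark can be sketched roughly as follows. This is not glibc's source: `has_zero_byte` and `strnlen_words` are hypothetical names, and the sketch takes an explicit buffer size so that every 8-byte load stays inside the buffer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if any of the 8 bytes in w is zero. Subtracting 1 from each
   byte borrows into its top bit only when the byte was 0 (or borrowed
   from below); masking with ~w drops bytes that already had the top
   bit set. */
static int has_zero_byte(uint64_t w)
{
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

/* Hypothetical helper: like strnlen, but tests 8 bytes per step. */
static size_t strnlen_words(const char *s, size_t cap)
{
    size_t i = 0;

    /* Skip whole 8-byte chunks that contain no terminator. */
    while (cap - i >= sizeof(uint64_t)) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);   /* in-bounds, alignment-safe load */
        if (has_zero_byte(w))
            break;
        i += sizeof w;
    }

    /* Locate the terminator (or the end of the buffer) byte by byte. */
    while (i < cap && s[i] != '\0')
        i++;
    return i;
}
```

The loop still visits every byte, but it needs only one test per 8 bytes rather than one per byte, which is where the "n/8" comes from.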
