What performance can I expect from std::fill_n(ptr, n, 0) relative to memset?

When ptr is a raw pointer, std::fill_n(ptr, n, 0) should have the same effect as memset(ptr, 0, n * sizeof(*ptr)) (but see @KeithThompson's comment on this answer).

For a C++ compiler in C++11/C++14/C++17 mode, under what conditions can I expect the two calls to compile to the same code? And when they do not compile to the same code, is there a significant performance difference at -O0? At -O3?

Note: Of course, some or most of the answers may be compiler-specific. I'm only interested in one or two specific compilers, but please write about whichever compiler you know the answer for.

2 answers

The answer depends on the implementation of the standard library.

MSVC, for example, has several implementations of std::fill_n, selected by the type of the elements you are filling.

Call std::fill_n with a char*, signed char*, or unsigned char*, and it calls memset directly to fill the array.

    inline char *_Fill_n(char *_Dest, size_t _Count, char _Val)
    {   // copy char _Val _Count times through [_Dest, ...)
        _CSTD memset(_Dest, _Val, _Count);
        return (_Dest + _Count);
    }

If you call it with any other type, it fills the range with a loop:

    template<class _OutIt, class _Diff, class _Ty>
    inline _OutIt _Fill_n(_OutIt _Dest, _Diff _Count, const _Ty& _Val)
    {   // copy _Val _Count times through [_Dest, ...)
        for (; 0 < _Count; --_Count, (void)++_Dest)
            *_Dest = _Val;
        return (_Dest);
    }

The best way to determine the overhead with your particular compiler and standard library implementation is to profile your code with both calls.


For all scenarios where memset is appropriate (i.e. all your objects are PODs), you will most likely find that these two statements are equivalent when some level of optimization is enabled.

For scenarios where memset is not suitable, the comparison is moot, since using memset would produce an incorrect program.

You can easily test this yourself using tools like godbolt (and many others).

For example, with GCC 6.2 these two functions generate literally identical code at optimization level -O3:

    #include <algorithm>
    #include <cstring>

    __attribute__((noinline))
    void test1(int (&x)[100])
    {
        std::fill_n(&x[0], 100, 0);
    }

    __attribute__((noinline))
    void test2(int (&x)[100])
    {
        std::memset(&x[0], 0, 100 * sizeof(int));
    }

    int main()
    {
        int x[100];
        test1(x);
        test2(x);
    }

https://godbolt.org/g/JIwI5l


Source: https://habr.com/ru/post/1261662/

