Why does it help to assign a const reference scalar value to a local const before the loop?

In GCC 5.4.0 stl_algobase.h we have:

  template<typename _ForwardIterator, typename _Tp>
    inline typename
    __gnu_cxx::__enable_if<!__is_scalar<_Tp>::__value, void>::__type
    __fill_a(_ForwardIterator __first, _ForwardIterator __last,
             const _Tp& __value)
    {
      for (; __first != __last; ++__first)
        *__first = __value;
    }

  template<typename _ForwardIterator, typename _Tp>
    inline typename
    __gnu_cxx::__enable_if<__is_scalar<_Tp>::__value, void>::__type
    __fill_a(_ForwardIterator __first, _ForwardIterator __last,
             const _Tp& __value)
    {
      const _Tp __tmp = __value;
      for (; __first != __last; ++__first)
        *__first = __tmp;
    }

I don't understand why the variant for scalars has any advantage over the general one. Wouldn't both compile to the same thing: loading __value from the stack into a register and using that register throughout the loop?

1 answer

This goes back to SVN rev 83645 (git commit 8ba26e53) in 2004, where the two variants that are now __fill_a were implemented as helper structs:

  template<typename>
    struct __fill
    {
      template<typename _ForwardIterator, typename _Tp>
        static void
        fill(_ForwardIterator __first, _ForwardIterator __last,
             const _Tp& __value)
        {
          for (; __first != __last; ++__first)
            *__first = __value;
        }
    };

  template<>
    struct __fill<__true_type>
    {
      template<typename _ForwardIterator, typename _Tp>
        static void
        fill(_ForwardIterator __first, _ForwardIterator __last,
             const _Tp& __value)
        {
          const _Tp __tmp = __value;
          for (; __first != __last; ++__first)
            *__first = __tmp;
        }
    };

Documentation on this is sparse, but the original commit by Dan Nicolaescu and Paolo Carlini gives a hint in its commit message:

  • include/bits/stl_algobase.h (__fill, __fill_n): New helpers for fill and fill_n, respectively: when copying is cheap, use a temporary to avoid a memory read in each iteration.

Given that they were, and are, maintainers of the standard library, I think they knew what they were doing: they work around the fact that references are usually implemented as pointers. In the end, a reference is just a new alias for an existing memory location. That is why there were two variants from the start. Note that the __true_type was determined in the call made by fill:

  typedef typename __type_traits<_Tp>::has_trivial_copy_constructor _Trivial;
  std::__fill<_Trivial>::fill(__first, __last, __value);

With std::enable_if, or rather its GCC variant, Carlini later removed these helpers and replaced them with the version you quoted. The rationale still holds: for scalar types, you want a local value. If your range lives in a different memory region than your value, spans several pages, and spills your L1 cache, you do not want reading the value itself to cost a cache miss. With a local variable, avoiding that is trivial.
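To make the aliasing point concrete, here is a minimal sketch of the two loops with ordinary names. The names fill_by_ref, fill_by_copy and demo_aliasing are hypothetical and not part of libstdc++. In the by-reference loop the compiler generally cannot prove that a store through *first leaves the object behind value untouched, so it may have to reload value on each iteration; the local copy removes that doubt. The demo deliberately passes an element of the range as the value to show that such aliasing is legal.

```cpp
#include <cassert>
#include <vector>

// Hypothetical names; these mirror the two loops, not libstdc++'s real code.

// General variant: reads through the reference on every iteration.
template <typename ForwardIt, typename T>
void fill_by_ref(ForwardIt first, ForwardIt last, const T& value) {
    for (; first != last; ++first)
        *first = value;  // `value` may alias `*first`
}

// Scalar variant: copies the value once; the loop only uses the local.
template <typename ForwardIt, typename T>
void fill_by_copy(ForwardIt first, ForwardIt last, const T& value) {
    const T tmp = value;  // cannot be changed by stores through `first`
    for (; first != last; ++first)
        *first = tmp;
}

// The value lives inside the destination range, so it aliases an element.
inline std::vector<int> demo_aliasing(bool use_copy) {
    std::vector<int> v{7, 1, 2, 3};
    if (use_copy)
        fill_by_copy(v.begin(), v.end(), v[0]);
    else
        fill_by_ref(v.begin(), v.end(), v[0]);
    return v;
}
```

Both variants produce the same result here; the difference is only in what the optimizer is allowed to assume inside the loop.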

However, semantics matter. std::fill makes exactly std::distance(first, last) copies. With scalar values, we know that an additional copy has no side effects. With user-defined types? We do not know. That is why you cannot use the const auto tmp = __value; variant in the first case.
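The observable difference can be illustrated with a type that counts its copies. This is a sketch under stated assumptions: Counted, fill_general, fill_with_tmp and copy_ctor_calls_for are hypothetical names invented for the illustration. The general loop performs only copy assignments, one per element, while the temporary-based loop adds one copy construction that user code could observe.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical type that records how often it is copied.
struct Counted {
    static int copy_ctor_calls;
    static int copy_assign_calls;
    int payload = 0;

    Counted() = default;
    Counted(const Counted& other) : payload(other.payload) { ++copy_ctor_calls; }
    Counted& operator=(const Counted& other) {
        payload = other.payload;
        ++copy_assign_calls;
        return *this;
    }
};
int Counted::copy_ctor_calls = 0;
int Counted::copy_assign_calls = 0;

// General loop: exactly one copy assignment per element.
template <typename ForwardIt, typename T>
void fill_general(ForwardIt first, ForwardIt last, const T& value) {
    for (; first != last; ++first)
        *first = value;
}

// Temporary-based loop: one extra copy construction before the loop.
template <typename ForwardIt, typename T>
void fill_with_tmp(ForwardIt first, ForwardIt last, const T& value) {
    const T tmp = value;
    for (; first != last; ++first)
        *first = tmp;
}

// Fills n elements and returns the number of copy constructions observed.
inline int copy_ctor_calls_for(bool use_tmp, std::size_t n) {
    std::vector<Counted> range(n);
    Counted value;
    Counted::copy_ctor_calls = 0;   // ignore anything done during setup
    Counted::copy_assign_calls = 0;
    if (use_tmp)
        fill_with_tmp(range.begin(), range.end(), value);
    else
        fill_general(range.begin(), range.end(), value);
    return Counted::copy_ctor_calls;
}
```

For a scalar the extra copy is invisible, which is why the library can afford it there and nowhere else.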

And that's why you end up with two, well, actually three variants:

  • one for scalar values, where you can keep the value in a local copy and help the optimizer
  • one for byte values, where you can use memset
  • one for all other types, where you must not interfere with the semantics.
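The three variants can be sketched as a small overload set using the standard std::enable_if and std::is_scalar instead of GCC's internal helpers. This is a simplified illustration, not the libstdc++ implementation: sketch_fill and the Variant enum are hypothetical, the return value exists only to show which overload was selected, and the byte case is reduced to unsigned char pointers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical tag that reports which overload ran.
enum class Variant { General, ScalarCopy, Memset };

// All other types: assign through the reference, no extra copy.
template <typename ForwardIt, typename T>
typename std::enable_if<!std::is_scalar<T>::value, Variant>::type
sketch_fill(ForwardIt first, ForwardIt last, const T& value) {
    for (; first != last; ++first)
        *first = value;
    return Variant::General;
}

// Scalars: take a local copy first, then loop over the local.
template <typename ForwardIt, typename T>
typename std::enable_if<std::is_scalar<T>::value, Variant>::type
sketch_fill(ForwardIt first, ForwardIt last, const T& value) {
    const T tmp = value;
    for (; first != last; ++first)
        *first = tmp;
    return Variant::ScalarCopy;
}

// Bytes (only unsigned char in this sketch): hand the work to memset.
// A non-template overload wins over the scalar template on an exact match.
inline Variant sketch_fill(unsigned char* first, unsigned char* last,
                           const unsigned char& value) {
    const unsigned char tmp = value;
    if (first != last)
        std::memset(first, tmp, static_cast<std::size_t>(last - first));
    return Variant::Memset;
}
```

Calling sketch_fill on a range of std::string, on a range of int, and on an unsigned char buffer selects the general, scalar, and memset variants respectively.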

Source: https://habr.com/ru/post/1258251/

