Is it possible to optimize std::visit?

When using std::visit / std::variant, I see in the profiler output that the functions std::__detail::__variant::__gen_vtable_impl take the most time.

I did a test like this:

    // 3 class families, all like this:
    class ElementDerivedN : public ElementBase {
        ...
        std::variant<ElementDerived1*, ElementDerived2*, ...> GetVariant() override { return this; }
    };

    std::vector<ElementBase*> elements;
    std::vector<Visitor*> visitors;
    std::vector<Third*> thirds;

    // overload-set helper: combines several function objects into one
    template<class... Ts> struct funcs : Ts... { using Ts::operator()...; };
    template<class... Ts> funcs(Ts...) -> funcs<Ts...>;

    // demo functions:
    struct Actions {
        template <typename R, typename S, typename T>
        void operator()(R*, S*, T*) {}
    };

    struct SpecialActionForElement1 {
        template <typename S, typename T>
        void operator()(ElementDerived1*, S*, T*) {}
    };

    for (auto el : elements) {
        for (auto vis : visitors) {
            for (auto th : thirds) {
                std::visit(funcs{ Actions(), SpecialActionForElement1() },
                           el->GetVariant(), vis->GetVariant(), th->GetVariant());
            }
        }
    }
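For anyone who wants to reproduce this, here is a compilable miniature of the setup above. It is a sketch under assumptions: each family has only two derived classes, the bases are named ElementBase, VisitorBase and ThirdBase, and the counters generic_calls/special_calls and the function visit_all are hypothetical stand-ins for real work. It also shows that SpecialActionForElement1 wins overload resolution because its template is more specialized than the generic one in Actions.

```cpp
#include <cassert>
#include <variant>
#include <vector>

// Hypothetical miniature: two derived classes per family instead of N.
struct ElementDerived1; struct ElementDerived2;
struct VisitorDerived1; struct VisitorDerived2;
struct ThirdDerived1;   struct ThirdDerived2;

using ElementVariant = std::variant<ElementDerived1*, ElementDerived2*>;
using VisitorVariant = std::variant<VisitorDerived1*, VisitorDerived2*>;
using ThirdVariant   = std::variant<ThirdDerived1*, ThirdDerived2*>;

// Each family's base exposes `this` as a variant of all derived pointer types.
struct ElementBase { virtual ~ElementBase() = default; virtual ElementVariant GetVariant() = 0; };
struct VisitorBase { virtual ~VisitorBase() = default; virtual VisitorVariant GetVariant() = 0; };
struct ThirdBase   { virtual ~ThirdBase() = default;   virtual ThirdVariant GetVariant() = 0; };

struct ElementDerived1 : ElementBase { ElementVariant GetVariant() override { return this; } };
struct ElementDerived2 : ElementBase { ElementVariant GetVariant() override { return this; } };
struct VisitorDerived1 : VisitorBase { VisitorVariant GetVariant() override { return this; } };
struct VisitorDerived2 : VisitorBase { VisitorVariant GetVariant() override { return this; } };
struct ThirdDerived1 : ThirdBase { ThirdVariant GetVariant() override { return this; } };
struct ThirdDerived2 : ThirdBase { ThirdVariant GetVariant() override { return this; } };

// Overload-set helper from the question: inherits operator() from every callable.
template <class... Ts> struct funcs : Ts... { using Ts::operator()...; };
template <class... Ts> funcs(Ts...) -> funcs<Ts...>;

inline int generic_calls = 0;  // hypothetical counters standing in for real work
inline int special_calls = 0;

struct Actions {  // generic fallback for every combination
    template <typename R, typename S, typename T>
    void operator()(R*, S*, T*) const { ++generic_calls; }
};

struct SpecialActionForElement1 {  // more specialized: first parameter is fixed
    template <typename S, typename T>
    void operator()(ElementDerived1*, S*, T*) const { ++special_calls; }
};

// Visits every (element, visitor, third) triple, as in the question's loop.
void visit_all(const std::vector<ElementBase*>& elements,
               const std::vector<VisitorBase*>& visitors,
               const std::vector<ThirdBase*>& thirds) {
    for (auto el : elements)
        for (auto vis : visitors)
            for (auto th : thirds)
                std::visit(funcs{Actions(), SpecialActionForElement1()},
                           el->GetVariant(), vis->GetVariant(), th->GetVariant());
}
```

With two classes per family there are 8 combinations: the 4 that start with an ElementDerived1 go to SpecialActionForElement1, the other 4 to Actions.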

As I said, std::__detail::__variant::__gen_vtable_impl<...> takes the most time.

Q: Since the n-dimensional array of functions that is generated on every call to std::visit is the same each time, it would be nice to keep it between calls to std::visit. Is that possible?

Maybe I'm on the wrong track; if so, let me know!

EDIT: The compiler is gcc 7.3 from the standard Fedora installation. The standard library is the default one that comes with g++ (whatever that is).

build options:

 g++ --std=c++17 -fno-rtti main.cpp -O3 -g -o go 
Answer:

I just looked at a simpler example. The table is generated at compile time, not on each call. The time is probably spent in the lambdas generated inside std::__detail::__variant::__gen_vtable_impl<...>. For some reason, these lambdas, which mostly just call the visitor, do not skip re-checking the actual type held by the variant.

For this function, the compiler generates code for four different instantiations of the visitor lambda, each inlined into a lambda created deep inside std::visit, and stores pointers to those lambdas in a static array:

    #include <variant>

    double test(std::variant<int, double> v1, std::variant<int, double> v2) {
        return std::visit([](auto a, auto b) -> double { return a + b; }, v1, v2);
    }

This is generated for test:

    (...)                  ; load variant tags and check for bad variant
    lea rax, [rcx+rax*2]   ; compute index in array
    mov rdx, rsi
    mov rsi, rdi
    lea rdi, [rsp+15]
    ; index into vtable with rax
    call [QWORD PTR std::__detail::__variant::(... bla lambda bla ...)::S_vtable[0+rax*8]]
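To see why nothing needs to be cached between calls, the dispatch in test() can be written out by hand. This is a hypothetical sketch, not libstdc++'s actual code, and add_entry, table and manual_visit_add are illustrative names. The table of four function pointers is a constexpr object built at compile time; each call only computes a flat index from the two variant indices, much like the lea instruction above, and calls through it.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <variant>

using V = std::variant<int, double>;

// One entry per (alternative of a, alternative of b) pair.
// std::get<I> is itself checked, so the entry re-tests a type the
// dispatch has already settled.
template <std::size_t I, std::size_t J>
double add_entry(const V& a, const V& b) {
    return std::get<I>(a) + std::get<J>(b);
}

using Entry = double (*)(const V&, const V&);

// Built once at compile time; nothing is regenerated per call.
constexpr std::array<Entry, 4> table{
    &add_entry<0, 0>, &add_entry<0, 1>,
    &add_entry<1, 0>, &add_entry<1, 1>,
};

double manual_visit_add(const V& v1, const V& v2) {
    if (v1.valueless_by_exception() || v2.valueless_by_exception())
        throw std::bad_variant_access{};
    // Row-major flat index: index(v1) * 2 + index(v2).
    return table[v1.index() * 2 + v2.index()](v1, v2);
}
```

Because every entry goes through the checked std::get<I>, it repeats a type test that the table lookup already guarantees, which is the same kind of redundancy the answer goes on to show in the <double, double> entry.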

And this is generated for the <double, double> instantiation of the visitor:

    std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<double (*)(test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&, std::variant<int, double>&, test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&)>, std::tuple<test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&, test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&>, std::integer_sequence<unsigned long, 1ul, 1ul> >::__visit_invoke(test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}, test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&, test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&):
        ; whew, that is a long name :-)
        ; redundant checks whether we are accessing variants of the correct type:
        cmp BYTE PTR [rdx+8], 1
        jne .L15
        cmp BYTE PTR [rsi+8], 1
        jne .L15
        ; the actual computation:
        movsd xmm0, QWORD PTR [rsi]
        addsd xmm0, QWORD PTR [rdx]
        ret

I wouldn't be surprised if the profiler attributed both the time for these type checks and the time of your inlined visitors to std::__detail::__variant::__gen_vtable_impl<...>, instead of giving you the full 800-character name of the deeply nested lambda.

The only general optimization potential I see here is to omit the bad-variant checks in the lambdas. Since the lambdas are only ever called through a function pointer with matching alternatives, it is very hard for the compiler to prove statically that the checks are redundant.
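One way to give the compiler the information it lacks is to replace the function-pointer table with a switch in the calling function, so each std::get<I> sits in a branch where index() == I is already known. This is a swapped-in technique, not something the answer proposes, and switch_visit_add is a hypothetical name. Whether the checks actually disappear depends on the compiler, its version and the flags, so inspect the generated assembly before relying on it.

```cpp
#include <cassert>
#include <variant>

using V = std::variant<int, double>;

// Dispatch by switch instead of through a table of function pointers.
// Inside each case the optimizer *may* fold std::get's own index check,
// because the switch has just established which alternative is active.
double switch_visit_add(const V& v1, const V& v2) {
    if (v1.valueless_by_exception() || v2.valueless_by_exception())
        throw std::bad_variant_access{};
    // Same flat index the table-based dispatch computes.
    switch (v1.index() * 2 + v2.index()) {
        case 0: return std::get<0>(v1) + std::get<0>(v2);  // int + int
        case 1: return std::get<0>(v1) + std::get<1>(v2);  // int + double
        case 2: return std::get<1>(v1) + std::get<0>(v2);  // double + int
        case 3: return std::get<1>(v1) + std::get<1>(v2);  // double + double
        default: throw std::bad_variant_access{};          // unreachable here
    }
}
```

The trade-off is that the switch has to be written (or generated) per combination of variant types, while std::visit's table is produced automatically for any visitor.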

I looked at the same example compiled with clang and libc++. In libc++ the redundant type checks are eliminated, so libstdc++ is not quite optimal yet.

    decltype(auto) std::__1::__variant_detail::__visitation::__base::__dispatcher<1ul, 1ul>::__dispatch<std::__1::__variant_detail::__visitation::__variant::__value_visitor<test(std::__1::variant<int, double>, std::__1::variant<int, double>)::$_0>&&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)0, int, double>&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)0, int, double>&>(std::__1::__variant_detail::__visitation::__variant::__value_visitor<test(std::__1::variant<int, double>, std::__1::variant<int, double>)::$_0>&&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)0, int, double>&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)0, int, double>&):
        ; no redundant check here
        movsd xmm0, qword ptr [rsi]   # xmm0 = mem[0],zero
        addsd xmm0, qword ptr [rdx]
        ret

Perhaps you can check what code is actually generated for your program, in case it doesn't look like what I found with my example.


Source: https://habr.com/ru/post/1275748/

