I just looked at a simpler example. The table is generated at compile time. The time is probably spent in the lambdas generated in std::__detail::__variant::__gen_vtable_impl<...>. For some reason, these lambdas, which basically just call the visitor, do not skip the check of the variant's actual type.
For this function, the compiler generates code for four different instantiations of the generic lambda, each embedded in a wrapper lambda created deep inside std::visit, and stores pointers to these wrappers in a static array:
```cpp
#include <variant>

double test(std::variant<int, double> v1, std::variant<int, double> v2) {
    return std::visit([](auto a, auto b) -> double { return a + b; }, v1, v2);
}
```
This is what gets generated for test:
```asm
(...)                        ; load variant tags and check for bad variant
lea     rax, [rcx+rax*2]     ; compute index in array
mov     rdx, rsi
mov     rsi, rdi
lea     rdi, [rsp+15]
; index into vtable with rax
call    [QWORD PTR std::__detail::__variant::(... bla lambda bla ...)::S_vtable[0+rax*8]]
```
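To make the mechanism concrete, here is a hand-written sketch of the kind of dispatch table std::visit builds for test. It is not libstdc++'s actual code: the names visit2, thunk and table, and the functor Add standing in for the generic lambda, are hypothetical, and the row-major index i * 2 + j is only an assumption about the layout suggested by lea rax, [rcx+rax*2]. The thunks use std::get<I>, which re-checks the active alternative much like the cmp/jne pairs in the libstdc++ output.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <variant>

using V = std::variant<int, double>;

// Hypothetical stand-in for the generic lambda in test().
struct Add {
    template <class A, class B>
    double operator()(A a, B b) const { return a + b; }
};

// One thunk per (I, J) combination, playing the role of __visit_invoke.
// std::get<I> re-checks the active alternative (and throws on mismatch),
// even though the table only ever calls this thunk with matching variants.
template <std::size_t I, std::size_t J>
double thunk(Add&& f, V& a, V& b) {
    return std::move(f)(std::get<I>(a), std::get<J>(b));
}

using Fn = double (*)(Add&&, V&, V&);

// Static table of function pointers built at compile time.
// Assumed row-major layout: index = a.index() * 2 + b.index().
constexpr std::array<Fn, 4> table = {
    &thunk<0, 0>, &thunk<0, 1>, &thunk<1, 0>, &thunk<1, 1>,
};

double visit2(Add f, V& a, V& b) {
    // "Check for bad variant": a valueless variant has no valid table slot.
    if (a.valueless_by_exception() || b.valueless_by_exception())
        throw std::bad_variant_access{};
    return table[a.index() * 2 + b.index()](std::move(f), a, b);
}
```

For example, with a holding 1 and b holding 2.5, visit2(Add{}, a, b) selects table[1], i.e. thunk<0, 1>, and returns the same value as std::visit(Add{}, a, b).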
This is generated for the <double, double> visitor:
```asm
std::__detail::__variant::__gen_vtable_impl<std::__detail::__variant::_Multi_array<double (*)(test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&, std::variant<int, double>&, test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&)>, std::tuple<test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&, test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&>, std::integer_sequence<unsigned long, 1ul, 1ul> >::__visit_invoke(test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}, test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&, test(std::variant<int, double>, std::variant<int, double>)::{lambda(auto:1, auto:2)#1}&&):
        ; whew, that is a long name :-)
        ; redundant checks whether we are accessing variants of the correct type:
        cmp     BYTE PTR [rdx+8], 1
        jne     .L15
        cmp     BYTE PTR [rsi+8], 1
        jne     .L15
        ; the actual computation:
        movsd   xmm0, QWORD PTR [rsi]
        addsd   xmm0, QWORD PTR [rdx]
        ret
```
I wouldn't be surprised if the profiler attributed both the time spent on these type checks and the time spent in your inlined visitors to std::__detail::__variant::__gen_vtable_impl<...>, rather than giving you the full 800-character name of the deeply nested lambda.
The only general optimization opportunity I see here is to omit the bad-variant checks in the lambda. Since the lambda is called through a function pointer, and only ever with variants that hold the matching alternatives, it will be very hard for the compiler to prove statically that the checks are redundant.
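A minimal sketch of a thunk-level accessor that avoids the redundant check, assuming the caller (such as the dispatch table) already guarantees that the variant holds alternative I. This is not what libstdc++ or libc++ do internally, and read_checked and read_trusted are hypothetical names. It contrasts std::get<I>, which must test the index and throw std::bad_variant_access on mismatch, with *std::get_if<I>(&v): get_if still compares the index, but dereferencing a null result would be undefined behavior, so the optimizer is permitted (though not required) to assume the pointer is non-null and drop the branch.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>

using V = std::variant<int, double>;

// Checked access: std::get<I> has to compare v.index() with I and throw
// std::bad_variant_access on mismatch, so a compare-and-branch remains.
template <std::size_t I>
double read_checked(const V& v) {
    return std::get<I>(v);
}

// Trusting access: only valid if the caller guarantees v.index() == I.
// Dereferencing a null pointer is UB, so the compiler MAY treat the
// index comparison inside get_if as always true and remove the branch.
template <std::size_t I>
double read_trusted(const V& v) {
    return *std::get_if<I>(&v);
}
```

Whether the branch actually disappears depends on the compiler and optimization level, so it is worth checking the generated assembly.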
I looked at the same example compiled with clang and libc++. In libc++ the redundant type checks are eliminated, so libstdc++ is still not quite optimal. For comparison, this is the libc++ dispatcher for the <double, double> case:
```asm
decltype(auto) std::__1::__variant_detail::__visitation::__base::__dispatcher<1ul, 1ul>::__dispatch<std::__1::__variant_detail::__visitation::__variant::__value_visitor<test(std::__1::variant<int, double>, std::__1::variant<int, double>)::$_0>&&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)0, int, double>&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)0, int, double>&>(std::__1::__variant_detail::__visitation::__variant::__value_visitor<test(std::__1::variant<int, double>, std::__1::variant<int, double>)::$_0>&&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)0, int, double>&, std::__1::__variant_detail::__base<(std::__1::__variant_detail::_Trait)0, int, double>&):
```
Perhaps you can check what code is actually generated for your software, in case it looks different from what I found in my example.