I use a third-party library that relies on thread_local. As a result, my program calls __tls_init() many times, even on every iteration of some loops (I did not check them all), despite the fact that the thread_local variables were unconditionally initialized by an earlier call within the same function (and, in fact, near the beginning of the entire program).
The first instructions of __tls_init() on my x86_64 machine are:
cmpb $0, %fs:__tls_guard@tpoff
je .L530
ret
.L530:
pushq %rbp
pushq %rbx
subq (some stack space), %rsp
movb $1, %fs:__tls_guard@tpoff
So the first time this is called in each thread, the value at %fs:__tls_guard@tpoff is set to 1, and subsequent calls return immediately. But this still means paying the full overhead of a call every time a thread_local variable is accessed, right?
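To make the guard logic above concrete, here is a rough C++ sketch of what the prologue of __tls_init appears to do. The names tls_guard, init_count and tls_init_sketch are hypothetical stand-ins that I made up for illustration; the real function is generated by the compiler and runs the constructors of the thread_local objects instead of bumping a counter.

```cpp
// Hypothetical per-thread guard, playing the role of %fs:__tls_guard@tpoff.
static thread_local bool tls_guard = false;

// Hypothetical counter standing in for "the constructors ran".
static thread_local int init_count = 0;

// Sketch of the generated initializer: cheap check first, real work once.
void tls_init_sketch() {
    if (tls_guard)      // cmpb $0, guard; if already set, return at once
        return;
    tls_guard = true;   // movb $1, guard
    ++init_count;       // stand-in for constructing the thread_local objects
}
```

Even on the fast path, every caller still pays for the call, the load of the guard, the compare and the return.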
Note that this is a statically linked (in fact, compiler-generated!) function, so the compiler "knows" that it starts with this check, and it is quite conceivable that flow analysis could discover that there is no need to call this function more than once. But it does not.
Is it possible to get rid of the superfluous call __tls_init instructions, or at least to stop the compiler from emitting them in time-critical sections?
An example from an actual compilation (-O3):
pushq %r13
movq %rdi, %r13
pushq %r12
pushq %rbp
pushq %rbx
movq %rsi, %rbx
subq $88, %rsp
call __tls_init // always gets called
movq (%rbx), %rdi
call <some local function>
movq 8(%rax), %r12
subq (%rax), %r12
movq %rax, %rbp
sarq $4, %r12
cmpq $1, %r12
jbe .L6512
leaq -2(%r12), %rax
movq $0, (%rsp)
leaq 48(%rsp), %rbx
movq %rax, 8(%rsp)
.L6506:
call __tls_init // needless and called potentially very many times!
movq %rsp, %rsi
movq %rsp, %rdi
addq $8, %rbx
call <some other local function>
movq %rax, -8(%rbx)
leaq 80(%rsp), %rax
cmpq %rbx, %rax
jne .L6506 // loop back
Update: the source code for the above is too complicated. Here is an MWE:
void external(int);
struct X {
volatile int a;
X() { a = 5; }
void f() { external(a); }
};
thread_local X x;
void f() {
x.f();
for(int j = 0; j < 10; j++)
x.f();
}
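One thing that might be worth trying on the MWE is to bind a plain reference to the thread_local object once, before the loop, and go through that reference afterwards. This is only a sketch under assumptions: sum_with_cached_reference and X::get are hypothetical names, and external() is replaced by a return value so that the snippet is self-contained. Access through the reference does not name the thread_local variable again, so the initialization check should only be needed where the reference is bound, but the generated assembly has to be inspected to confirm what a given compiler actually emits.

```cpp
struct X {
    volatile int a;
    X() { a = 5; }
    int get() { return a; }   // hypothetical stand-in for f() calling external(a)
};

thread_local X x;

int sum_with_cached_reference() {
    X& local = x;             // the only place the thread_local is named
    int total = local.get();  // corresponds to the first x.f()
    for (int j = 0; j < 10; j++)
        total += local.get(); // loop body uses the reference, not x itself
    return total;             // 11 reads of a == 5
}
```

The reference must not be handed to another thread, since it refers to the calling thread's copy of x.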
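(The rest of this update did not survive intact. The recoverable fragments concern the compiled MWE: %fs:__tls_guard@tpoff with the values 0 and 1, the label .L4, and __tls_init, followed by a remark about g++ and Clang on Compiler Explorer.)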