LLVM intrinsic functions

When building a project with LLVM, some function calls are replaced by intrinsic functions. Is the replacement done by the front-end (e.g. clang) or by the LLVM back-end?

Discussions online suggest that intrinsic replacement is tied to the optimization options. Does this mean that without any optimization option no intrinsic replacement takes place? Or is there some default intrinsic replacement that cannot be disabled?

If there is a way to disable all intrinsic functions, how do I do it?

1 answer

It depends. Intrinsics written in the code are emitted directly by the front-end. Intrinsics such as llvm.memset are introduced into the code during optimization at the IR level (these optimizations run in the IR-level optimizer, not in the front-end or the back-end).
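As a small illustration of the first case, an intrinsic can come straight from the source. The sketch below assumes a compiler that supports the GCC-style `__builtin_popcount` (clang and GCC both do); the function name `count_bits` is made up for this example and does not come from the question. With clang, this builtin is emitted by the front-end as a call to the `llvm.ctpop.i32` intrinsic, whether or not optimization is requested.

```c
/* Hypothetical helper: counts the set bits of x with a compiler builtin.
   clang's front-end emits this builtin as the llvm.ctpop.i32 intrinsic,
   so the intrinsic appears in the IR even without optimization. */
int count_bits(unsigned int x) {
    return __builtin_popcount(x);
}
```

Compiling a file containing this function with clang -S -emit-llvm should show the intrinsic call in the unoptimized IR.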

Here is an example (pretty dumb):

    int main(int argc, char** argv) {
        int a[8];

        for (int i = 0; i != 8; ++i)
            a[i] = 0;

        /* Note: a[i+1] reads a[8], one past the end, when i == 7. */
        for (int i = 7; i >= 0; --i)
            a[i] = a[i+1] + argc;

        return a[0];
    }

Compiled with clang 3.5 (clang -S -emit-llvm), you get the following IR without any intrinsics:

    ; Function Attrs: nounwind uwtable
    define i32 @main(i32 %argc, i8** %argv) #0 {
      %1 = alloca i32, align 4
      %2 = alloca i32, align 4
      %3 = alloca i8**, align 8
      %a = alloca [8 x i32], align 16
      %i = alloca i32, align 4
      %i1 = alloca i32, align 4
      store i32 0, i32* %1
      store i32 %argc, i32* %2, align 4
      store i8** %argv, i8*** %3, align 8
      store i32 0, i32* %i, align 4
      br label %4

    ; <label>:4                                       ; preds = %11, %0
      %5 = load i32* %i, align 4
      %6 = icmp ne i32 %5, 8
      br i1 %6, label %7, label %14

    ; <label>:7                                       ; preds = %4
      %8 = load i32* %i, align 4
      %9 = sext i32 %8 to i64
      %10 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 %9
      store i32 0, i32* %10, align 4
      br label %11

    ; <label>:11                                      ; preds = %7
      %12 = load i32* %i, align 4
      %13 = add nsw i32 %12, 1
      store i32 %13, i32* %i, align 4
      br label %4

    ; <label>:14                                      ; preds = %4
      store i32 7, i32* %i1, align 4
      br label %15

    ; <label>:15                                      ; preds = %29, %14
      %16 = load i32* %i1, align 4
      %17 = icmp sge i32 %16, 0
      br i1 %17, label %18, label %32

    ; <label>:18                                      ; preds = %15
      %19 = load i32* %i1, align 4
      %20 = add nsw i32 %19, 1
      %21 = sext i32 %20 to i64
      %22 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 %21
      %23 = load i32* %22, align 4
      %24 = load i32* %2, align 4
      %25 = add nsw i32 %23, %24
      %26 = load i32* %i1, align 4
      %27 = sext i32 %26 to i64
      %28 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 %27
      store i32 %25, i32* %28, align 4
      br label %29

    ; <label>:29                                      ; preds = %18
      %30 = load i32* %i1, align 4
      %31 = add nsw i32 %30, -1
      store i32 %31, i32* %i1, align 4
      br label %15

    ; <label>:32                                      ; preds = %15
      %33 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 0
      %34 = load i32* %33, align 4
      ret i32 %34
    }

Compiled again with clang -S -emit-llvm -O1, you will see the following:

    ; Function Attrs: nounwind readnone uwtable
    define i32 @main(i32 %argc, i8** nocapture readnone %argv) #0 {
    .preheader:
      %a = alloca [8 x i32], align 16
      %a6 = bitcast [8 x i32]* %a to i8*
      call void @llvm.memset.p0i8.i64(i8* %a6, i8 0, i64 32, i32 4, i1 false)
      br label %0

    ; <label>:0                                       ; preds = %.preheader, %0
      %indvars.iv = phi i64 [ 7, %.preheader ], [ %indvars.iv.next, %0 ]
      %1 = add nsw i64 %indvars.iv, 1
      %2 = getelementptr inbounds [8 x i32]* %a, i64 0, i64 %1
      %3 = load i32* %2, align 4, !tbaa !1
      %4 = add nsw i32 %3, %argc
      %5 = getelementptr inbounds [8 x i32]* %a, i64 0, i64 %indvars.iv
      store i32 %4, i32* %5, align 4, !tbaa !1
      %indvars.iv.next = add nsw i64 %indvars.iv, -1
      %6 = trunc i64 %indvars.iv to i32
      %7 = icmp sgt i32 %6, 0
      br i1 %7, label %0, label %8

    ; <label>:8                                       ; preds = %0
      %9 = getelementptr inbounds [8 x i32]* %a, i64 0, i64 0
      %10 = load i32* %9, align 16, !tbaa !1
      ret i32 %10
    }

The initialization loop has been replaced by the llvm.memset intrinsic. The back-end is free to handle the intrinsic however it wants, but llvm.memset is commonly lowered to a libc library call.
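To make the arguments of that call concrete: `i8 0` is the fill byte and `i64 32` is the byte count, that is, 8 elements of 4 bytes each. The following C sketch (the function names are invented for illustration, and a 4-byte `int` is assumed) puts the store loop next to the library call that the intrinsic is typically lowered to. Both leave the array in the same state.

```c
#include <string.h>

enum { N = 8 };

/* The loop as written in the example: one store per element. */
void zero_with_loop(int a[N]) {
    for (int i = 0; i != N; ++i)
        a[i] = 0;
}

/* What the llvm.memset call amounts to after lowering:
   fill N * sizeof(int) bytes (32 with a 4-byte int) with zero. */
void zero_with_memset(int a[N]) {
    memset(a, 0, N * sizeof(int));
}
```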

To answer your first question: yes, if you do not optimize your code, the optimizer will not introduce intrinsics into your IR.

To keep intrinsics from being introduced into your code, you have to find the optimization pass that introduces them and not run it on your IR. Here is a related question on how to find out which passes are run on the IR: Where can I find the optimization sequence for clang -OX?

For -O1 we get:

-prune-eh -inline-cost -always-inline -functionattrs -sroa -domtree -early-cse -lazy-value-info -jump-threading -correlated-propagation -simplifycfg -instcombine -tailcallelim -simplifycfg -reassociate -domtree -loops -loop-simplify -lcssa -loop-rotate -licm -loop-unswitch -instcombine -scalar-evolution -lcssa -indvars -loop-idiom -loop-deletion -loop-unroll -memdep -memcpyopt -sccp -instcombine -lazy-value-info -jump-threading -correlated-propagation -domtree -memdep -dse -adce -simplifycfg -instcombine -barrier -domtree -loops -loop-simplify -lcssa -branch-prob -block-freq -scalar-evolution -loop-vectorize -instcombine -simplifycfg -strip-dead-prototypes -verify

A wild guess: instcombine is introducing llvm.memset. I ran the passes above, minus instcombine, with opt on the unoptimized IR and got the following:

    ; Function Attrs: nounwind readnone uwtable
    define i32 @main(i32 %argc, i8** %argv) #0 {
      %a = alloca [8 x i32], align 16
      %1 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 8
      %2 = load i32* %1, align 4
      %3 = add nsw i32 %2, %argc
      %4 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 7
      store i32 %3, i32* %4, align 4
      %5 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 7
      %6 = load i32* %5, align 4
      %7 = add nsw i32 %6, %argc
      %8 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 6
      store i32 %7, i32* %8, align 4
      %9 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 6
      %10 = load i32* %9, align 4
      %11 = add nsw i32 %10, %argc
      %12 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 5
      store i32 %11, i32* %12, align 4
      %13 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 5
      %14 = load i32* %13, align 4
      %15 = add nsw i32 %14, %argc
      %16 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 4
      store i32 %15, i32* %16, align 4
      %17 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 4
      %18 = load i32* %17, align 4
      %19 = add nsw i32 %18, %argc
      %20 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 3
      store i32 %19, i32* %20, align 4
      %21 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 3
      %22 = load i32* %21, align 4
      %23 = add nsw i32 %22, %argc
      %24 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 2
      store i32 %23, i32* %24, align 4
      %25 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 2
      %26 = load i32* %25, align 4
      %27 = add nsw i32 %26, %argc
      %28 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 1
      store i32 %27, i32* %28, align 4
      %29 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 1
      %30 = load i32* %29, align 4
      %31 = add nsw i32 %30, %argc
      %32 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 0
      store i32 %31, i32* %32, align 4
      %33 = getelementptr inbounds [8 x i32]* %a, i32 0, i64 0
      %34 = load i32* %33, align 4
      ret i32 %34
    }

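No intrinsics. So, to keep (at least the memset) intrinsics out of your code, do not run instcombine on your IR. However, instcombine is a powerful opt pass that really shortens the code. (Note that the pass designed to recognize such store loops as memset is loop-idiom, which also appears in the list above, so dropping instcombine probably just keeps the loop out of the form loop-idiom matches; excluding loop-idiom itself is the more direct test.)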

You now have two options:

  • do not use opt passes that introduce intrinsics
  • write your own LLVM opt pass that transforms the intrinsics back into whatever they can be replaced by, and run it after optimization and before the back-end starts its work.
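For the second option, the rewrite such a pass would perform on each `llvm.memset` call is essentially the inverse of the optimization shown earlier: replace the call with an explicit store loop. The C function below only models that expansion at source level. It is not LLVM pass code, and the name `expand_memset` is hypothetical. A real pass would build the equivalent loop in IR.

```c
#include <stddef.h>

/* Source-level model of expanding a memset intrinsic into a loop:
   store the fill byte `value` into each of the `len` bytes at `dst`.
   The volatile-qualified pointer is an assumption made for this sketch,
   to discourage the optimizer from turning the loop back into memset. */
void *expand_memset(void *dst, int value, size_t len) {
    volatile unsigned char *p = dst;
    for (size_t i = 0; i < len; ++i)
        p[i] = (unsigned char)value;
    return dst;
}
```

Like the library function, the sketch converts the fill value to `unsigned char` before storing it and returns the destination pointer.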

I hope this helps. Cheers!


Source: https://habr.com/ru/post/979727/

