I wrote a function that computes the length of a C string (I'm trying to beat the clang optimizer at -O3). I'm on macOS.
    _string_length1:
        push rbp
        mov rbp, rsp
        xor rax, rax          ; length counter = 0
    .body:
        cmp byte [rdi], 0     ; reached the terminating NUL?
        je .exit
        inc rdi               ; advance to the next character
        inc rax               ; count it
        jmp .body
    .exit:
        pop rbp
        ret
This is the C function I'm trying to beat:
    size_t string_length2(const char *str) {
        size_t ret = 0;
        while (str[ret]) {
            ret++;
        }
        return ret;
    }
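Before timing anything, it is worth confirming that both versions agree with strlen. Here is a minimal sketch, assuming a hypothetical helper check_string_length that is not part of my original code; the sample strings are arbitrary:

```c
#include <assert.h>   /* for assert() in whatever code calls the helper */
#include <stddef.h>
#include <string.h>

/* The C version from above, repeated so this block compiles on its own. */
size_t string_length2(const char *str) {
    size_t ret = 0;
    while (str[ret]) {
        ret++;
    }
    return ret;
}

/* Hypothetical helper: returns 1 if fn matches strlen on every sample,
   0 on the first mismatch. */
int check_string_length(size_t (*fn)(const char *)) {
    static const char *samples[] = { "", "a", "hello", "hello, world" };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        if (fn(samples[i]) != strlen(samples[i])) {
            return 0;
        }
    }
    return 1;
}
```

The assembly version can be checked the same way by declaring it as size_t string_length1(const char *); and passing it to the helper. On macOS the C name string_length1 corresponds to the symbol _string_length1.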
And this is what clang generated for string_length2:
    string_length2:
        push rbp
        mov rbp, rsp
        mov rax, -1
    LBB0_1:
        cmp byte ptr [rdi + rax + 1], 0
        lea rax, [rax + 1]
        jne LBB0_1
        pop rbp
        ret
Every C function sets up a stack frame with push rbp and mov rbp, rsp, and tears it down with pop rbp. But I don't use the stack at all; I only use processor registers. My function worked without the stack frame (when I tested on x86-64), but is the frame actually necessary?