How are values returned from a function?

I recently ran into a serious bug: I forgot to return a value from a function. The problem was that, although nothing was returned, the code worked fine under Linux and Windows and only crashed on the Mac. I found the bug once I turned on all compiler warnings.

So here is a simple example:

    #include <iostream>

    class A {
    public:
        A(int p1, int p2, int p3) : v1(p1), v2(p2), v3(p3) {}
        int v1;
        int v2;
        int v3;
    };

    A* getA() {
        A* p = new A(1, 2, 3);
        // return p;
    }

    int main() {
        A* a = getA();
        std::cerr << "A: v1=" << a->v1 << " v2=" << a->v2 << " v3=" << a->v3 << std::endl;
        return 0;
    }

My question is: how can this work under Linux and Windows without crashing? How are return values handled at a lower level?

+6
8 answers

On Intel (x86) architectures, primitive values (integers and pointers) are usually returned in the eax register. This register is also used, among other things, as temporary storage when moving values around in memory and as an operand during calculations. So whatever value is left in that register is treated as the return value, and in your case it happened to be exactly the value you wanted to return.

+7

Probably luck: the pointer happens to be left in the register that is used to return a single pointer result, or something like that.

Calling conventions and the way function results are returned depend on the architecture, so it is not surprising that your code works on Windows and Linux but not on the Mac.

+3

First of all, you need to modify your example slightly so that it compiles. The function must have at least one execution path that returns a value.

    A* getA() {
        if (false)
            return NULL;
        A* p = new A(1, 2, 3);
        // return p;
    }

Secondly, this is obviously undefined behavior, which means that anything can happen, but I guess that answer will not satisfy you.

Thirdly, on Windows it works in Debug mode, but if you compile in Release mode it does not.

The Debug build compiles it to the following:

    A* p = new A(1,2,3);
    00021535  push  0Ch
    00021537  call  operator new (211FEh)
    0002153C  add   esp,4
    0002153F  mov   dword ptr [ebp-0E0h],eax
    00021545  mov   dword ptr [ebp-4],0
    0002154C  cmp   dword ptr [ebp-0E0h],0
    00021553  je    getA+7Eh (2156Eh)
    00021555  push  3
    00021557  push  2
    00021559  push  1
    0002155B  mov   ecx,dword ptr [ebp-0E0h]
    00021561  call  A::A (21271h)
    00021566  mov   dword ptr [ebp-0F4h],eax
    0002156C  jmp   getA+88h (21578h)
    0002156E  mov   dword ptr [ebp-0F4h],0
    00021578  mov   eax,dword ptr [ebp-0F4h]
    0002157E  mov   dword ptr [ebp-0ECh],eax
    00021584  mov   dword ptr [ebp-4],0FFFFFFFFh
    0002158B  mov   ecx,dword ptr [ebp-0ECh]
    00021591  mov   dword ptr [ebp-14h],ecx

The second instruction, the call to operator new, puts a pointer to the newly created instance into eax.

    A* a = getA();
    0010484E  call  getA (1012ADh)
    00104853  mov   dword ptr [a],eax

The calling context expects eax to contain the return value, but it does not; it contains the last pointer allocated by new, which happens to be p.

So that is why it works.
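For completeness, here is a minimal sketch of the fixed version of the question's example, with the commented-out return restored. The checkGetA helper is hypothetical and only there to exercise the function; it is not part of the original post.

```cpp
#include <cassert>

class A {
public:
    A(int p1, int p2, int p3) : v1(p1), v2(p2), v3(p3) {}
    int v1;
    int v2;
    int v3;
};

// Every path through this value-returning function now ends in a return
// statement, so the caller no longer depends on whatever is left in eax.
A* getA() {
    A* p = new A(1, 2, 3);
    return p;
}

// Hypothetical helper (an assumption for illustration) that exercises getA().
void checkGetA() {
    A* a = getA();
    assert(a->v1 == 1 && a->v2 == 2 && a->v3 == 3);
    delete a;  // the original example never freed the object
}
```

With the return in place, the result no longer depends on the platform, the compiler, or the optimization level.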

+2

There are two main ways the compiler returns a value:

  • Put the value in a register before returning, and
  • Have the caller pass in a block of stack memory for the return value, and write the value into that block

#1 is usually used for anything that fits into a register; #2 for everything else (large structs, arrays, etc.).

In your case, the compiler uses #1 both for the return of new and for the return of your function. On Linux and Windows, the compiler did not perform any operations that modify the return register between writing the value to the pointer variable and returning from your function; on the Mac, it did. Hence the difference in the results you see: in the first case, the value left over in the return register happened to coincide with the value you wanted to return anyway.
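The two ways can be sketched in C++ roughly as follows. This is only an illustration under assumptions: the names smallValue, Big, bigValue, bigValueInto and checkReturnStyles are made up for this sketch, and the size at which a compiler switches from a register to caller-provided memory depends on the ABI.

```cpp
#include <cassert>

// Way #1: small enough to come back in a register on common ABIs.
int smallValue() {
    return 42;
}

// Too large for registers; the exact threshold is ABI-specific.
struct Big {
    int data[64];
};

// Way #2 as it looks in source code. The compiler typically passes a hidden
// pointer to caller-owned memory, and the callee fills that memory in.
Big bigValue() {
    Big b = {};       // zero-initialize all elements
    b.data[0] = 7;
    b.data[63] = 9;
    return b;
}

// Hand-written equivalent of the hidden pointer (illustration only).
void bigValueInto(Big* out) {
    *out = Big();     // zero-initialize all elements
    out->data[0] = 7;
    out->data[63] = 9;
}

// Hypothetical helper that compares both styles.
void checkReturnStyles() {
    assert(smallValue() == 42);
    Big viaReturn = bigValue();
    Big viaPointer;
    bigValueInto(&viaPointer);
    assert(viaReturn.data[0] == viaPointer.data[0]);
    assert(viaReturn.data[63] == viaPointer.data[63]);
}
```

Forgetting the return in smallValue would leave the register holding some earlier value; forgetting it in bigValue would leave the caller's memory block unwritten. Both are undefined behavior.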

+2

As Kerrek SB mentioned, your code has ventured into the realm of undefined behavior.

Basically, your code gets compiled down to assembly. In assembly, there is no concept of a function that requires a return value; there is only an expectation. I am most comfortable with MIPS, so I will use MIPS to illustrate.

Suppose you have the following code:

    int add(int x, int y) {
        return x + y;
    }

This will translate to something like:

    add:
        add $v0, $a0, $a1   # add $a0 and $a1 and store the result in $v0
        jr  $ra             # jump back to wherever this code was called from

To add 5 and 4, the code would be called something like this:

    addi $a0, $0, 5   # 5 is the first param
    addi $a1, $0, 4   # 4 is the second param
    jal  add
    # $v0 now contains 9

Note that, unlike in C, there is no explicit requirement that $v0 contain the return value, only an expectation. So what happens if you never put anything into $v0? Well, $v0 always holds some value, so the result will be whatever was last written to it.

Note: this post makes some simplifications. Also, your computer most likely does not run MIPS. But hopefully the example still holds, and if you studied assembly at university, MIPS may be what you know anyway.

+1

The way a value is returned from a function depends on the architecture and on the type of the value. It can be done through registers or through the stack. Typically, on x86, a value is returned in the EAX register if it is of an integral type (char, int) or a pointer. If you do not specify a return value, the value is undefined. It was only luck that your code sometimes worked correctly.

0

According to the following statement from n3242, a draft of the C++ standard, paragraph 6.6.3/2, your example yields undefined behavior:

Flowing off the end of a function is equivalent to a return with no value; this results in undefined behavior in a value-returning function.

The best way to see what actually happens is to inspect the assembly code that the given compiler generates for the given architecture. For the following code:
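As a small sketch of what that rule allows and forbids (the function names increment, absValue and checkReturnPaths are made up for illustration): a void function may simply reach its closing brace, while a value-returning function needs a return statement on every path that can actually be taken. The one exception is main, where flowing off the end is equivalent to return 0.

```cpp
#include <cassert>

// A void function may flow off the end: that is just a return with no value.
void increment(int& counter) {
    ++counter;
}

// A value-returning function must reach a return on every path taken.
int absValue(int x) {
    if (x < 0) {
        return -x;
    }
    return x;  // without this line, absValue(5) would be undefined behavior
}

// Hypothetical helper that exercises both functions.
void checkReturnPaths() {
    int n = 0;
    increment(n);
    assert(n == 1);
    assert(absValue(-5) == 5);
    assert(absValue(5) == 5);
}
```

Note that a function like the if(false) variant shown in another answer satisfies the compiler's check for at least one return statement, but it still flows off the end at run time, so the behavior remains undefined.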

    #pragma warning(default:4716)

    int foo(int a, int b) {
        int c = a + b;
    }

    int main() {
        int n = foo(1, 2);
    }

... the VS2010 compiler (in Debug mode, on a 32-bit Intel machine) generates the following assembly:

    #pragma warning(default:4716)
    int foo(int a, int b) {
    011C1490  push  ebp
    011C1491  mov   ebp,esp
    011C1493  sub   esp,0CCh
    011C1499  push  ebx
    011C149A  push  esi
    011C149B  push  edi
    011C149C  lea   edi,[ebp-0CCh]
    011C14A2  mov   ecx,33h
    011C14A7  mov   eax,0CCCCCCCCh
    011C14AC  rep stos dword ptr es:[edi]
        int c = a + b;
    011C14AE  mov   eax,dword ptr [a]
    011C14B1  add   eax,dword ptr [b]
    011C14B4  mov   dword ptr [c],eax
    }
    ...
    int main() {
    011C14D0  push  ebp
    011C14D1  mov   ebp,esp
    011C14D3  sub   esp,0CCh
    011C14D9  push  ebx
    011C14DA  push  esi
    011C14DB  push  edi
    011C14DC  lea   edi,[ebp-0CCh]
    011C14E2  mov   ecx,33h
    011C14E7  mov   eax,0CCCCCCCCh
    011C14EC  rep stos dword ptr es:[edi]
        int n = foo(1, 2);
    011C14EE  push  2
    011C14F0  push  1
    011C14F2  call  foo (11C1122h)
    011C14F7  add   esp,8
    011C14FA  mov   dword ptr [n],eax
    }

The result of the add operation in foo() is stored in the eax register (the accumulator), and its contents are used as the function's return value, which is moved into the variable n.

eax is also used to hold the return value (a pointer) in the following example:

    #pragma warning(default:4716)

    int* foo(int a) {
        int* p = new int(a);
    }

    int main() {
        int* pn = foo(1);
        if (pn) {
            int n = *pn;
            delete pn;
        }
    }

Assembly code:

    #pragma warning(default:4716)
    int* foo(int a) {
    000C1520  push  ebp
    000C1521  mov   ebp,esp
    000C1523  sub   esp,0DCh
    000C1529  push  ebx
    000C152A  push  esi
    000C152B  push  edi
    000C152C  lea   edi,[ebp-0DCh]
    000C1532  mov   ecx,37h
    000C1537  mov   eax,0CCCCCCCCh
    000C153C  rep stos dword ptr es:[edi]
        int* p = new int(a);
    000C153E  push  4
    000C1540  call  operator new (0C1253h)
    000C1545  add   esp,4
    000C1548  mov   dword ptr [ebp-0D4h],eax
    000C154E  cmp   dword ptr [ebp-0D4h],0
    000C1555  je    foo+50h (0C1570h)
    000C1557  mov   eax,dword ptr [ebp-0D4h]
    000C155D  mov   ecx,dword ptr [a]
    000C1560  mov   dword ptr [eax],ecx
    000C1562  mov   edx,dword ptr [ebp-0D4h]
    000C1568  mov   dword ptr [ebp-0DCh],edx
    000C156E  jmp   foo+5Ah (0C157Ah)
    std::operator<<<std::char_traits<char> >:
    000C1570  mov   dword ptr [ebp-0DCh],0
    000C157A  mov   eax,dword ptr [ebp-0DCh]
    000C1580  mov   dword ptr [p],eax
    }
    ...
    int main() {
    000C1610  push  ebp
    000C1611  mov   ebp,esp
    000C1613  sub   esp,0E4h
    000C1619  push  ebx
    000C161A  push  esi
    000C161B  push  edi
    000C161C  lea   edi,[ebp-0E4h]
    000C1622  mov   ecx,39h
    000C1627  mov   eax,0CCCCCCCCh
    000C162C  rep stos dword ptr es:[edi]
        int* pn = foo(1);
    000C162E  push  1
    000C1630  call  foo (0C124Eh)
    000C1635  add   esp,4
    000C1638  mov   dword ptr [pn],eax
        if(pn)
    000C163B  cmp   dword ptr [pn],0
    000C163F  je    main+51h (0C1661h)
        {
            int n = *pn;
    000C1641  mov   eax,dword ptr [pn]
    000C1644  mov   ecx,dword ptr [eax]
    000C1646  mov   dword ptr [n],ecx
            delete pn;
    000C1649  mov   eax,dword ptr [pn]
    000C164C  mov   dword ptr [ebp-0E0h],eax
    000C1652  mov   ecx,dword ptr [ebp-0E0h]
    000C1658  push  ecx
    000C1659  call  operator delete (0C1249h)
    000C165E  add   esp,4
        }
    }

The VS2010 compiler issues warning 4716 in both examples. By default, this warning is promoted to an error.
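For comparison, GCC and Clang report the same mistake through -Wreturn-type, which is part of -Wall and can be turned into a hard error with -Werror=return-type. Below is a sketch of both examples with the missing return statements added, so they build cleanly under those settings. The names fooPtr and checkFoo and the file name in the comment are assumptions for illustration; the second function is renamed only so that both can live in one file.

```cpp
#include <cassert>

// Assumed build command (GCC or Clang):
//   g++ -Wall -Werror=return-type example.cpp
// With MSVC, C4716 is already an error unless it is demoted with
// #pragma warning(default:4716) as in the listings above.

int foo(int a, int b) {
    int c = a + b;
    return c;  // the value now reaches the caller by design, not by luck
}

// Hypothetical name; the original second example also calls this foo.
int* fooPtr(int a) {
    int* p = new int(a);
    return p;
}

// Hypothetical helper that exercises both functions.
void checkFoo() {
    assert(foo(1, 2) == 3);
    int* pn = fooPtr(1);
    assert(*pn == 1);
    delete pn;
}
```

Leaving the warning at its default severity is probably the cheapest way to catch this class of bug before it ever runs.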

0

When values are popped from the stack on the IBM PC architecture, the old values stored there are not physically destroyed. They simply become inaccessible through the stack, but they still remain at the same memory location.

Of course, those old values will be overwritten when new data is subsequently pushed onto the stack.

So you were probably just lucky, and nothing was pushed onto the stack between your function's return and the surrounding code reading the value.

0

Source: https://habr.com/ru/post/910434/

