When calling the exec*() family of functions, should the char* argv elements be unique?

Question

I am trying to write a small utility that passes an argument list to an exec'd process, except that some of the incoming arguments are repeated when creating a new process argument list.

Below is a very simplified version of what I'm looking for, which simply duplicates each argument once:

    #include <stdlib.h>
    #include <unistd.h>

    #define PROG "ls"

    int main(int argc, char* argv[]) {
        int progArgCount = (argc-1)*2;
        char** execArgv = malloc(sizeof(char*)*(progArgCount+2)); // +2 for PROG and final 0
        execArgv[0] = PROG;
        for (int i = 0; i<progArgCount; ++i)
            execArgv[i+1] = argv[i/2+1];
        execArgv[progArgCount+1] = 0;
        execvp(PROG, execArgv);
    } // end main()

Note that the elements of execArgv are not unique. In particular, the two elements in each duplicated pair are identical, that is, they point to the same address in memory.

Does standard C say anything about this usage? Is it incorrect or undefined behavior? If not, is it still inadvisable, since the exec'd program may depend on the uniqueness of its argv elements? Please correct me if I am wrong, but isn't it possible for programs to modify their argv elements directly, since they are non-const? Wouldn't that create a risk of the exec'd program blithely modifying argv[1] (say) and then accessing argv[2], falsely assuming that the two elements point to independent strings? I'm pretty sure I did this myself a few years ago when I started learning C/C++, and I don't think it occurred to me at the time that the argv elements might not be unique.

I know that exec'ing means "replacing the process image", but I'm not sure what exactly that entails. I can imagine that it might involve a deep copy of the given argv argument (execArgv in my example above) into fresh memory allocations, which would probably make the elements unique, but I don't know enough about the internals of the exec functions to say. That would also be wasteful, at least if the original data structure could be preserved across the "replace" operation, so I have my doubts that it happens. And maybe different platforms/implementations behave differently in this regard? Can you answer that?

I tried to find documentation on this, but I only managed to find the following, from http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html :

The arguments specified by a program with one of the exec functions shall be passed on to the new process image in the corresponding main() arguments.

The above does not clarify whether or not a deep copy of the arguments, one that would make them unique, is passed to the new process.

The argument argv is an array of character pointers to null-terminated strings. The application shall ensure that the last member of this array is a null pointer. These strings shall constitute the argument list available to the new process image. The value in argv[0] should point to a filename string that is associated with the process being started by one of the exec functions.

The same goes for the above.

The argv[] and envp[] arrays of pointers and the strings to which those arrays point shall not be modified by a call to one of the exec functions, except as a consequence of replacing the process image.

I honestly don't know how to interpret the above. "Replacing the process image" is the whole point of the exec functions! If exec is going to modify the array or the strings, then that would be "a consequence of replacing the process image" in one sense or another. This almost implies that the exec functions will modify argv. The passage merely adds to my confusion.

The statement about argv[] and envp[] being constants is included to make explicit to future writers of language bindings that these objects are completely constant. Due to a limitation of the ISO C standard, it is not possible to state that idea in standard C. Specifying two levels of const-qualification for the argv[] and envp[] parameters for the exec functions may seem to be the natural choice, given that these functions do not modify either the array of pointers or the characters to which the function points, but this would disallow existing correct code. Instead, only the array of pointers is noted as constant. The table of assignment compatibility for dst = src derived from the ISO C standard summarizes the compatibility:

It is not clear what "the statement about argv[] and envp[] being constants" refers to; my leading theory is that it refers to the const-qualification of the parameters in the prototypes given at the top of the documentation page. But since those qualifiers mark only the pointers and not the char data, they hardly make it explicit "that these objects are completely constant". Secondly, I do not know why the paragraph talks about "writers of language bindings"; bindings to what? How is that relevant to a general documentation page on the exec functions? Thirdly, the main thrust of the paragraph seems to be that the actual contents of the char strings pointed to by the argv elements are left non-const for backward compatibility with established ISO C and the "existing correct code" that conforms to it. This is confirmed by the table on the documentation page, which I will not reproduce here. None of this decisively answers my main questions, although the excerpt does state quite clearly in the middle that the exec functions themselves do not modify the given argv object in any way.

I would be very grateful for any information related to my main questions, as well as for comments on my interpretation and understanding of the quoted documentation excerpts (in particular, if my interpretations are wrong in any way). Thanks!

+5

c posix exec

bgoldst Dec 14 '17 at 14:17

3 answers

There are many questions in your post, so I will focus only on its most important parts (IMO):

Does standard C say anything about this usage? Is it incorrect or undefined behavior?

If by "standard C" you mean POSIX, then you have already found the specification for exec*. If it does not say that the arguments must be distinct, then they need not be distinct.

And as @SomeProgrammerDude pointed out in the comments, non-distinct strings are quite likely to occur with string literals, since the compiler may deduplicate them (for example, execl("foo", "bar", "foo", (char *)0) ).

is it still inadvisable, since the exec'd program may depend on the uniqueness of its argv elements?

The C standard itself does not specify that the strings in argv are distinct, so a program cannot rely on their being distinct.

The above does not clarify whether or not a deep copy of the arguments, one that would make them unique, is passed to the new process.

We can say with confidence that copies must be made in some way, since otherwise the new program would be able to modify string literals (which is not allowed).

However, the details of how this is achieved are apparently left as an implementation choice. Therefore, it is best not to rely on any particular behavior.

+3

Oliver Charlesworth Dec 14 '17 at 14:44

Nowhere does the POSIX manual state that the arguments in argv must be unique. The arguments must be null-terminated strings, and the variadic variants must be given a null pointer as the last argument:

The arguments represented by arg0,... are pointers to null-terminated character strings. These strings shall constitute the argument list available to the new process image. The list is terminated by a null pointer. The argument arg0 should point to a filename string that is associated with the process being started by one of the exec functions.
The argument argv is an array of character pointers to null-terminated strings. The application shall ensure that the last member of this array is a null pointer. These strings shall constitute the argument list available to the new process image. The value in argv[0] should point to a filename string that is associated with the process being started by one of the exec functions.

And that is all POSIX requires. There is no explicit requirement that the arguments be unique, so an implementation that required unique arguments would be violating the standard, since standard functions cannot impose unspecified requirements or have effects that the standard does not specify.

"Replacing the process image" is the whole point of the exec functions! If exec is going to modify the array or the strings, then that would be "a consequence of replacing the process image" in one sense or another. This almost implies that the exec functions will modify argv.

Modification is allowed only when the exec call succeeds; otherwise no "image replacement" takes place, and therefore there is no "consequence". Essentially, this prevents argv and envp from being left unusable in the original process when an exec call fails.

exec cannot make a shallow copy, because it has no way of knowing the storage duration of the arguments it was given. Therefore, even the following should be fine:

    char *p = "argument";
    execvp("cmd", (char *[]){"cmd", p, p + 2, (char*)0});
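One rough way to check this on a particular system is to exec a child with aliased and overlapping argument pointers and look at what it receives. The sketch below is only an experiment, not a proof of what POSIX guarantees. The helper name run_overlapping_args is made up for illustration, and the code assumes that /bin/sh exists and provides a POSIX-style printf.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: exec /bin/sh with argument pointers that share
 * storage, and capture what the child prints into buf.
 * Returns 0 if the child ran and exited successfully, -1 otherwise. */
int run_overlapping_args(char *buf, size_t buflen)
{
    assert(buf != NULL && buflen > 0);

    char *p = "argument";
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: route stdout into the pipe, then replace the image. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        /* $1 and $2 are the same pointer; $3 points into the same string. */
        char *child_argv[] = {
            "sh", "-c", "printf '%s|%s|%s' \"$1\" \"$2\" \"$3\"",
            "sh", p, p, p + 2, (char *)0
        };
        execv("/bin/sh", child_argv);
        _exit(127); /* only reached if execv failed */
    }

    /* Parent: read everything the child wrote. */
    close(fds[1]);
    size_t used = 0;
    ssize_t n;
    while (used < buflen - 1 &&
           (n = read(fds[0], buf + used, buflen - 1 - used)) > 0)
        used += (size_t)n;
    buf[used] = '\0';
    close(fds[0]);

    int status;
    if (waitpid(pid, &status, 0) == -1)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

If the child reports three complete strings, with the third being the tail of the first, then sharing storage between argument pointers caused no trouble on that system.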

+3

usr Dec 14 '17 at 15:21

Hs · Accepted Answer · 2017-12-14T16:55:51+0000

Does standard C say anything about this usage? Is it incorrect or undefined behavior?

There is no problem with two pointers pointing to the same memory location. This is not undefined behavior.

If not, is it still inadvisable, since the exec'd program may depend on the uniqueness of its argv elements?

The POSIX standard does not say anything about the uniqueness of argv elements.

Please correct me if I am wrong, but isn't it possible for programs to modify their argv elements directly, since they are non-const?

From the C standard, 5.1.2.2.1p2:

The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.

So the answer is yes, it is possible.
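Since the strings are modifiable, a program that wants to edit its arguments without assuming anything about how they are laid out can take its own copies first. The following is a minimal sketch of that defensive approach; the names copy_argv and free_argv_copy are hypothetical and not part of any standard API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: free a vector produced by copy_argv(). */
void free_argv_copy(char **copy)
{
    if (copy == NULL)
        return;
    for (char **p = copy; *p != NULL; ++p)
        free(*p);
    free(copy);
}

/* Hypothetical helper: return a null-terminated deep copy of argv in which
 * every element has its own storage, or NULL if allocation fails. */
char **copy_argv(int argc, char *const argv[])
{
    assert(argc >= 0 && argv != NULL);

    char **copy = malloc(((size_t)argc + 1) * sizeof *copy);
    if (copy == NULL)
        return NULL;

    for (int i = 0; i < argc; ++i) {
        size_t len = strlen(argv[i]) + 1; /* include the null terminator */
        copy[i] = malloc(len);
        if (copy[i] == NULL) {
            /* copy[i] is NULL, so this frees elements 0..i-1 only. */
            free_argv_copy(copy);
            return NULL;
        }
        memcpy(copy[i], argv[i], len);
    }
    copy[argc] = NULL;
    return copy;
}
```

After a call such as copy_argv(argc, argv), writing through one element of the result cannot affect another element, whether or not the original pointers shared storage.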

Wouldn't that create a risk of the exec'd program blithely modifying argv[1] (say) and then accessing argv[2], falsely assuming that the two elements point to independent strings?

In computing, exec is a functionality of an operating system that runs an executable file in the context of an already existing process, replacing the previous executable.

So, when a system call of the exec family is made, the program named in the argument is loaded into the caller's address space and overwrites the program that was there. As a result, once the specified program file starts executing, the original program in the caller's address space is gone: it has been replaced by the new program, and the argv argument list is stored in the new, replaced address space.

The POSIX standard says:

The number of bytes available for the new process' combined argument and environment lists is {ARG_MAX}. It is implementation-defined whether null terminators, pointers, and/or any alignment bytes are included in this total.

And ARG_MAX :

{ARG_MAX} Maximum length of argument to the exec functions including environment data.

This means that some space is set aside for the new process's arguments, and we can safely assume that the argument strings are copied into that space.
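As a rough illustration of that limit, the sketch below queries the limit at run time with sysconf(_SC_ARG_MAX) and compares it with the bytes taken by the argument and environment strings. The helper names are hypothetical. Because POSIX leaves it implementation-defined whether pointers and alignment bytes count toward the total, the estimate is only a lower bound. It counts a string once for every pointer that refers to it, on the assumption that each argument is copied separately.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: bytes taken by the strings of a null-terminated
 * vector, counting each string's terminating null byte. A string that is
 * referenced by several elements is counted once per element. */
size_t vector_string_bytes(char *const vec[])
{
    assert(vec != NULL);
    size_t total = 0;
    for (size_t i = 0; vec[i] != NULL; ++i)
        total += strlen(vec[i]) + 1;
    return total;
}

/* Hypothetical helper: returns 1 if the strings fit within ARG_MAX by this
 * lower-bound estimate, 0 if they do not, and -1 if sysconf() reports no
 * definite limit. */
int fits_in_arg_max(char *const argv[], char *const envp[])
{
    long limit = sysconf(_SC_ARG_MAX);
    if (limit < 0)
        return -1;
    size_t need = vector_string_bytes(argv) + vector_string_bytes(envp);
    return need <= (size_t)limit;
}
```

For the vector { "ls", "-l", "-l", NULL } the string bytes come to 9 even if both "-l" entries are the same pointer, which matches the assumption that duplicated pointers cost space twice in the new image.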

I know that exec'ing means "replacing the process image", but I'm not sure what this entails.

Check this one .

And maybe different platforms / implementations behave differently in this regard? Can you answer that?

Implementations may vary from platform to platform, but all Unix variants must conform to the same POSIX standard for compatibility. Therefore, I believe that the behavior should be the same on all platforms.
