I am trying to write a small utility that passes an argument list on to an exec'd process, except that some of the incoming arguments are repeated when building the new process's argument list.
Below is a very simplified version of what I'm looking for, which simply duplicates each argument once:
```c
#include <stdlib.h>
#include <unistd.h>

#define PROG "ls"

int main(int argc, char* argv[] ) {

    int progArgCount = (argc-1)*2;
    char** execArgv = malloc(sizeof(char*)*(progArgCount+2)); // +2 for PROG and final 0
    execArgv[0] = PROG;
    for (int i = 0; i<progArgCount; ++i)
        execArgv[i+1] = argv[i/2+1];
    execArgv[progArgCount+1] = 0;

    execvp(PROG, execArgv );

} // end main()
```
Note that the elements of execArgv are not unique. In particular, the two elements in each duplicated pair are identical, that is, they point to the same address in memory.
Does standard C have anything to say about this usage? Is it incorrect, or undefined behavior? If not, is it still inadvisable, since the exec'd program may depend on the uniqueness of its argv elements? Please correct me if I am wrong, but isn't it possible for programs to modify their argv elements directly, since they are non-const? Wouldn't that create a risk of the exec'd program blithely modifying argv[1] (say) and then accessing argv[2], falsely assuming that the two elements point to independent strings? I'm pretty sure I did this myself some years ago when I was starting to learn C/C++, and I don't think it occurred to me at the time that the argv elements might not be unique.
I know that exec'ing involves "replacing the process image", but I'm not sure exactly what that entails. I could imagine that it involves deep-copying the given argv argument (execArgv in my example above) into fresh memory allocations, which would probably make its elements unique, but I don't know enough about the internals of the exec functions to say. That would also be wasteful, at least if the original data structure could be preserved across the "replace" operation, so I am doubtful that it happens. And perhaps different platforms/implementations behave differently in this respect? Could answerers please speak to this?
I tried to find documentation on this, but all I managed to find was the following, from http://pubs.opengroup.org/onlinepubs/9699919799/functions/exec.html :
"The arguments specified by a program with one of the exec functions shall be passed on to the new process image in the corresponding main() arguments."
The above does not clarify whether what gets passed to the new process is a deep copy of the arguments (which would make them unique) or not.
"The argument argv is an array of character pointers to null-terminated strings. The application shall ensure that the last member of this array is a null pointer. These strings shall constitute the argument list available to the new process image. The value in argv[0] should point to a filename that is associated with the process being started by one of the exec functions."
The same goes for the above.
"The argv[] and envp[] arrays of pointers and the strings to which those arrays point shall not be modified by a call to one of the exec functions, except as a consequence of replacing the process image."
I honestly don't know how to interpret the above. "Replacing the process image" is the whole point of the exec functions! If a call is going to modify the array or the strings, then that would be "a consequence of replacing the process image" in one sense or another. This almost implies that the exec functions will modify argv. The passage only deepens my confusion.
"The statement about argv[] and envp[] being constants is included to make explicit to future writers of language bindings that these objects are completely constant. Due to a limitation of the ISO C standard, it is not possible to state that idea in standard C. Specifying two levels of const-qualification for the argv[] and envp[] parameters for the exec functions may seem to be the natural choice, given that these functions do not modify either the array of pointers or the characters to which the function points, but this would disallow existing correct code. Instead, only the array of pointers is noted as constant. The table of assignment compatibility for dst = src derived from the ISO C standard summarizes the compatibility:"
It is not clear what "the statement about argv[] and envp[] being constants" refers to; my leading theory is that it refers to the const-qualification of the parameters in the prototypes given at the top of the documentation page. But since those qualifiers mark only the pointers and not the char data, they hardly make it explicit "that these objects are completely constant". Secondly, I don't know why the paragraph talks about "writers of language bindings"; bindings to what? How is that relevant to a general documentation page on the exec functions? Thirdly, the main thrust of the paragraph seems to be that we are stuck with leaving the actual contents of the char strings pointed to by the argv elements as non-const, for the sake of backward compatibility with the established ISO C standard and the "existing correct code" that conforms to it. This is confirmed by the table on the documentation page, which I won't reproduce here. None of this decisively answers my main questions, although the excerpt does state fairly plainly, in the middle, that the exec functions themselves do not modify the given argv object in any way.
I would be very grateful for the information related to my main questions, as well as for comments on my interpretations and understanding of excerpts from the cited documentation (in particular, if my interpretations are erroneous in any way). Thanks!