Note that socket operations are not part of standard C, but are standardized by POSIX.1, also known as IEEE Std 1003.1. So the posix tag added by the OP is important.
In particular, IEEE Std 1003.1 requires implementations of <sys/socket.h> and socket() to behave in a very specific way, regardless of whether the C standard leaves that behavior implementation-defined or even undefined.
The POSIX.1 definition for getaddrinfo() contains an example program that looks up an IPv4 or IPv6 socket address (struct sockaddr_in and struct sockaddr_in6, respectively) for UDP. As explained in the definition of <sys/socket.h>, the struct sockaddr_storage type can be used for static storage when the socket type is unknown.
Initially, struct sockaddr was used as an opaque type to simplify the socket interface while keeping a minimum of type checking. The form shown in the question dates back to the ANSI C (ISO C89) era. Because of the pointer aliasing rules added in later versions of the ISO C standard, the actual structures used by POSIX.1 implementations are slightly different; struct sockaddr is by now really a structure containing a union.
If the socket API used a void pointer, void *, for the socket address structure, there would be no type checking at all. With a generic type, developers must cast the pointer to their socket address structure to struct sockaddr * to avoid a warning (or an error, depending on the compiler options used), which is hopefully enough to prevent gross mistakes, such as passing a string instead and then wondering why it does not work even though the compiler does not complain at all.
In general, this approach, using a generic-ish type instead of the specific type, is very useful in many situations in C. It lets you create flexible data types with a simple interface while keeping at least some type checking. With well-designed structures you can, for example, build generic binary tree structures that hold any kind of data yet need only one set of functions (compare, for example, qsort() in C). With that in mind, I will show how to define such structures and unions without invoking undefined behavior in standard C.
What is the use of this type cast?
A function that takes a pointer argument has two options. If the parameter is of type void *, the compiler will happily convert any object pointer to void * without warning or complaint. If we want to accept only certain types of pointers, we need to specify exactly one type.
There are many types of socket addresses, and each type of socket address has its own structure type. It is impossible to tell the compiler to accept a pointer to one of perhaps a dozen structure types. Therefore, in this case, the pointer must be cast to the "generic" type, struct sockaddr.
Again, this approach does not lead to undefined behavior in standard C, provided the structures (in particular, the "generic" type) are defined in a standard-compliant way. It is just that the structures shown by the OP are historical, not current, and cannot really be used as is in current C because of the strict aliasing requirements. I will explain later how to do this.
In short, this kind of type-punning is useful when a function accepts pointers to several specific types, and you want only those types to be supplied. In my opinion, the cast acts as a reminder for the developer to make sure they are using the correct type.
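As a minimal sketch of what that cast looks like at a call site, assuming a POSIX system: the helper names make_loopback_v4 and bind_loopback_v4 are hypothetical, invented for this illustration, and only bind(), htons() and htonl() are real interfaces.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: fill in an IPv4 loopback address for the given port. */
void make_loopback_v4(struct sockaddr_in *out, unsigned short port)
{
    assert(out != NULL);
    memset(out, 0, sizeof *out);
    out->sin_family = AF_INET;                     /* identifies the structure type */
    out->sin_port = htons(port);                   /* network byte order */
    out->sin_addr.s_addr = htonl(INADDR_LOOPBACK); /* 127.0.0.1 */
}

/* Hypothetical wrapper: the specific type is cast to the generic type here. */
int bind_loopback_v4(int sockfd, unsigned short port)
{
    struct sockaddr_in addr;

    make_loopback_v4(&addr, port);
    /* Without the cast, the compiler diagnoses the pointer type mismatch. */
    return bind(sockfd, (const struct sockaddr *)&addr, sizeof addr);
}
```

Passing, say, a char * to bind() without a cast draws a diagnostic, which is exactly the minimal type checking that a void * parameter would not provide.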
How can I access members of other types?
Well, you can’t.
Thing is, each type of socket address structure has a common field of type sa_family_t, which is set to a value corresponding to the type of the socket address in question. If you use sockaddr_in, the value is AF_INET; if you use sockaddr_in6, the value is AF_INET6; if sockaddr_un, the value is AF_UNIX (or AF_LOCAL, which evaluates to the same value as AF_UNIX), and so on.
You can examine only this common field to determine the type. However, you can examine it through any of the types covered by the struct sockaddr type.
For example, if you have struct sockaddr *foo, you can use ((struct sockaddr_storage *)foo)->ss_family (or even ((struct sockaddr_in *)foo)->sin_family) to examine the type of the structure. If it is the type that contains the member you are interested in, you can then access that member.
For example, to return the uint32_t corresponding to an IPv4 address in network byte order (most significant byte first), you can use
    uint32_t ip_address_of(const struct sockaddr *addr, uint32_t none)
    {
        if (!addr)
            return none;
        if (((const struct sockaddr_storage *)addr)->ss_family == AF_INET)
            return ((const struct sockaddr_in *)addr)->sin_addr.s_addr;
        return none;
    }
The second parameter, none, is returned if the pointer is NULL or points to a non-IPv4 socket address structure. Usually (but not in all use cases), a value corresponding to a broadcast address (0U or 0xFFFFFFFFU) can be used.
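The same check-the-family-then-cast pattern extends to more than one address family. The following is a sketch under assumptions, not part of any standard interface: port_of is a hypothetical helper, and it reads the family through the sa_family member that POSIX defines for struct sockaddr. Unlike ip_address_of above, it treats a NULL pointer as a programming error.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical helper: return the port in host byte order,
   or -1 if the address family has no port known to this function. */
int port_of(const struct sockaddr *addr)
{
    assert(addr != NULL);

    /* Examine the common family field first ... */
    switch (addr->sa_family) {
    case AF_INET:
        /* ... and only then access members of the matching type. */
        return ntohs(((const struct sockaddr_in *)addr)->sin_port);
    case AF_INET6:
        return ntohs(((const struct sockaddr_in6 *)addr)->sin6_port);
    default:
        return -1;
    }
}
```

The return type is int rather than in_port_t so that -1 can signal an unsupported family without colliding with a valid port number.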
Historical background:
Using the structures shown in the question is not undefined behavior in ANSI C - the C standard of the era when they were widely used - as 3.5.2.1 says
A pointer to a structure object, suitably cast, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may therefore be unnamed holes within a structure object, but not at its beginning, as necessary to achieve the appropriate alignment.
and ANSI C has laxer type-punning rules than the later C standards (C99 and C11), allowing casts back and forth between pointer types without any problems. In particular, 3.3.4,
It is guaranteed, however, that a pointer to an object of a given alignment may be converted to a pointer to an object of the same alignment or a less strict alignment and back again; the result shall compare equal to the original pointer.
This means that ANSI C has no problem casting a socket address structure pointer to or from struct sockaddr * ; no information is lost in casting.
(It does not matter that different socket address structures may have different alignment requirements. The initial member is safe to access in any case, since a pointer to the structure also points to its initial member. This is mostly an issue for those who want to support several different socket types using the same code: they need to use, for example, a union or dynamic memory allocation for the socket address structures.)
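One way to get storage that is large enough and suitably aligned for several socket address types is a union of them. This is only a sketch, assuming a POSIX system; the union any_sockaddr and the helper any_sockaddr_wildcard are hypothetical names made up for this example.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical union: sized and aligned for every member listed. */
union any_sockaddr {
    struct sockaddr         sa;
    struct sockaddr_in      in4;
    struct sockaddr_in6     in6;
    struct sockaddr_storage storage;
};

/* Hypothetical helper: set up a wildcard address for the given family.
   Returns the length to pass to e.g. bind(), or 0 if the family
   is not supported by this sketch. */
socklen_t any_sockaddr_wildcard(union any_sockaddr *a, int family,
                                unsigned short port)
{
    assert(a != NULL);
    memset(a, 0, sizeof *a);

    switch (family) {
    case AF_INET:
        a->in4.sin_family = AF_INET;
        a->in4.sin_port = htons(port);
        a->in4.sin_addr.s_addr = htonl(INADDR_ANY);
        return sizeof a->in4;
    case AF_INET6:
        a->in6.sin6_family = AF_INET6;
        a->in6.sin6_port = htons(port);
        a->in6.sin6_addr = in6addr_any;
        return sizeof a->in6;
    default:
        return 0;
    }
}
```

With such a union, &a.sa can be passed to the socket functions directly, without a cast.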
In the current era, we need to define structures ( struct sockaddr , to be precise) a little differently, to ensure compatibility with the C standard.
Please note that this means that the following approach is valid even for non-POSIX systems that support the C standard.
First, no changes are required for the individual socket address structures. (This also means that there is no backward compatibility problem.) For example, in the GNU C library, struct sockaddr_in and struct sockaddr_in6 are essentially defined as
    struct sockaddr_in {
        sa_family_t     sin_family;
        in_port_t       sin_port;
        struct in_addr  sin_addr;
    };
    struct sockaddr_in6 {
        sa_family_t     sin6_family;
        in_port_t       sin6_port;
        uint32_t        sin6_flowinfo;
        struct in6_addr sin6_addr;
        uint32_t        sin6_scope_id;
    };
The only important change needed is that struct sockaddr must contain a single union (preferably an anonymous union for simplicity, but that requires C11, or at least anonymous union support in the C compiler used, and not many compilers fully supported the current C standard back in 2016):
    struct sockaddr {
        union {
            struct sockaddr_in  sa_in;
            struct sockaddr_in6 sa_in6;
        } u;
    };
With the above, the POSIX.1 socket interface works in standard C (from ANSI C or ISO C89 through C99 to C11).
You see, ANSI C 3.3.2.3 says that "if a union contains several structures that share a common initial sequence, and if the union object currently contains one of these structures, it is permitted to inspect the common initial part of any of them", with later standards adding "anywhere that a declaration of the completed type of the union is visible". The standards continue: "Two structures share a common initial sequence if corresponding members have compatible types for a sequence of one or more initial members."
Above, the members sin_family and sin6_family (of type sa_family_t) are such a common initial part, and can be inspected through any of the members in struct sockaddr.
ANSI C 3.5.2.1 says that "A pointer to a union object, suitably cast, points to each of its members, [..] and vice versa." Later versions of the C standard have the same (or fairly similar) language.
This means that if you have an interface that can take a pointer to any of the struct sockaddr_ types, you can instead use struct sockaddr * as a "generic pointer". If you have, say, struct sockaddr *sa, you can use sa->u.sa_in.sin_family or sa->u.sa_in6.sin6_family to access the common initial member (which indicates the type of socket address in question). Since struct sockaddr is a union (or rather, because it is a structure with a union as its initial member), you can also use ((struct sockaddr_in *)sa)->sin_family or ((struct sockaddr_in6 *)sa)->sin6_family to access the family. Since the family is a common initial member, you can do this using any of the types; just remember that the other members are only accessible if the family matches the type the members belong to.
For current C, you can make the union anonymous (dropping the u name near the end), in which case the above becomes sa->sa_in.sin_family or sa->sa_in6.sin6_family.
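To try the common-initial-sequence rule without redefining the system's socket types, here is a minimal self-contained sketch. All names in it (addr_kind, addr_v4, addr_v6, addr_any, addr_port) are hypothetical stand-ins for the socket address structures discussed above.

```c
#include <assert.h>
#include <stddef.h>

enum addr_kind { KIND_V4 = 1, KIND_V6 = 2 };

/* Both subtypes begin with the same members: the common initial sequence. */
struct addr_v4 { int kind; unsigned short port; unsigned char ip[4];  };
struct addr_v6 { int kind; unsigned short port; unsigned char ip[16]; };

/* The "generic" type: a structure whose only member is a union of subtypes. */
struct addr_any {
    union {
        struct addr_v4 v4;
        struct addr_v6 v6;
    } u;
};

/* Return the port, or -1 if the kind is not recognized. */
int addr_port(const struct addr_any *a)
{
    assert(a != NULL);

    /* kind is part of the common initial sequence, so it may be
       inspected through either union member. */
    switch (a->u.v4.kind) {
    case KIND_V4:
        return a->u.v4.port;
    case KIND_V6:
        return a->u.v6.port;
    default:
        return -1;
    }
}
```

Here the union holds an addr_v6 whenever kind is KIND_V6, yet reading kind through u.v4 is still permitted, because the union declaration is visible and both structures share that initial member.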
As for how this union-based struct sockaddr works on the other side, consider a possible implementation of bind():
    int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen)
    {
        if (sockfd == -1) {
            errno = EBADF;
            return -1;
        }
        if (addr == NULL || addrlen == 0) {
            errno = EINVAL;
            return -1;
        }
        /* The family is a common initial member; read it via sa_in. */
        switch (addr->u.sa_in.sin_family) {
        case AF_INET:
            if (addrlen != sizeof (struct sockaddr_in)) {
                errno = EINVAL;
                return -1;
            }
            return bind_inet(sockfd, (const struct sockaddr_in *)addr);
        case AF_INET6:
            if (addrlen != sizeof (struct sockaddr_in6)) {
                errno = EINVAL;
                return -1;
            }
            return bind_inet6(sockfd, (const struct sockaddr_in6 *)addr);
        default:
            errno = EINVAL;
            return -1;
        }
    }
The family-specific bind calls could equivalently be written as
    return bind_inet(sockfd, &(addr->u.sa_in));
and
    return bind_inet6(sockfd, &(addr->u.sa_in6));
that is, by taking the address of the union member instead of simply casting the pointer to the entire union.
When developing your own types with multiple subtypes, there are four things to remember in order to stay within standard C:
Use a union type containing all subtypes as members as the "generic" type.
A union contains only one subtype at a time: the one that was last assigned to it.
Optionally, add a subtype for accessing the type (and possibly other members common to all subtypes) by an easy name, and use it consistently in the documentation.
Always examine the member that identifies the actual type first.
For example, if you create an abstract binary tree of some kind - perhaps a calculator? - with different types of data stored in each node, you can use
    typedef struct node node;
    typedef enum {
        DATA_NONE = 0,
        DATA_LONG,
        DATA_DOUBLE,
    } node_data;
    struct node_minimal {
        node *left;
        node *right;
        node_data data;
    };
    struct node_long {
        node *left;
        node *right;
        node_data data;
        long value;
    };
    struct node_double {
        node *left;
        node *right;
        node_data data;
        double value;
    };
    struct node {
        union {
            struct node_minimal of;
            struct node_long long_data;
            struct node_double double_data;
        } type;
    };
To traverse such a tree recursively, you can use, for example,
    int node_traverse(const node *root,
                      int (*preorder)(const node *, void *),
                      int (*inorder)(const node *, void *),
                      int (*postorder)(const node *, void *),
                      void *custom)
    {
        int retval;
        if (!root)
            return 0;
        if (preorder) {
            retval = preorder(root, custom);
            if (retval)
                return retval;
        }
        if (root->type.of.left) {
            retval = node_traverse(root->type.of.left, preorder, inorder, postorder, custom);
            if (retval)
                return retval;
        }
        if (inorder) {
            retval = inorder(root, custom);
            if (retval)
                return retval;
        }
        if (root->type.of.right) {
            retval = node_traverse(root->type.of.right, preorder, inorder, postorder, custom);
            if (retval)
                return retval;
        }
        if (postorder) {
            retval = postorder(root, custom);
            if (retval)
                return retval;
        }
        return 0;
    }
where you supply the function to be called for each node in one (or more) of the preorder, inorder, and postorder parameters; custom is there only in case you want to pass some context to those functions.
Note that with node *root :
root->type refers to the union of all subtypes.
root->type.of refers to the union member of type struct node_minimal; I named it of just to be playful. The idea is that you use this one to access nodes of unknown type.
root->type.of.data tells which type is actually used for the node; it is one of the DATA_ enumeration constants.
root->type.of.left and root->type.of.right are also available regardless of the type of the node, and are used when you just walk through the tree and do not care about the exact type of the nodes.
root->type.long_data refers to the union member of type struct node_long (but you should only try to access it if root->type.of.data == DATA_LONG). Thus, root->type.long_data.value is the long value member of struct node_long.
root->type.double_data refers to the union member of type struct node_double (but you should only try to access it if root->type.of.data == DATA_DOUBLE). Thus, root->type.double_data.value is the double value member of struct node_double.
root->type.of.data == root->type.long_data.data == root->type.double_data.data, root->type.of.left == root->type.long_data.left == root->type.double_data.left and root->type.of.right == root->type.long_data.right == root->type.double_data.right, because these are all common initial members, and C explicitly allows accessing their values through any of the types in the union.
Note that the traversal function above is just an example; it uses a lot of stack for deep trees and does not even try to detect loops. There are many improvements one could make to turn it into a "library-worthy" function.
To print the value of node, you can use, for example,
    int node_print(const node *n, void *out)
    {
        int len;
        if (!out || !n) {
            errno = EINVAL;
            return -1;
        }
        if (n->type.of.data == DATA_DOUBLE)
            len = fprintf((FILE *)out, "%.16g", n->type.double_data.value);
        else if (n->type.of.data == DATA_LONG)
            len = fprintf((FILE *)out, "%ld", n->type.long_data.value); /* %ld: value is a long */
        else {
            errno = 0;
            return -1;
        }
        /* Return 0 on success; node_traverse() stops at the first nonzero return value. */
        return (len < 0) ? -1 : 0;
    }
which is designed to work with the tree traversal function. You can print the values of the tree tree in order (from left to right) to standard output using
    node_traverse(tree, NULL, node_print, NULL, stdout);
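The examples above never show how a tree gets built. As one possible sketch, the type definitions are repeated below so that the block stands alone; the functions node_new_long, node_new_double, node_long_value and node_free are hypothetical helpers added for illustration only.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct node node;
typedef enum { DATA_NONE = 0, DATA_LONG, DATA_DOUBLE } node_data;

struct node_minimal { node *left; node *right; node_data data; };
struct node_long    { node *left; node *right; node_data data; long value; };
struct node_double  { node *left; node *right; node_data data; double value; };

struct node {
    union {
        struct node_minimal of;
        struct node_long    long_data;
        struct node_double  double_data;
    } type;
};

/* Hypothetical constructor: returns NULL if allocation fails. */
node *node_new_long(long value, node *left, node *right)
{
    node *n = malloc(sizeof *n);
    if (!n)
        return NULL;
    /* Assign through the subtype the union is going to contain. */
    n->type.long_data.left = left;
    n->type.long_data.right = right;
    n->type.long_data.data = DATA_LONG;
    n->type.long_data.value = value;
    return n;
}

/* Hypothetical constructor: returns NULL if allocation fails. */
node *node_new_double(double value, node *left, node *right)
{
    node *n = malloc(sizeof *n);
    if (!n)
        return NULL;
    n->type.double_data.left = left;
    n->type.double_data.right = right;
    n->type.double_data.data = DATA_DOUBLE;
    n->type.double_data.value = value;
    return n;
}

/* Hypothetical accessor: checks the type before touching the value. */
long node_long_value(const node *n)
{
    assert(n != NULL && n->type.of.data == DATA_LONG);
    return n->type.long_data.value;
}

/* Free a whole tree; only the common initial members are needed. */
void node_free(node *root)
{
    if (!root)
        return;
    node_free(root->type.of.left);
    node_free(root->type.of.right);
    free(root);
}
```

A tree built as node_new_long(3, node_new_long(1, NULL, NULL), node_new_double(2.5, NULL, NULL)) could then be passed as the tree argument of the node_traverse() call shown above, and released with node_free() afterwards.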
I hope the example above shows enough to give you ideas, and also enough to make you careful and think hard about the interface you are developing.
If you believe (as many do) that my reading of the C standard is incorrect, please point out the section that you think contradicts the above. My view may not be popular, but I do want to be corrected when I am wrong.
Note: rewritten on 2016-17-11.