Typical common data structures in regular C?

I did a lot more C ++ programming than programming "plain old C". One thing that I really miss when programming in simple C is the typical data structures that are provided in C ++ using templates.

For concreteness, consider a general list linked separately. In C ++, just define your own template class and then create it for the types you need.

In C, I can come up with several ways to implement a common sibling list:

  • Write the type of the linked list and supporting procedures once, using void pointers to get around the type system.
  • Record preprocessor macros with the required type names, etc., to generate a version of the data structure and supporting type procedures.
  • Use the more sophisticated stand-alone tool to generate code for the types you need.

I do not like option 1, as it undermines the type system and is likely to have worse performance than a specialized type implementation. Using a uniform representation of the data structure for all types and dropping void pointers to / from, as far as I can see, requires the indirectness that an implementation specialized for the element type will avoid.

Option 2 does not require any additional tools, but it feels somewhat awkward and may give bad compiler errors if used improperly.

Option 3 can give better compiler error messages than option 2, since the specialized code for the data structure will be in an extended form that can be opened in the editor and checked by the programmer (unlike the code created by the preprocessor macros). However, this option is the most heavyweight, a kind of "bad template." I used this approach before, using a simple sed script to specialize the "templated" version of some C code.

I would like to program my future "low-level" projects in C rather than C ++, but was frightened by the thought of rewriting the general data structures for each particular type.

What is people's experience with this problem? Are there good libraries of common data structures and algorithms in C that don't go with Option 1 (i.e. casting to and from void pointers that sacrifices type safety and adds a level of indirection)?

+46
c generics data-structures code-generation
Jun 14 '10 at 17:44
source share
10 answers

Option 1 is the approach used by most C implementations of shared containers that I see. The set of drivers for Windows and the Linux kernel use a macro to allow binding of containers anywhere in the structure using the macro used to get the structure pointer from the pointer to the link field:

Option 2 is the binding performed by BSD tree.h and the implementation of the queue.h container:

I do not think that I will consider any of these approaches of type safe. Useful but not safe.

+18
Jun 14 '10 at 18:21
source share

C has a different beauty for it than C ++, and is gaining security and can always see that everything happens when tracing through code without involving drops in your debugger is usually not one of them.

C beauty comes from the lack of type security, work around the type system and the raw level of bits and bytes. Because of this, there are some things that it can do more easily without fighting the language, for example, variable-length structures, using the stack even for arrays whose sizes are determined at runtime, etc. It is also, as a rule, much easier to keep the ABI when you work at this lower level.

So, there is another aesthetics, as well as various problems, and I would recommend changing your mindset when you work on C. To really appreciate this, I would suggest doing what many take for granted for example, for example, to implement your own dispenser memory or device driver. When you work at such a low level, you cannot help but look at everything as a mock memory of bits and bytes, unlike "objects" with attached behavior. In addition, in such a low-level bit / byte control code, a point may appear where C becomes easier to understand than C ++ code heaped with reinterpret_casts , for example.

As for the linked list example, I would suggest a non-intrusive version of the linked node (one that does not require storing the list pointers in the element type, T itself, allowing the logic of the linked list and the view to be separated from T itself), for example:

 struct ListNode { struct ListNode* prev; struct ListNode* next; MAX_ALIGN char element[1]; // Watch out for alignment here. // see your compiler specific info on // aligning data members. }; 

Now we can create a node list as follows:

 struct ListNode* list_new_node(int element_size) { // Watch out for alignment here. return malloc_max_aligned(sizeof(struct ListNode) + element_size - 1); } // create a list node for 'struct Foo' void foo_init(struct Foo*); struct ListNode* foo_node = list_new_node(sizeof(struct Foo)); foo_init(foo_node->element); 

To extract an item from a list as T *:

 T* element = list_node->element; 

Since this is C, there is no type check when pointing to the pan in this way, and this will probably also make you feel uneasy if you come from C ++ background.

The hard part here is to make sure that this member element correctly aligned for whatever type you want to keep. When you can solve this problem as mobile as possible, you will have a powerful solution for creating efficient memory layouts and allocators. Often this means that you simply use maximum alignment for everything that may seem wasteful, but usually it is not if you use appropriate data structures and allocators that do not pay this overhead for many small items on an individual basis.

Now this solution is still related to type casting. There is little that can be done if you have a separate version of the code for this list node and the corresponding logic for working with it for each type of T that you want to support (with the exception of dynamic polymorphism). However, it does not require an additional level of indirection, as you might think, and still selects the entire node list and element in one distribution.

And I would recommend this easy way to achieve versatility in C in many cases. Just replace T with a buffer whose length matches sizeof(T) and is correctly aligned. If you have a sufficiently portable and safe way that you can generalize to ensure proper alignment, you will have a very powerful way of working with memory, which often improves cache hits, reduces the frequency of heap allocation / deallocation, the amount of required direction, build time, etc. d.

If you need additional automation, for example, list_new_node automatically initialize struct Foo , I would recommend creating a general type table structure that you can pass, which contains information such as big T, a function pointer that points to the function to create a default instance for T , the other for copying T, cloning T, destroying T, comparator, etc. In C ++, you can automatically generate this table using templates and built-in language concepts such as copy constructors and destructors. C requires a bit more manual effort, but you can still slightly reduce its pattern with macros.

Another trick that can be useful if you are following the route of generating more macro-oriented code is to use a prefix or suffix convention for identifier names. For example, CLONE (Type, ptr) can be defined to return Type##Clone(ptr) , so CLONE(Foo, foo) can call FooClone(foo) . This is a kind of cheat to get something like an overload function in C and is useful when generating code in bulk (when CLONE is used to implement another macro) or even a bit of copying and pasting template-type code, at least to improve the uniformity of the template.

+15
May 03 '15 at 14:21
source share

Option 1, using void * or some union option, is used by most C programs, and it can give you better performance than the C ++ / macro style, which has several implementations for different types, since it has less code duplication and therefore less icache pressure and fewer icache misses.

+2
Jun 14 '10 at 18:47
source share

GLib has a bunch of common data structures in it, http://www.gtk.org/

CCAN has tons of useful snippets and http://ccan.ozlabs.org/

+2
Jun 15 '10 at 18:28
source share

Your option 1 is what the oldest programmers of the time will strive for, perhaps salty ones with a little 2 to reduce repetitive typing, and it’s just possible to use several function pointers to taste polymorphism.

+1
Jun 14 '10 at 17:55
source share

There is a general option for option 1, which is more efficient, since it uses unions to store values ​​in the nodes of the list, that is, there is no additional indirectness. This has the disadvantage that the list only accepts values ​​of certain types and potentially takes up some memory if the types have different sizes.

However, you can get rid of union by using the flexible member of the array instead if you want to break a strict alias. Code Example C99:

 #include <assert.h> #include <stdio.h> #include <stdlib.h> #include <string.h> struct ll_node { struct ll_node *next; long long data[]; // use `long long` for alignment }; extern struct ll_node *ll_unshift( struct ll_node *head, size_t size, void *value); extern void *ll_get(struct ll_node *head, size_t index); #define ll_unshift_value(LIST, TYPE, ...) \ ll_unshift((LIST), sizeof (TYPE), &(TYPE){ __VA_ARGS__ }) #define ll_get_value(LIST, INDEX, TYPE) \ (*(TYPE *)ll_get((LIST), (INDEX))) struct ll_node *ll_unshift(struct ll_node *head, size_t size, void *value) { struct ll_node *node = malloc(sizeof *node + size); if(!node) assert(!"PANIC"); memcpy(node->data, value, size); node->next = head; return node; } void *ll_get(struct ll_node *head, size_t index) { struct ll_node *current = head; while(current && index--) current = current->next; return current ? current->data : NULL; } int main(void) { struct ll_node *head = NULL; head = ll_unshift_value(head, int, 1); head = ll_unshift_value(head, int, 2); head = ll_unshift_value(head, int, 3); printf("%i\n", ll_get_value(head, 0, int)); printf("%i\n", ll_get_value(head, 1, int)); printf("%i\n", ll_get_value(head, 2, int)); return 0; } 
+1
Jun 15 '10 at 8:37
source share

An old question, I know, but in case it is still of interest: I experimented with option 2) (macros in front of the processor) today, and came up with an example that I will insert below. A little clumsy, but not scary. The code is not completely type safe, but contains health checks to ensure a reasonable level of security. And working with compiler error messages while writing was moderate compared to what I saw when running C ++ templates. You should probably start reading this in the sample use code in the "main" function.

 #include <stdio.h> #define LIST_ELEMENT(type) \ struct \ { \ void *pvNext; \ type value; \ } #define ASSERT_POINTER_TO_LIST_ELEMENT(type, pElement) \ do { \ (void)(&(pElement)->value == (type *)&(pElement)->value); \ (void)(sizeof(*(pElement)) == sizeof(LIST_ELEMENT(type))); \ } while(0) #define SET_POINTER_TO_LIST_ELEMENT(type, pDest, pSource) \ do { \ ASSERT_POINTER_TO_LIST_ELEMENT(type, pSource); \ ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \ void **pvDest = (void **)&(pDest); \ *pvDest = ((void *)(pSource)); \ } while(0) #define LINK_LIST_ELEMENT(type, pDest, pSource) \ do { \ ASSERT_POINTER_TO_LIST_ELEMENT(type, pSource); \ ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \ (pDest)->pvNext = ((void *)(pSource)); \ } while(0) #define TERMINATE_LIST_AT_ELEMENT(type, pDest) \ do { \ ASSERT_POINTER_TO_LIST_ELEMENT(type, pDest); \ (pDest)->pvNext = NULL; \ } while(0) #define ADVANCE_POINTER_TO_LIST_ELEMENT(type, pElement) \ do { \ ASSERT_POINTER_TO_LIST_ELEMENT(type, pElement); \ void **pvElement = (void **)&(pElement); \ *pvElement = (pElement)->pvNext; \ } while(0) typedef struct { int a; int b; } mytype; int main(int argc, char **argv) { LIST_ELEMENT(mytype) el1; LIST_ELEMENT(mytype) el2; LIST_ELEMENT(mytype) *pEl; el1.value.a = 1; el1.value.b = 2; el2.value.a = 3; el2.value.b = 4; LINK_LIST_ELEMENT(mytype, &el1, &el2); TERMINATE_LIST_AT_ELEMENT(mytype, &el2); printf("Testing.\n"); SET_POINTER_TO_LIST_ELEMENT(mytype, pEl, &el1); if (pEl->value.a != 1) printf("pEl->value.a != 1: %d.\n", pEl->value.a); ADVANCE_POINTER_TO_LIST_ELEMENT(mytype, pEl); if (pEl->value.a != 3) printf("pEl->value.a != 3: %d.\n", pEl->value.a); ADVANCE_POINTER_TO_LIST_ELEMENT(mytype, pEl); if (pEl != NULL) printf("pEl != NULL.\n"); printf("Done.\n"); return 0; } 
+1
Jul 16 '13 at 10:28
source share

I use void (void *) pointers to represent common data structures defined with structs and typedefs. Below I will talk about my lib implementation that I am working on.

With such an implementation, you can think of every new type defined with typedef as a pseudo-class. Here, this pseudo-class is a collection of source code (some_type_implementation.c) and its header file (some_type_implementation.h).

In the source code, you need to define a structure that will represent the new type. Notice the structure in the source file node.c. There I made a void pointer to the attribute "info". This pointer can carry any type of pointer (I think), but the price you have to pay is the type identifier inside the structure (int type) and all switches to make a hint handle for each specific type. So, in the node.h header file, I defined the type “Node” (just not to type a struct node every time), and I also had to define the constants “EMPTY_NODE”, COMPLEX_NODE ”and“ MATRIX_NODE ”.

You can compile manually with "gcc * .c -lm".

main.c Source File

 #include <stdio.h> #include <math.h> #define PI M_PI #include "complex.h" #include "matrix.h" #include "node.h" int main() { //testCpx(); //testMtx(); testNode(); return 0; } 

node.c Source File

 #include <stdio.h> #include <stdlib.h> #include <math.h> #include "node.h" #include "complex.h" #include "matrix.h" #define PI M_PI struct node { int type; void* info; }; Node* newNode(int type,void* info) { Node* newNode = (Node*) malloc(sizeof(Node)); newNode->type = type; if(info != NULL) { switch(type) { case COMPLEX_NODE: newNode->info = (Complex*) info; break; case MATRIX_NODE: newNode->info = (Matrix*) info; break; } } else newNode->info = NULL; return newNode; } int emptyInfoNode(Node* node) { return (node->info == NULL); } void printNode(Node* node) { if(emptyInfoNode(node)) { printf("Type:%d\n",node->type); printf("Empty info\n"); } else { switch(node->type) { case COMPLEX_NODE: printCpx(node->info); break; case MATRIX_NODE: printMtx(node->info); break; } } } void testNode() { Node *node1,*node2, *node3; Complex *Z; Matrix *M; Z = mkCpx(POLAR,5,3*PI/4); M = newMtx(3,4,PI); node1 = newNode(COMPLEX_NODE,Z); node2 = newNode(MATRIX_NODE,M); node3 = newNode(EMPTY_NODE,NULL); printNode(node1); printNode(node2); printNode(node3); } 

node.h header file

 #define EMPTY_NODE 0 #define COMPLEX_NODE 1 #define MATRIX_NODE 2 typedef struct node Node; Node* newNode(int type,void* info); int emptyInfoNode(Node* node); void printNode(Node* node); void testNode(); 

matrix.c Source File

 #include <stdio.h> #include <stdlib.h> #include <math.h> #include "matrix.h" struct matrix { // Meta-information about the matrix int rows; int cols; // The elements of the matrix, in the form of a vector double** MTX; }; Matrix* newMtx(int rows,int cols,double value) { register int row , col; Matrix* M = (Matrix*)malloc(sizeof(Matrix)); M->rows = rows; M->cols = cols; M->MTX = (double**) malloc(rows*sizeof(double*)); for(row = 0; row < rows ; row++) { M->MTX[row] = (double*) malloc(cols*sizeof(double)); for(col = 0; col < cols ; col++) M->MTX[row][col] = value; } return M; } Matrix* mkMtx(int rows,int cols,double** MTX) { Matrix* M; if(MTX == NULL) { M = newMtx(rows,cols,0); } else { M = (Matrix*)malloc(sizeof(Matrix)); M->rows = rows; M->cols = cols; M->MTX = MTX; } return M; } double getElemMtx(Matrix* M , int row , int col) { return M->MTX[row][col]; } void printRowMtx(double* row,int cols) { register int j; for(j = 0 ; j < cols ; j++) printf("%g ",row[j]); } void printMtx(Matrix* M) { register int row = 0, col = 0; printf("\vSize\n"); printf("\tRows:%d\n",M->rows); printf("\tCols:%d\n",M->cols); printf("\n"); for(; row < M->rows ; row++) { printRowMtx(M->MTX[row],M->cols); printf("\n"); } printf("\n"); } void testMtx() { Matrix* M = mkMtx(10,10,NULL); printMtx(M); } 

matrix.h header file

 typedef struct matrix Matrix; Matrix* newMtx(int rows,int cols,double value); Matrix* mkMatrix(int rows,int cols,double** MTX); void print(Matrix* M); double getMtx(Matrix* M , int row , int col); void printRowMtx(double* row,int cols); void printMtx(Matrix* M); void testMtx(); 

complex.c Source File

 #include <stdio.h> #include <stdlib.h> #include <math.h> #include "complex.h" struct complex { int type; double a; double b; }; Complex* mkCpx(int type,double a,double b) { /** Doc - {{{ * This function makes a new Complex number. * * @params: * |-->type: Is an interger that denotes if the number is in * | the analitic or in the polar form. * | ANALITIC:0 * | POLAR :1 * | * |-->a: Is the real part if type = 0 and is the radius if * | type = 1 * | * `-->b: Is the imaginary part if type = 0 and is the argument * if type = 1 * * @return: * Returns the new Complex number initialized with the values * passed *}}} */ Complex* number = (Complex*)malloc(sizeof(Complex)); number->type = type; number->a = a; number->b = b; return number; } void printCpx(Complex* number) { switch(number->type) { case ANALITIC: printf("Re:%g | Im:%g\n",number->a,number->b); break; case POLAR: printf("Radius:%g | Arg:%g\n",number->a,number->b); break; } } void testCpx() { Complex* Z = mkCpx(ANALITIC,3,2); printCpx(Z); } 

complex.h header file

 #define ANALITIC 0 #define POLAR 1 typedef struct complex Complex; Complex* mkCpx(int type,double a,double b); void printCpx(Complex* number); void testCpx(); 

I hope I haven’t missed anything.

+1
Jul 27 '15 at 13:16
source share

I would like to program my future "low-level" projects in C, and not in C ++ ...

Why? Is your target a C ++ compiler or a C ++ runtime missing?

0
Jun 16 '10 at 20:37
source share

I use option 2 for a couple of high-performance collections, and it takes a lot of time working on the amount of macro logic needed to make something really compiled and common. I do this solely for raw performance (games). The X-macro method is used.

The painful problem that constantly arises in Option 2 is this: "Assuming some finite number of options, for example 8/16/32/64 bit keys, I make the indicated value a constant and define several functions, each with a different element in this set "which constant can take, or am I just making it a member variable?" The first means a less efficient command cache, since you have many duplicate functions with one or two numbers, and the last means that you need to refer to the selected variables, which in the worst case means a data cache miss. Since option 1 is purely dynamic, you will make such member variables without even thinking about it. However, this is really micro-optimization.

Also consider the tradeoff between returning pointers and values: the latter is most effective when the size of the data item is less than or equal to the size of the pointer; whereas if the data element is larger, it is most likely better to return pointers than force a copy of a large object, returning a value.

I would strongly suggest switching to Option 1 in any scenario where you are not 100% sure that collection performance will be your bottleneck. Even when using Option 2, my collections library provides a “quick setup” that is similar to option 1, i.e. Using void * values ​​in my list and map. This is enough for 90 %% of circumstances.

0
May 03 '15 at 11:35
source share



All Articles