C / C++: switch on the contents of a char array

I have several data structures, each of which has a 4-byte field.

Since an int is 4 bytes on my platform, I want to use these fields in case labels:

    switch (* ((int*) &structure->id)) {
        case (* ((int*) "sqrt")):
            printf("its a sqrt!");
            break;
        case (* ((int*) "log2")):
            printf("its a log2!");
            break;
        case (((int) 'A')<<8 + (int) 'B'):
            printf("works somehow, but unreadable");
            break;
        default:
            printf("unknown id");
    }

This results in a compilation error saying that the case expression does not reduce to an integer constant.

How can I treat fixed-size char arrays as numeric values so that I can use them in switch / case?

+6
7 answers

Disclaimer: Do not use this for anything other than fun or learning. For serious code, use common idioms and never rely on implementation-specific compiler behavior. If you do rely on it, make incompatible platforms fail with a compile-time error, or fall back to good portable code.


According to its grammar, the standard seems to allow multi-character character constants. I have not yet checked whether the following is legal, though.

    ~/$ cat main.cc
    #include <iostream>
    #ifdef I_AM_CERTAIN_THAT_MY_PLATFORM_SUPPORTS_THIS_CRAP
    int main () {
        const char *foo = "fooo";
        switch ((foo[0]<<24) | (foo[1]<<16) | (foo[2]<<8) | (foo[3]<<0)) {
        case 'fooo': std::cout << "fooo!\n"; break;
        default: std::cout << "bwaah!\n"; break;
        };
    }
    #else
    #error oh oh oh
    #endif
    ~/$ g++ -Wall -Wextra main.cc && ./a.out
    main.cc:5:10: warning: multi-character character constant
    fooo!

Edit: Oh look, right under the grammar excerpt there is 2.13.2 Character literals, bullet 1:

[...] An ordinary character literal that contains more than one c-char is a multicharacter literal. A multicharacter literal has type int and implementation-defined value.

But in the second bullet:

[...] The value of a wide-character literal containing multiple c-chars is implementation-defined.

So be careful.

+2

Use the same approach that video codecs use for FourCC codes:

Set FourCC value in C++

 #define FOURCC(a,b,c,d) ( (uint32) (((d)<<24) | ((c)<<16) | ((b)<<8) | (a)) ) 

It is probably a good idea to use enumerated types or macros for each identifier:

    enum {
        ID_SQRT = FOURCC( 's', 'q', 'r', 't'),
        ID_LOG2 = FOURCC( 'l', 'o', 'g', '2')
    };

    int structure_id = FOURCC( structure->id[0], structure->id[1], structure->id[2], structure->id[3] );
    switch (structure_id) {
        case ID_SQRT: ...
        case ID_LOG2: ...
    }
+4

I believe the problem here is that in C, every case label in a switch must be an integer constant expression. From the ISO C specification, §6.8.4.2/3:

The expression of each case label shall be an integer constant expression [...]

(my emphasis)

The C spec then defines an “integer constant expression” as a constant expression where (§6.6/6):

An integer constant expression shall have integer type and shall only have operands that are integer constants, enumeration constants, character constants, sizeof expressions whose results are integer constants, and floating constants that are the immediate operands of casts. Cast operators in an integer constant expression shall only convert arithmetic types to integer types, except as part of an operand to the sizeof operator.

(my emphasis again). This means that you cannot cast a string literal (a pointer) to an integer type in a case expression, since that cast is not allowed in an integer constant expression.

Intuitively, the reason may be that in some implementations the actual location of strings in the generated executable is not fixed until linking. The compiler might therefore be unable to generate good code for the switch if the labels depended on a constant expression that indirectly depends on the addresses of those strings, because it could miss opportunities to build jump tables, for example. This is just one example, but the strict wording of the specification clearly prohibits what you described above.

Hope this helps!

+2

The problem is that the case branches of a switch expect a constant value, specifically a constant that is known at compile time. The address of a string is not known at compile time: the linker knows the address, but not even the final one. I think the final, relocated address is only available at run time.

You can simplify your problem to

 void f() { int x[*(int*)"x"]; } 

This gives the same error, since the address of the "x" literal is unknown at compile time. This is different from, for example,

 void f() { int x[sizeof("x")]; } 

since the compiler knows the size of the string literal at compile time (it is a char[2], so 2 bytes).

Now, how to fix your problem? Two things come to mind:

  • Store an integer rather than a string in the id field, and then use a list of constants in your case labels.

  • I suspect that you will need to make a switch like this in several places, so my other suggestion is: do not use switch to execute code primarily depending on the type of structure. Instead, the structure may offer a pointer to a function that can be called to make the correct printf call. When creating the structure, the function pointer is set to the correct function.

Here's a code snippet illustrating the second idea:

    #include <stdio.h>
    #include <string.h>

    struct MyStructure {
        const char *id;

        void (*printType)(struct MyStructure *);
        void (*doThat)(struct MyStructure *, int arg1, int arg2);
        /* ... */
    };

    /* The parameter is unused, but C before C23 requires it to be named. */
    static void printSqrtType( struct MyStructure *s ) { (void)s; printf( "its a sqrt\n" ); }
    static void printLog2Type( struct MyStructure *s ) { (void)s; printf( "its a log2\n" ); }
    static void printUnreadableType( struct MyStructure *s ) { (void)s; printf( "works somehow, but unreadable\n" ); }

    /* Initializes the function pointers in the structure depending on the id. */
    void setupVTable( struct MyStructure *s ) {
        if ( !strcmp( s->id, "sqrt" ) ) {
            s->printType = printSqrtType;
        } else if ( !strcmp( s->id, "log2" ) ) {
            s->printType = printLog2Type;
        } else {
            s->printType = printUnreadableType;
        }
    }

With this in place, your source code can simply do:

    void f( struct MyStructure *s ) { s->printType( s ); }

This way you centralize type checking in one place, rather than cluttering your code with a multitude of switch statements.

+2

This is especially dangerous because of alignment: on many architectures an int must be aligned to 4 bytes, but character arrays need not be. For example, on SPARC, even if this code compiled (which it does not, because the address of the string is unknown before link time), it would immediately raise SIGBUS.

+1

I ended up using this macro, similar to case #3 in the question or to phresnel's answer.

    #define CHAR4_TO_INT32(a, b, c, d) ((((int32_t)a)<<24)+ (((int32_t)b)<<16) + (((int32_t)c)<<8)+ (((int32_t)d)<<0))

    switch (* ((int*) &structure->id)) {
        case (CHAR4_TO_INT32('S','Q','R','T')):
            printf("its a sqrt!");
            break;
    }
+1

This is more C than C++.

    union int_char4 { int32_t x; char y[4]; };

A union declares its members to start at the same address, essentially providing different types for the same set of bytes.

Given `union int_char4 ic4;`, ic4.x is an int and ic4.y is the char array (which decays to a pointer to its first byte).

Since you want to learn, the implementation is left up to you.

0

Source: https://habr.com/ru/post/895183/

