Are there C functions or macros specifically designed to compile one-to-one into assembly instructions for bit manipulation, in a cross-platform way?

I have a project involving emulation (if you look at my post history, you will see how far I have come!), and I want to do pseudo-binary translation in C, playing with the optimizer and/or compiler so that the C code in each case of my switch statement compiles to a single assembly instruction. This is mainly for very standard instructions such as movs, adds, SR and other simple bit-manipulation and arithmetic instructions. I hope to do this for ARM and x86-64 at the same time, writing as little as possible in either assembly language.
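A minimal sketch of the switch-based approach described above, assuming a made-up guest CPU with 16 32-bit registers and a hypothetical instruction already decoded into (opcode, rd, rs, rt). The opcodes, `cpu_t`, `step` and `ror32` are all illustrative names, not from any real emulator. Each case body is a plain C expression that GCC and Clang typically lower to one host ALU instruction at -O2 (plus the register-file loads and stores); the rotate uses the `(x >> n) | (x << (-n & 31))` form that both compilers usually recognize as a rotate. None of this is guaranteed by the C standard, so check the output with `-S`.

```c
#include <stdint.h>

/* Hypothetical guest opcodes; not from any real ISA. */
enum { OP_MOV, OP_ADD, OP_SUB, OP_AND, OP_SHR, OP_ROR };

/* Hypothetical guest CPU state: 16 general-purpose 32-bit registers. */
typedef struct {
    uint32_t r[16];
} cpu_t;

/* Rotate right, defined for every n (including 0); GCC and Clang
   usually compile this pattern to a single rotate instruction. */
static inline uint32_t ror32(uint32_t x, unsigned n)
{
    n &= 31;
    return (x >> n) | (x << (-n & 31));
}

/* Execute one decoded instruction: rd = rs <op> rt. */
void step(cpu_t *cpu, unsigned op, unsigned rd, unsigned rs, unsigned rt)
{
    uint32_t a = cpu->r[rs & 15];
    uint32_t b = cpu->r[rt & 15];
    uint32_t *d = &cpu->r[rd & 15];

    switch (op) {
    case OP_MOV: *d = a;             break;
    case OP_ADD: *d = a + b;         break;  /* wraps modulo 2^32 */
    case OP_SUB: *d = a - b;         break;
    case OP_AND: *d = a & b;         break;
    case OP_SHR: *d = a >> (b & 31); break;  /* mask avoids UB for shifts >= 32 */
    case OP_ROR: *d = ror32(a, b);   break;
    default:                         break;  /* unknown opcode: ignored */
    }
}
```

Unsigned 32-bit types are used on purpose: their wrap-around behaviour is defined in C and matches what a guest ALU does, so the compiler has no reason to add extra instructions.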

If the thing I am describing does not exist, then I wonder if there is some kind of "assembly language" that I can use to write my code, and then compile this assembly into x86-64 and ARM.

3 answers

To clearly answer this part:

... then I wonder if there is any “assembly language” that I can use to write my code, and then compile this assembly into x86-64 and ARM.

That is what LLVM IR is.

The LLVM representation aims to be lightweight and low-level while being expressive, typed, and extensible at the same time. It aims to be a "universal IR" of sorts, by being at a low enough level that high-level ideas can be cleanly mapped to it (similar to how microprocessors are "universal IRs", allowing many source languages to be mapped to them).

For example:

You can represent this C function

 int mul_add(int x, int y, int z) { return x * y + z; } 

as this LLVM IR:

    define i32 @mul_add(i32 %x, i32 %y, i32 %z) {
    entry:
      %tmp = mul i32 %x, %y
      %tmp2 = add i32 %tmp, %z
      ret i32 %tmp2
    }

If you want to emit machine code at runtime, you will need a just-in-time (JIT) compilation library. You could consider GNU lightning, libjit, LLVM, GCCJIT, asmjit...

You can also (on Linux) generate some C code into a file, fork a compilation of that file into a shared object, then dlopen(3) that .so plugin...
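A minimal sketch of that generate/compile/dlopen cycle, assuming a POSIX system with a C compiler reachable as `cc` at runtime. The file paths, the generated function `translated_add` and the wrapper `compile_and_call` are hypothetical. It writes C source to a file, compiles it into a shared object with `-shared -fPIC` via system(3) (which forks a shell), loads it with dlopen(3) and looks the function up with dlsym(3). On glibc older than 2.34 the program also has to be linked with `-ldl`.

```c
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

/* Signature of the generated function (an assumption of this sketch). */
typedef int (*binop_fn)(int, int);

/* Writes C source for a hypothetical translated function, compiles it into a
   shared object with the system C compiler, loads it and calls it.
   Returns the function's result, or -1 on any failure. */
int compile_and_call(int x, int y)
{
    const char *src = "/tmp/plugin_demo.c";   /* illustrative paths */
    const char *so  = "/tmp/plugin_demo.so";
    char cmd[256];

    FILE *f = fopen(src, "w");
    if (!f)
        return -1;
    fputs("int translated_add(int a, int b) { return a + b; }\n", f);
    if (fclose(f) != 0)
        return -1;

    /* Compile the generated source into a position-independent shared object. */
    snprintf(cmd, sizeof cmd, "cc -O2 -shared -fPIC -o %s %s", so, src);
    if (system(cmd) != 0)
        return -1;

    void *handle = dlopen(so, RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    /* ISO C leaves this void* -> function pointer conversion undefined,
       but POSIX requires it to work for dlsym results. */
    binop_fn fn = (binop_fn)dlsym(handle, "translated_add");

    int result = fn ? fn(x, y) : -1;
    dlclose(handle);
    return result;
}
```

In a real translator the generated source would contain one function per translated guest block, and the handle would stay open for as long as that code is in use instead of being closed after one call.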

As I said, a cross-platform assembly language does not and cannot exist (because the systems have different instruction sets and ABIs): instead, generate C code or, possibly, LLVM IR code.

If you are writing an interpreter (and that includes many emulators), also consider threaded-code techniques and bytecode generation.
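To illustrate the threaded-code idea, here is a minimal token-threaded interpreter for a hypothetical stack-machine bytecode (`BC_PUSH`, `BC_ADD`, `BC_MUL`, `BC_HALT` and `run` are made-up names). It relies on the "labels as values" extension (`&&label` and `goto *p`) supported by GCC and Clang, not by standard C: each handler ends with its own indirect jump to the next handler instead of looping back to a central `switch`.

```c
#include <stdint.h>

/* Hypothetical bytecode for a tiny stack machine. */
enum { BC_PUSH, BC_ADD, BC_MUL, BC_HALT };

/* Runs `code` and returns the top of the stack when BC_HALT is reached.
   BC_PUSH is followed by one immediate operand.
   No stack or bounds checks: this is only a sketch. */
int64_t run(const int64_t *code)
{
    /* Handler addresses indexed by opcode (GNU C "labels as values"). */
    static void *const dispatch[] = { &&do_push, &&do_add, &&do_mul, &&do_halt };

    int64_t stack[64];
    int sp = 0;                       /* number of values on the stack */
    const int64_t *pc = code;

/* Fetch the next opcode and jump straight to its handler. */
#define NEXT goto *dispatch[*pc++]

    NEXT;

do_push:
    stack[sp++] = *pc++;              /* immediate operand follows the opcode */
    NEXT;
do_add:
    sp--;
    stack[sp - 1] += stack[sp];
    NEXT;
do_mul:
    sp--;
    stack[sp - 1] *= stack[sp];
    NEXT;
do_halt:
#undef NEXT
    return stack[sp - 1];
}
```

For example, the program `{ BC_PUSH, 2, BC_PUSH, 3, BC_ADD, BC_PUSH, 4, BC_MUL, BC_HALT }` computes (2 + 3) * 4. Replicating the dispatch jump in every handler gives the host branch predictor one indirect branch per handler, which is the usual motivation for this style over a single `switch`.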


Put narrowly, the "assembly language" you are talking about is ... C.

This is because many C expressions map directly to individual assembly instructions, even across different platforms. The following is partly hypothetical, but it shows some of the instructions a given C expression can compile to on x86, ARM, or SPARC (I chose these three because they are the ones I know best):

    C code          x86 asm                   ARM asm           SPARC asm
    {               enter                     push lr           save %fp, ..., %sp
    }               leave                     pop pc            restore
    a += b;         add %ebx, %eax            add R0, R1        add %l0, %l1, %l0
    a = b + c;      lea (%ebx, %ecx), %eax    add R0, R1, R2    add %l2, %l1, %l0
    a = 0;          xor %eax, %eax            mov R0, #0        clr %l0
    a++;            inc %eax                  add R0, #1        inc %l0
    a--;            dec %eax                  sub R0, #1        dec %l0
    (*ptr)++;       inc (%eax)                -                 -
    a = ~b;         mov %ebx, %eax; not %eax  mvn R0, R1        not %l1, %l0
    ptr = &a;       lea a, %eax               ldr R0, =a        set a, %l0
    a = b[c];       mov (%ebx, %ecx), %eax    ldr R0, [R1, R2]  ld [%l1+%l2], %l0
    (void)func();   call func                 blx func          call func
    if (a)          test %eax, %eax; jnz      tst R0, R0; bne   tst %l0; bnz
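One way to check rows of this table against your own toolchain is to isolate the C expressions in tiny functions, compile them with something like `cc -O2 -S rows.c` (the file name is arbitrary), and read the generated `.s` file. The function names below are illustrative only. Expect some differences from the table, which the answer itself calls partly hypothetical; for example, optimizing compilers rarely emit `enter` for function prologues, and register choices follow each platform's calling convention.

```c
/* Each function isolates one kind of expression from the table.
   Compile with e.g. "cc -O2 -S rows.c" and inspect rows.s. */

int add_assign(int a, int b)          { a += b; return a; }  /* a += b    */
int add3(int b, int c)                { return b + c; }      /* a = b + c */
int negate_bits(int b)                { return ~b; }         /* a = ~b    */
void incr_mem(int *ptr)               { (*ptr)++; }          /* read-modify-write of memory */
int load_index(const int *b, long c)  { return b[c]; }       /* a = b[c]  */
int is_nonzero(int a)                 { if (a) return 1; return 0; }  /* if (a) */
```

Comparing the `.s` output for x86-64 and for an ARM cross-compiler side by side is usually the quickest way to see which expressions really map to one instruction on both.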

Of course, not everything you can write as one line of C compiles to a single assembly instruction. It also depends heavily on the instruction set whether certain multi-part operations can be “flattened” into a single instruction with several operands or require a sequence of “more primitive” instructions.
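As an illustration of that ISA dependence, extracting a bit field is one C expression, but whether it becomes one instruction depends on the target: ARM (ARMv6T2 and later, and AArch64) has an unsigned bit-field extract instruction, `ubfx`, while baseline x86-64 typically needs a shift followed by an AND. The helper names below are hypothetical, and the exact output depends on compiler, flags and target.

```c
#include <stdint.h>

/* Extract `width` bits of `x` starting at bit `lsb`.
   Assumes 0 < width < 32 and lsb + width <= 32 (not checked). */
static inline uint32_t extract_bits(uint32_t x, unsigned lsb, unsigned width)
{
    return (x >> lsb) & ((1u << width) - 1u);
}

/* Bits 4..7 with constant arguments: compilers can often emit a single
   `ubfx` on ARM; x86-64 without BMI typically uses shr + and. */
uint32_t nibble1(uint32_t x)
{
    return extract_bits(x, 4, 4);
}
```

Keeping `lsb` and `width` compile-time constants at the call site is what lets the compiler pick the single-instruction form where one exists.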

Compilers

C compilers have long gone through an "intermediate representation" before the final conversion to assembly; this step is similar to the way modern x86 processors "compile" x86 instructions into lower-level micro-operations that the chip's actual execution units process. That this middle tier got codified/documented, as happened with LLVM IR, is not new either; Java bytecode or Forth, for example, conceptually fit this scheme.

I would go with C ... and look at the assembly output. It is unlikely to be as compact as it could be, but on platforms where the corresponding “complex” operation is available, it is likely to be more compact than the LLVM IR (say, on a CPU with a fused multiply-add instruction, auselen's example would become a single instruction instead of the three in LLVM IR).


Source: https://habr.com/ru/post/1239333/

