The easiest way to work with the intermediate format

The tool I'm working on takes the intermediate representation generated by the compiler, adds some code to it, and then passes the modified intermediate code to the compiler backend to generate the final code.

After doing a little research on gcc, I found that the GIMPLE format is easy to understand, but I'm not sure how hard it is to modify GIMPLE code, and I don't know how to resume compilation from that point other than by writing a plugin that adds my own pass. People have also warned me that the documentation is thin, and that it is hard to get unstuck once you run into problems with gcc.

Another option is to use LLVM bitcode. But I have never worked with LLVM, so I don't know how hard my task would be with it. There may also be better options that I don't know about, so I'd like to know which one is best. My preferences are as follows.

  • Platform independence
  • Ease of use
  • Well documented
  • More people use it, so more help is available.

As you probably already know, MELT is a high-level domain-specific language for extending GCC. With it you can easily work with GIMPLE and other internal representations (and also modify GCC's internal representations).

However, extending GCC takes some work, because the GIMPLE (and Tree) representations, along with others such as edges, are complex.


From your description, this sounds like a job for LLVM. One of its main goals is to serve as a flexible library and foundation for manipulating IR code. Its countless optimization, transformation, and analysis passes attest to this and also serve as great examples. IMO LLVM also meets the four criteria you list in your question very well:

  • Platform independence: LLVM runs on the major platforms (Linux, Mac, and Windows) and can generate code for many kinds of processors.
  • Ease of use: IR and compilers are a hard area to crack, but as far as that goes, LLVM is a good candidate: it is a relatively young project, well documented, with a very clean code base.
  • Well documented: knock yourself out
  • More people use it: development and adoption are very active, and some corporations (primarily Apple and Google) have already invested heavily in it.

This may not be useful at all, but I was curious what gcc actually does while compiling. The following is abbreviated (cut down mainly to the exec/fork calls) from the output of strace -f -o gcc.strace gcc -c tstamp.c :

 7141 execve("/usr/bin/gcc", ["gcc", "-c", "tstamp.c"], [/* 52 vars */]) = 0
 7141 open("/tmp/ccqzaCI4.s", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0600) = 3
 7141 close(3) = 0
 7141 vfork( <unfinished ...>
 7142 execve("/usr/libexec/gcc/i686-redhat-linux/4.6.1/cc1", ["/usr/libexec/gcc/i686-redhat-lin"..., "-quiet", "tstamp.c", "-quiet", "-dumpbase", "tstamp.c", "-mtune=generic", "-march=i686", "-auxbase", "tstamp", "-o", "/tmp/ccqzaCI4.s"], [/* 55 vars */] <unfinished ...>
 7141 <... vfork resumed> ) = 7142
 7141 waitpid(7142, <unfinished ...>
 7142 <... execve resumed> ) = 0
 7142 open("tstamp.c", O_RDONLY|O_NOCTTY|O_LARGEFILE) = 3
 7142 close(3) = 0
 7142 open("/tmp/ccqzaCI4.s", O_RDWR|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
 7142 open("/usr/include/stdio.h", O_RDONLY|O_NOCTTY|O_LARGEFILE) = 4
 ... (opens and closes every include file)
 7142 close(4) = 0
 7142 close(3) = 0
 7142 exit_group(0) = ?
 7141 <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 7142
 7141 vfork( <unfinished ...>
 7143 execve("/usr/bin/as", ["as", "--32", "-o", "tstamp.o", "/tmp/ccqzaCI4.s"], [/* 55 vars */] <unfinished ...>
 7141 <... vfork resumed> ) = 7143
 7141 waitpid(7143, <unfinished ...>
 7143 <... execve resumed> ) = 0
 7143 unlink("tstamp.o") = 0
 7143 open("tstamp.o", O_RDWR|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
 7143 open("/tmp/ccqzaCI4.s", O_RDONLY|O_LARGEFILE) = 4
 7143 close(4) = 0
 7143 close(3) = 0
 7143 exit_group(0) = ?
 7141 <... waitpid resumed> [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 7143
 7141 unlink("/tmp/ccqzaCI4.s") = 0
 7141 exit_group(0) = ?

cc1 is where all the relevant logic lives. I assume it is a complex program, especially after running:

 /usr/libexec/gcc/i686-redhat-linux/4.6.1/cc1 --help 

and

 /usr/libexec/gcc/i686-redhat-linux/4.6.1/cc1 --help=C 

Source: https://habr.com/ru/post/1395130/

