What is the most interesting and promising approach to implementing a compiler in C#?

I am just starting my graduation project, which should last 6 months. The goal of the project is to implement a .NET compiler for a scripting language. I took a compiler construction course as part of my curriculum, so I know the basic steps of implementing a compiler as a whole, but we used Bison and a simple compiler with GCC as the back end, and therefore I know little about implementing compilers on the .NET platform.

After doing some research on this topic, I found the following alternatives for code generation (I'm not asking about other essential parts of the compiler, such as the parser; that is out of scope here):

  • Direct code generation using Reflection.Emit.
  • Using the Common Compiler Infrastructure (CCI) abstraction over Reflection.Emit to automate some of the code generation.
  • Using CodeDOM to compile C# and VB at runtime.
  • Roslyn, a new C# compiler exposed as a service, currently available as a CTP.
  • The DLR, which supports dynamic code generation and has some interfaces for generating code at runtime through expression trees, etc.
  • The Mono.Cecil library, which ships with Mono and appears to have some code generation features too.

The main goal of my project is to dig deeper into the guts of .NET, study compiler construction, and get a good grade for my work. The secondary goal is to come up with a compiler implementation that can later be released to the community under a permissive open source license.

So, what would be an interesting, educational, entertaining and promising approach here? I would definitely try them all if I had more time, but I need to submit my work in 6 months to get a passing grade ...

Thanks in advance, Alexander.

+4
3 answers

If you want an easier way, and your language can be reasonably translated into C#, I would recommend that you generate C# code (or similar) and compile it. Roslyn would probably be the best fit for that. Apparently CCI can do this too with CCI Code, but I have never used it. I would not recommend CodeDOM because it does not support features such as static classes or extension methods.

If you need more control or want to go low-level, you can generate CIL directly using Reflection.Emit. But it will be (a lot) more work, especially if you are not familiar with CIL. I think Cecil can be used the same way, but it is intended for something else, and I do not think it offers any advantages over Reflection.Emit.

The DLR is meant, as its full name (Dynamic Language Runtime) suggests, for dynamic languages. The expression trees it uses can be used to generate code, but I think they are best suited for creating relatively simple methods at runtime. Of course, the DLR itself can be very useful if your language is dynamic.
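To give a feel for what generating CIL by hand looks like, here is a minimal sketch that emits the body of an integer addition method. The class name `EmitSketch` and the method name `"Add"` are made up for illustration. It uses `DynamicMethod` to keep the example short; a compiler that writes assemblies to disk would more likely go through `AssemblyBuilder` and `TypeBuilder`, which is not shown here.

```csharp
using System;
using System.Reflection.Emit;

public static class EmitSketch
{
    // Builds a delegate equivalent to (a, b) => a + b by emitting CIL by hand.
    public static Func<int, int, int> BuildAdd()
    {
        var method = new DynamicMethod(
            "Add",                                 // hypothetical method name
            typeof(int),                           // return type
            new[] { typeof(int), typeof(int) });   // parameter types

        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);  // push the first argument
        il.Emit(OpCodes.Ldarg_1);  // push the second argument
        il.Emit(OpCodes.Add);      // pop both, push their sum
        il.Emit(OpCodes.Ret);      // return the value on top of the stack

        return (Func<int, int, int>)method.CreateDelegate(typeof(Func<int, int, int>));
    }
}
```

Calling `EmitSketch.BuildAdd()(2, 3)` should return 5. Even for one addition you have to think in terms of the evaluation stack, which is where the extra work comes from.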
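For comparison, a small sketch of code generation through expression trees: it builds `(x, y) => x * y + 1` as a tree and compiles it to a delegate. The names `ExpressionSketch` and `BuildMulPlusOne` are hypothetical. Note that this only covers a single simple method, which is roughly the use case described above.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionSketch
{
    // Builds (x, y) => x * y + 1 as an expression tree and compiles it.
    public static Func<int, int, int> BuildMulPlusOne()
    {
        ParameterExpression x = Expression.Parameter(typeof(int), "x");
        ParameterExpression y = Expression.Parameter(typeof(int), "y");

        // Tree nodes replace explicit stack manipulation: Add(Multiply(x, y), 1).
        BinaryExpression body = Expression.Add(
            Expression.Multiply(x, y),
            Expression.Constant(1));

        return Expression.Lambda<Func<int, int, int>>(body, x, y).Compile();
    }
}
```

Calling `ExpressionSketch.BuildMulPlusOne()(3, 4)` should return 13.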

+5

Boo is a language / compiler designed for the CLI. It seems like it's open source, so you can learn how they do it.

+2

When I wrote compilers, I had them output assembly language (that is, assembler source code), which I then ran through the system assembler. That way, I could easily understand what I was generating. It's much easier to read mov ax, bx (x86 assembly) than to decode hex opcodes.

If I was not allowed to use an assembler in the final product, I developed the compiler with assembly output first, and then, once I had everything working, I added a binary output path. The beauty was that all I had to change was the actual output of bytes (opcodes and binary values instead of text).

I would suggest doing something similar for your project. First, have it output textual MSIL, which you can assemble with ILASM. This way you can easily check your code generator's output by reading the generated code. Once you are sure that your code generator is working, add an output option that uses Reflection.Emit or the Common Compiler Infrastructure.
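One way this could be structured, as a rough sketch: the code generator talks only to a small interface, and the text and Reflection.Emit back ends are interchangeable implementations of it. Everything here (`IInstructionSink`, `TextSink`, `EmitSink`, `TwoBackendSketch`) is a hypothetical name, and `Generate` is a stand-in for a real code generator that would walk an AST. The text back end writes only the instruction lines; a file that ILASM accepts would also need the surrounding `.assembly`, `.class` and `.method` directives.

```csharp
using System;
using System.Reflection.Emit;
using System.Text;

// Hypothetical abstraction: the code generator only sees this interface.
public interface IInstructionSink
{
    void LoadConstant(int value);
    void Add();
    void Multiply();
    void Return();
}

// Back end 1: human-readable IL instructions, for inspection.
public sealed class TextSink : IInstructionSink
{
    private readonly StringBuilder _text = new StringBuilder();
    public void LoadConstant(int value) => _text.Append("ldc.i4 ").Append(value).Append('\n');
    public void Add() => _text.Append("add\n");
    public void Multiply() => _text.Append("mul\n");
    public void Return() => _text.Append("ret\n");
    public override string ToString() => _text.ToString();
}

// Back end 2: the same instructions emitted through Reflection.Emit.
public sealed class EmitSink : IInstructionSink
{
    private readonly ILGenerator _il;
    public EmitSink(ILGenerator il) { _il = il; }
    public void LoadConstant(int value) => _il.Emit(OpCodes.Ldc_I4, value);
    public void Add() => _il.Emit(OpCodes.Add);
    public void Multiply() => _il.Emit(OpCodes.Mul);
    public void Return() => _il.Emit(OpCodes.Ret);
}

public static class TwoBackendSketch
{
    // Stand-in for a real code generator: emits code for (2 + 3) * 4.
    public static void Generate(IInstructionSink sink)
    {
        sink.LoadConstant(2);
        sink.LoadConstant(3);
        sink.Add();
        sink.LoadConstant(4);
        sink.Multiply();
        sink.Return();
    }

    // Runs the generator against the text back end.
    public static string ToIlText()
    {
        var sink = new TextSink();
        Generate(sink);
        return sink.ToString();
    }

    // Runs the same generator against Reflection.Emit and executes the result.
    public static int RunViaEmit()
    {
        var method = new DynamicMethod("Eval", typeof(int), Type.EmptyTypes);
        Generate(new EmitSink(method.GetILGenerator()));
        return ((Func<int>)method.CreateDelegate(typeof(Func<int>)))();
    }
}
```

With this split, `TwoBackendSketch.ToIlText()` gives you something to read while debugging the generator, and `TwoBackendSketch.RunViaEmit()` (which should return 20) exercises the same generator logic through the binary path.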

+2

Source: https://habr.com/ru/post/1380475/

