What are the subphases of the semantic analysis phase of a compiler?

I have been studying how compilers work. I looked through several books, and they all agree that the phases of a compiler are roughly these (correct me if I am wrong): lexical analysis, parsing, semantic analysis, intermediate code generation, code optimization, and code generation. The lexical and syntactic phases seem clear and well understood as techniques (which, of course, does not mean they are easy). However, I still cannot find out what actually happens in the semantic phase. I know that there must be some subphases, such as scope checking, declaration checking, and type checking, but the question that bothers me is this: is there anything else that needs to be done? Can you tell me which steps are necessary at this stage? I know that this depends heavily on the programming language and on the compiler implementation, but could you give a few examples for C/C++ and Java? And could you please point me to a book, page, or article where I can read about these things in depth? Thanks.

Edit: The books I looked at were Compilers: Principles, Techniques, and Tools (Aho et al.) and Modern Compiler Design (Grune, van Reeuwijk et al.). I could not answer this question from them. If you find this question too broad, you can answer by considering the implementation of a compiler of your choice for C, C++, or Java.

3 answers

There are typical phases of "semantic analysis" that many compilers go through in one form or another. After lexing and parsing, the following are usually performed, roughly in this order:

  • Name and type resolution. Determines the lexical scopes, the identifiers declared in those scopes, the type information for those identifiers, and, for each use of an identifier, the declaration to which it refers.

  • Control flow analysis. Constructs a graph of the control flow over the computations that are explicit in the code and/or implied by it (for example, by constructors).

  • Data flow analysis. Determines where variables receive new values and where those values are read by other parts of the program. (This is often done as a local analysis within each procedure, possibly followed by an analysis across procedures.)
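To make the control flow and data flow items above more concrete, here is a minimal sketch of a reaching-definitions analysis over a tiny control flow graph. Everything in it is an assumption made for illustration: the `BLOCKS`/`EDGES` encoding, the toy program, and the function names are hypothetical, and a real compiler would work on its own intermediate representation.

```python
from collections import defaultdict

# Hypothetical toy program encoded as a control flow graph (CFG).
# Each block is a list of assignments: (defined_variable, [used_variables]).
BLOCKS = {
    "entry": [("x", []), ("y", [])],   # x = 1 ; y = 2
    "then":  [("x", ["y"])],           # x = y
    "join":  [("z", ["x", "y"])],      # z = x + y
}
# "entry" branches either to "then" or straight to "join".
EDGES = {"entry": ["then", "join"], "then": ["join"], "join": []}


def transfer(defs, block, index, var):
    """Apply one assignment: kill old definitions of var, add the new one."""
    kept = {d for d in defs if d[0] != var}
    kept.add((var, (block, index)))
    return kept


def reaching_definitions(blocks, edges):
    """Return {block: set of (var, def_site)} that reach the block's entry."""
    preds = defaultdict(list)
    for src, dsts in edges.items():
        for dst in dsts:
            preds[dst].append(src)

    reach_in = {b: set() for b in blocks}
    reach_out = {b: set() for b in blocks}
    changed = True
    while changed:  # iterate until nothing changes (a fixed point)
        changed = False
        for b, stmts in blocks.items():
            incoming = set()
            for p in preds[b]:
                incoming |= reach_out[p]
            current = set(incoming)
            for i, (var, _uses) in enumerate(stmts):
                current = transfer(current, b, i, var)
            if incoming != reach_in[b] or current != reach_out[b]:
                reach_in[b], reach_out[b] = incoming, current
                changed = True
    return reach_in


def use_def_chains(blocks, edges):
    """Map each use (block, stmt_index, var) to the definitions it may read."""
    reach_in = reaching_definitions(blocks, edges)
    chains = {}
    for b, stmts in blocks.items():
        current = set(reach_in[b])
        for i, (var, uses) in enumerate(stmts):
            for u in uses:
                chains[(b, i, u)] = {site for v, site in current if v == u}
            current = transfer(current, b, i, var)
    return chains


if __name__ == "__main__":
    for use, defs in sorted(use_def_chains(BLOCKS, EDGES).items()):
        print(use, "may read", sorted(defs))
```

In this toy program the use of `x` in the `join` block may read either the definition in `entry` or the one in `then`, which is exactly the kind of fact the analysis is meant to establish.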

Also often done as part of data flow analysis:

  • Points-to analysis. Determines, for each pointer at every place in the code, which objects that pointer may refer to.

  • Call graph. Constructs a graph of the calls across procedures, often taking into account indirect calls through function pointers, whose possible values are estimated during points-to analysis.

As a practical matter, some of these need to be interleaved to get the best results.
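As a rough illustration of how a call graph can depend on points-to information, the sketch below resolves indirect calls using a deliberately crude, flow-insensitive points-to set (every function ever assigned to a pointer counts as a possible target). The input tables and function names are hypothetical, and this is far coarser than what a production compiler would do.

```python
# Hypothetical facts extracted from a toy program.
# Direct calls: caller -> functions called by name.
DIRECT_CALLS = {"main": ["init"], "init": [], "on_key": [], "on_click": []}
# Indirect calls: caller -> pointer variables it calls through.
INDIRECT_CALLS = {"main": ["handler"]}
# Address-of assignments seen anywhere in the program: pointer = &function.
ASSIGNMENTS = [("handler", "on_key"), ("handler", "on_click")]


def points_to(assignments):
    """Flow-insensitive points-to sets: pointer -> functions it may hold."""
    pts = {}
    for pointer, target in assignments:
        pts.setdefault(pointer, set()).add(target)
    return pts


def call_graph(direct, indirect, pts):
    """Combine direct calls with the possible targets of indirect calls."""
    graph = {caller: set(callees) for caller, callees in direct.items()}
    for caller, pointers in indirect.items():
        for pointer in pointers:
            # An unknown pointer contributes no edges in this sketch.
            graph.setdefault(caller, set()).update(pts.get(pointer, set()))
    return graph


if __name__ == "__main__":
    graph = call_graph(DIRECT_CALLS, INDIRECT_CALLS, points_to(ASSIGNMENTS))
    for caller in sorted(graph):
        print(caller, "->", sorted(graph[caller]))
```

The edges from `main` to `on_key` and `on_click` only exist because the points-to step ran first, which is one reason these analyses end up interleaved.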

In addition, there are many analyses used to support the various optimization and code generation passes. If you really want to know more, check out any decent compiler book.


As templatetypedef mentioned, semantic analysis is language specific. For C++, on top of everything else, it involves whatever is required to instantiate templates (C++ tends to need more and more semantic analysis), and for Java there has to be some analysis of checked exceptions.

Even for C, the GNU C compiler can be configured to check the arguments given to format strings. I suspect that GCC has a whole range of options that perform some form of semantic analysis. If you are writing a paper on this subject, you could spend a day counting them :)

Besides accessibility, I find that semantic analysis is what distinguishes today's statically typed imperative object-oriented languages.


You cannot really divide it into subphases at all. There are many things that need to be done, but, at least conceptually, all of them are done while you walk the parse tree from top to bottom and back up again. What exactly they are and how exactly they happen depends on the language, the program being processed, the particular compiler writer, ...

You can start making a list:

  • Build a symbol table.
  • Find the declarations of the variables that are referenced.
  • Check the compatibility of the variables' data types.
  • Determine the types of subexpressions.
  • ...

You can see that these already have to be somewhat interleaved in practice, and do not constitute separate subphases.
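The list above can be sketched as a single recursive walk over a toy syntax tree, in which building the symbol table, resolving names, and typing subexpressions all happen together. The tuple-based tree format, the two types `int` and `string`, and all names here are assumptions made up for the example, not the design of any particular compiler.

```python
from dataclasses import dataclass, field


class SemanticError(Exception):
    """Raised for undeclared names, redeclarations, and type mismatches."""


@dataclass
class Scope:
    symbols: dict = field(default_factory=dict)  # name -> declared type
    parent: "Scope | None" = None


def lookup(scope, name):
    """Find the declaration a use refers to, searching enclosing scopes."""
    while scope is not None:
        if name in scope.symbols:
            return scope.symbols[name]
        scope = scope.parent
    raise SemanticError(f"undeclared identifier: {name}")


def check(node, scope):
    """Walk the tree once; return the type of an expression, None otherwise."""
    kind = node[0]
    if kind == "block":            # ("block", [statements])
        inner = Scope(parent=scope)
        for stmt in node[1]:
            check(stmt, inner)
        return None
    if kind == "decl":             # ("decl", name, type): extend symbol table
        _, name, typ = node
        if name in scope.symbols:
            raise SemanticError(f"redeclaration of {name}")
        scope.symbols[name] = typ
        return None
    if kind == "assign":           # ("assign", name, expr): check compatibility
        _, name, expr = node
        target = lookup(scope, name)
        value = check(expr, scope)
        if target != value:
            raise SemanticError(f"cannot assign {value} to {name}: {target}")
        return None
    if kind == "num":              # ("num", value)
        return "int"
    if kind == "str":              # ("str", value)
        return "string"
    if kind == "var":              # ("var", name): resolve the use
        return lookup(scope, node[1])
    if kind == "add":              # ("add", left, right): type a subexpression
        left, right = check(node[1], scope), check(node[2], scope)
        if left != right:
            raise SemanticError(f"cannot add {left} and {right}")
        return left
    raise SemanticError(f"unknown node kind: {kind}")


# A well-formed program and one with a type error, in the toy tree format.
GOOD = ("block", [("decl", "x", "int"),
                  ("assign", "x", ("add", ("num", 1), ("var", "x")))])
BAD = ("block", [("decl", "s", "string"),
                 ("assign", "s", ("num", 1))])

if __name__ == "__main__":
    check(GOOD, Scope())
    try:
        check(BAD, Scope())
    except SemanticError as err:
        print("rejected:", err)
```

Note that the `assign` case cannot check types without first resolving the name, and the `var` case cannot resolve the name without the symbol table built by earlier `decl` nodes, so the steps are naturally tangled within one traversal.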


Source: https://habr.com/ru/post/1502896/

