Compiler phases?

In which phase of compilation are the keywords of the source programming language recognized?

I'm a little confused between

  • Lexical analysis.
  • Syntax analysis (parsing).

I once wrote a lexer in C using regular expressions, but it recognized the main in int main(void) as a keyword as well.

So I suspect we need to build a parse tree in order to recognize keywords.

+4
5 answers

This year I had to write a simple compiler as a project, for which I used Java. Keyword recognition was done during lexical analysis. At that stage I would read the input language and create a token with a type and a value (for type keywords, the token type was declaration_variable). I also had a separate type for each other case, such as identifier, constant, multiplication operator, addition operator, and so on. These tokens were then put into a queue and passed to the parser, which checked them against the grammar and built a binary tree that was then used to generate the output language.
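A minimal sketch of the lexer described above, written in Python for brevity rather than Java. The token-type names (IDENTIFIER, CONSTANT, MUL_OP, ADD_OP, and so on) and the set of type keywords are illustrative assumptions, not the answerer's actual code; only the idea of classifying keywords in the lexer and queueing (type, value) tokens for the parser comes from the answer.

```python
import re
from collections import deque
from typing import NamedTuple

class Token(NamedTuple):
    type: str   # token category, e.g. "IDENTIFIER"
    value: str  # the matched text (the lexeme)

# Hypothetical set of type keywords; the answer only says such keywords
# got their own token type (declaration_variable).
TYPE_KEYWORDS = {"int", "float"}

# One named group per token category.
TOKEN_SPEC = [
    ("CONSTANT", r"\d+"),
    ("WORD", r"[A-Za-z_]\w*"),   # identifiers and keywords, told apart below
    ("MUL_OP", r"[*/]"),
    ("ADD_OP", r"[+-]"),
    ("ASSIGN", r"="),
    ("SEMI", r";"),
    ("SKIP", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text: str) -> deque:
    """Lex `text` into a queue of Token(type, value) for the parser."""
    queue = deque()
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = m.lastgroup, m.group()
        pos = m.end()
        if kind == "SKIP":
            continue
        if kind == "WORD":
            # Keywords are classified here, in the lexer, by table lookup.
            kind = "DECLARATION_VARIABLE" if lexeme in TYPE_KEYWORDS else "IDENTIFIER"
        queue.append(Token(kind, lexeme))
    return queue
```

For example, tokenize("int x = 2 * y + 1;") yields a queue whose first token is Token("DECLARATION_VARIABLE", "int"), followed by IDENTIFIER, ASSIGN, CONSTANT, MUL_OP, IDENTIFIER, ADD_OP, CONSTANT and SEMI tokens; the parser then consumes them with popleft().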

+3

As a rule, the lexical-analysis phase of compilation breaks the input text into a sequence of lexemes, each of which belongs to a particular token type that is useful in subsequent analysis. Keywords are therefore usually recognized first, during lexical analysis, to simplify parsing. Since parsers are typically implemented by writing context-free grammars over tokens rather than lexemes (i.e., over token categories rather than token contents), it is much easier to build a parser when keywords have already been tagged during lexing. For example, if I want a parser that treats "if" as a keyword, I might need a CFG rule that looks something like this:

 Statement ::= 'IF' Expr 'THEN' Expr 

If I did not classify IF and THEN as token types of their own, I would not be able to write a parser rule like the one above.
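To illustrate, here is a hypothetical recursive-descent parser for that rule, sketched in Python. Because the lexer has already given IF and THEN their own token types, the parser only ever compares token types, never raw text. The trivial Expr rule (a single identifier or number) and the names lex, Parser and parse are assumptions made to keep the example short.

```python
# Hypothetical keyword set for the rule Statement ::= 'IF' Expr 'THEN' Expr
KEYWORDS = {"IF", "THEN"}

def lex(text):
    """Split on whitespace and classify each word as a (type, value) pair."""
    tokens = []
    for word in text.split():
        if word in KEYWORDS:
            tokens.append((word, word))        # keyword: its own token type
        elif word.isdigit():
            tokens.append(("NUMBER", word))
        elif word.isidentifier():
            tokens.append(("IDENT", word))
        else:
            raise SyntaxError(f"unexpected lexeme {word!r}")
    return tokens

class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek_type(self):
        return self.tokens[self.pos][0] if self.pos < len(self.tokens) else "EOF"

    def expect(self, *kinds):
        """Consume the next token if its *type* is one of `kinds`."""
        if self.peek_type() not in kinds:
            raise SyntaxError(f"expected {' or '.join(kinds)}, got {self.peek_type()}")
        tok = self.tokens[self.pos]
        self.pos += 1
        return tok

    def expr(self):
        # Expr ::= IDENT | NUMBER   (kept trivial for this sketch)
        return self.expect("IDENT", "NUMBER")

    def statement(self):
        # Statement ::= 'IF' Expr 'THEN' Expr
        self.expect("IF")
        cond = self.expr()
        self.expect("THEN")
        body = self.expr()
        return ("if", cond, body)

def parse(text):
    parser = Parser(lex(text))
    tree = parser.statement()
    if parser.peek_type() != "EOF":
        raise SyntaxError("trailing input after statement")
    return tree
```

Here parse("IF x THEN 42") returns ("if", ("IDENT", "x"), ("NUMBER", "42")), while a word such as IFFY is lexed as an ordinary IDENT, so the parser never mistakes it for the keyword.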

+3

That would be lexical analysis.

Some languages have “special” identifiers as well as keywords. These are often added to the identifier table, which is preloaded with the known constant identifiers before parsing starts so that they can be detected easily. However, they usually don't mean much to the parser; like any other identifier, they would be found in the abstract syntax tree (AST) after parsing.

For example, see the Oberon language report:

http://www-old.oberon.ethz.ch/oreport.html

This is not a recommendation of the language, just an easily accessible and simple language specification (very much in Wirth's style).

In any case, the “Vocabulary and Representation” section includes a list of “operators and delimiters”, which contains what most people would recognize as keywords. These are recognized by the lexical analyzer.

In the "Declarations and scope rules" section, there is a list of predefined identifiers such as "ABS" and "BOOLEAN". I am not very familiar with Oberon, but if I were writing the compiler, there is a good chance I would simply pre-initialize the normal identifier table to include these predefined identifiers.
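A sketch of that approach for a hypothetical Python compiler front end: reserved words are classified by the lexer, while predefined identifiers such as ABS and BOOLEAN are simply preloaded into the outermost ("universe") scope of an ordinary identifier table. Both name lists are small subsets, the kind labels ("procedure", "type", "constant") are hypothetical, and the shadowing shown in the usage assumes, as in other Pascal-family languages, that a predefined name may be redeclared in an inner scope.

```python
# A small subset of Oberon's reserved words: the lexer classifies these.
RESERVED = {"MODULE", "BEGIN", "END", "IF", "THEN", "VAR", "PROCEDURE"}

# A subset of predefined identifiers. These are NOT keywords, just entries
# placed in the ordinary identifier table before parsing starts.
PREDEFINED = {
    "ABS": "procedure",
    "BOOLEAN": "type",
    "INTEGER": "type",
    "TRUE": "constant",
    "FALSE": "constant",
}

class Scope:
    """One level of a nested identifier table."""
    def __init__(self, parent=None):
        self.names = {}
        self.parent = parent

    def declare(self, name, kind):
        self.names[name] = kind

    def lookup(self, name):
        # Search this scope, then enclosing scopes, out to the universe.
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        return None

def make_universe():
    """Outermost scope, pre-initialized with the predefined identifiers."""
    universe = Scope()
    for name, kind in PREDEFINED.items():
        universe.declare(name, kind)
    return universe

def classify(word):
    """Lexer-side decision: reserved word or plain identifier."""
    return "KEYWORD" if word in RESERVED else "IDENT"
```

With this split, classify("ABS") is "IDENT" and classify("BEGIN") is "KEYWORD"; the lexer never needs to know that ABS is special, and a module scope created with Scope(parent=make_universe()) can declare its own ABS that hides the predefined one.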

In C, main is in most respects just another function. The compiler may or may not treat it as special. Perhaps the only "special" thing about it is that the startup code linked into your final executable calls this function.

+1

It depends on the definitions, in particular on where you draw the lines between scanner, tokenizer, lexer, and parser. Since this is homework, what counts as right is whatever your professor says is right: take a look at the definitions given in your reading material.

Regarding main(): you can definitely say that main(), like every other function, is not a keyword. It is, however, a token. The tokenizer recognizes that the substring "main" is one token, and the parser relates it to the "(...)" and "{...}" that follow. In addition, for main() the compiler automatically generates a program entry point.
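The problem in the question (a regex lexer treating main as a keyword) is commonly avoided by matching every identifier-shaped word with a single pattern and only then looking the matched text up in a keyword table. A minimal Python sketch of that technique, not the asker's original C code; C_KEYWORDS is a deliberately incomplete subset of C's keywords, and the token names are hypothetical. Because the regex consumes the whole word first, prefixes are handled too: integer becomes one identifier rather than the keyword int followed by eger.

```python
import re

# Subset of C's reserved keywords; note that "main" is not among them.
C_KEYWORDS = {"int", "void", "return", "if", "else", "while", "for", "char"}

TOKEN_RE = re.compile(r"""
    (?P<NAME>[A-Za-z_]\w*)     # every identifier-shaped word, keywords included
  | (?P<NUMBER>\d+)
  | (?P<PUNCT>[(){};])
  | (?P<WS>\s+)
""", re.VERBOSE)

def lex_c(src):
    """Return a list of (kind, text) tokens for a tiny subset of C."""
    tokens = []
    pos = 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected {src[pos]!r} at offset {pos}")
        pos = m.end()
        kind = m.lastgroup
        if kind == "WS":
            continue
        if kind == "NAME":
            # One pattern for all words, then a table lookup decides
            # keyword vs. identifier.
            kind = "KEYWORD" if m.group() in C_KEYWORDS else "IDENTIFIER"
        tokens.append((kind, m.group()))
    return tokens
```

For "int main(void) { return 0; }" this yields ("KEYWORD", "int"), ("IDENTIFIER", "main"), ("PUNCT", "("), ("KEYWORD", "void"), and so on; main is an ordinary identifier token, and it is left to later phases to treat it as the entry point.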

+1

Traditionally, keywords are recognized by the lexer (which leaves you with a language that has a fixed set of keywords). But of course you can also do this during parsing. You can even get rid of your lexer completely by using one of the many scannerless parsing methods (such as PEGs). This may spare you some of the confusion.
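A hand-written, PEG-style sketch in Python of recognizing keywords during parsing, with no separate lexer: a keyword matches only when it is not followed by another identifier character (the usual PEG idiom 'if' !IdentChar), and identifiers are rejected if they are reserved words. The grammar (IfStmt <- 'if' Ident 'then' Ident) and all function names are hypothetical; this is not the API of any particular PEG library.

```python
import re

RESERVED = {"if", "then"}
IDENT_CHAR = re.compile(r"[A-Za-z0-9_]")
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
SPACES = re.compile(r"\s*")

def skip_ws(text, pos):
    return SPACES.match(text, pos).end()

def keyword(text, pos, word):
    """Match `word` not followed by an identifier char; return new pos or None."""
    if text.startswith(word, pos) and not IDENT_CHAR.match(text, pos + len(word)):
        return skip_ws(text, pos + len(word))
    return None

def identifier(text, pos):
    """Match a non-reserved identifier; return (name, new pos) or None."""
    m = IDENT.match(text, pos)
    if m is None or m.group() in RESERVED:
        return None
    return m.group(), skip_ws(text, m.end())

def if_statement(text, pos=0):
    """IfStmt <- 'if' Ident 'then' Ident; return (tree, end pos) or None."""
    pos = keyword(text, skip_ws(text, pos), "if")
    if pos is None:
        return None
    r = identifier(text, pos)
    if r is None:
        return None
    cond, pos = r
    pos = keyword(text, pos, "then")
    if pos is None:
        return None
    r = identifier(text, pos)
    if r is None:
        return None
    body, pos = r
    return ("if", cond, body), pos
```

Here if_statement("if iffy then go") succeeds with iffy as the condition, while if_statement("iffy then go") fails, because the keyword check happens at the point in the grammar where a keyword is expected.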

+1

Source: https://habr.com/ru/post/1342403/

