How to start parsing as soon as a lexer returns a token? (Building a compiler)

I am trying to create a compiler in C for a C-like language. I just wrote a lexer that will read the input file and output a stream of tokens. Now I understand the theory of grammars, and I know how to write parse trees manually for different expressions. But I kind of lost touch with the real code! How to start the parsing process after I have tokens? First, what should my parser program do when it receives the first token? I know, somehow it needs to be arranged like a tree. But how do I get started? Any help would be great, I'm just a beginner.

Thanks a lot!


the next token after processing the current token. Now I plan to start by assuming that my language consists entirely of assignment operators.
So the form of BNF looks something like this:

< assignment_statement > ::= < destination > := < factor > < destination > ::= < identifier > < identifier > ::= [a-zA-Z][a-zA-Z0-9]* <factor > ::= < name >|< number >|< string > < name > ::= < identifier > < number > ::= [0-9]+ < string > ::="[a-zA-Z0-9]*" 

Now that I see a statement like: var := 9 , lexer will return the first token as "var", what will be the parent rule and the correctors? Also, how do I build a parse tree for this operator?

Thanks a lot!

+4
source share
1 answer

A common example is that the parser requests tokens. The analyzer loop usually takes the form of โ€œreading the next token, determining what to do with it, updating the partially processed tree, repeatingโ€, and it is easier to achieve it if the parser calls the lexer itself (instead of the third code fragment read from lexer and the feed tokens to the parser) .

Of course, the heart of the analyzer depends on the algorithm you are using (recursive descent, LR ...), but it usually consists in determining whether the new token matches the rule that you are currently parsing (for example, find - when reading EXPR = '-' EXPR | ... ), fits with a rule (for example, when searching for a class while reading DEF = CLASS | ... such that CLASS = 'class' ... ) or does not fit at all (at this point you should abort current rule by building the corresponding AST node and repeat the process according to the parent rule).

Recursive descent analyzers do this using subheadings (for sub-rules) and returning values โ€‹โ€‹(to return to the parent rule), while RL partners tend to compress multiple rules and sub-chapters into a single shift to stay in the current set of rules or reduce to break the rules and build one or more AST nodes.

+4
source

Source: https://habr.com/ru/post/1335395/


All Articles