How do I need to create a simple LR parser?

I am trying to create a simple LR analyzer for the type of template file (configuration) that will be used to create some other files. I read and read about LR parsers, but I just can't figure it out! I understand that there is a parsing stack, a state stack, and a parsing table. Tokens are read onto the parsing stack, and when the rule is negotiated, the tokens are shifted or reduced depending on the parsing table. This continues recursively until all tokens are reduced and the parsing is complete.

The problem is that I really don't know how to create a parsing table. I read a lot of descriptions, but the language is technical, and I just do not understand. Can someone tell me how I will do this?

Also, how would I store things like the rules of my grammar?

http://codepad.org/oRjnKacH is a sample file that I am trying to parse with my grammar attempt for its language.

I have never done this before, so I'm just looking for advice, thanks.

+4
source share
5 answers

In your study of parser theory, you seem to have missed a much more practical fact: practically no one even thinks that you write a table-oriented parser from the bottom up, as you discuss. For practical purposes, handwritten parsers use a top-down structure (usually a recursive descent).

The main reason for using a table-driven parser is that it allows you to write (fairly) small code that manipulates the table, etc., which is almost completely general (i.e. it works for any parser). Then you encode everything related to a particular grammar into a form that is easy to manipulate with a computer (i.e., some tables).

Obviously, it would be entirely possible to do it manually if you really wanted to, but almost never had a real point. Creating tables entirely by hand would be very painful in itself.

For example, you usually start by creating an NFA, which is a large table โ€” usually one row for each analyzer state and one column for each possible. In each cell, you encode the next state to enter when you start in that state, and then get that input. Most of these transitions are mostly empty (i.e., they just say that entry is not allowed when you are in this state).

You then go through all of these and follow some fairly simple rules to put together NFA state sets together to become a state in DFA. The rules are simple enough that itโ€™s pretty easy to program them on a computer, but you have to repeat them for each cell of the NFA table and, in essence, improve accounting to create a DFA that works correctly.

The computer can and will do it pretty well - for this, applying a couple of simple rules for each of the twenty thousand cells in the NFA state table, this is a piece of cake. It is hard to imagine in order to expose a person to do the same, although, I am sure that under the leadership of the UN this would be illegal torture.

+6
source

The classic solution is the lex / yacc combination:

http://dinosaur.compilertools.net/yacc/index.html

Or, as gnu calls them, flex / bison.

edit:

Perl has Parse :: RecDescent, which is a recursive descent parser, but it may work better for simple operation.

+1
source

you need to read about ANTLR

+1
source

Well, you cannot understand it as

"Function A1 makes f for object B, then function A2 makes g in D, etc."

its more like

"Function A performs the action {a, b, c, d, e, f, g, h, i, j, k, l, m, n, o or p or no-op} and shifts / decreases a certain number of objects { 1-1567} in a stack of type {B, C, D, E, F or G} and objects containing it up to N levels, which can be of types {H, I, J, K or L etc} in certain combinations in accordance with list of rules "

Actually, you need a data table (or code generated from a data table, for example, as a BNF grammar dataset) that tells the function what to do.

You can write it from scratch. You can also paint walls with eyelash brushes. You can interpret the data table at runtime. You can also install Sleep (1000); in your code every line. Not that I tried it either.

Compilers are complicated. Consequently, compiler generators.

EDIT

You are trying to define tokens in terms of the content in the file itself.

I assume that the reason you do not want to use regular expressions is because you want to have access to the line number information for different tokens in the block of text, and not just for the block of text as a whole. If line numbers for each word are not needed, and whole blocks are going to fit into memory, I would be inclined to simulate the entire block in brackets as a token, as this can increase processing speed. In any case, you will need the yylex custom function. Start by creating lex with fixed tokens "[" and "]" to start and end the content, then freeze and modify it to update the data about which tokens to look for from yacc code.

0
source

I looked at the definition of your file, while I am missing some context why you would like to specifically use the LR parser, my first thought was why not use existing formats such as xml or json. Going along the parser generator route usually has a high initial cost, which does not pay off for the simple data that you are looking for parsing.

As Paul said, lex / yacc is an option, you can also see Boost :: Spirit .

I did not work with anyone, a year ago I wrote a much larger parser using QLALR from Qt / Nokia users. When I examined parsers, this one, although very underestimated, was of the least importance for getting started (only 1 tool), but it does not support lexical analysis. IIRC I could not understand the C ++ support in ANTLR at that time.

10,000 miles In general, you are looking at two components: a lexer that takes input characters and turns them into tokens of a higher order. For markers to work, rules will be indicated in your grammar description, usually you will include some code with rules, this code will be executed when the rule is matched. A compiler generator (e.g. yacc) will take your description of the rules and code and turn it into compiled code. If you do not do this manually, you yourself will not manipulate the tables.

0
source

Source: https://habr.com/ru/post/1302159/


All Articles