Parsing optionals with PEG (Grako) falls short?

My colleague PaulS asked me the following:


I am writing a parser for an existing language (SystemVerilog, an IEEE standard), and the specification contains a rule with a structure similar to this:

 cover_point = [[data_type] identifier ':' ] 'coverpoint' identifier ';' ;
 data_type = 'int' | 'float' | identifier ;
 identifier = ?/\w+/? ;

The problem shows up when parsing the following legal line:

 anIdentifier: coverpoint another_identifier; 

anIdentifier matches data_type (via its identifier alternative), so Grako looks for another identifier after it and fails. It then does not retry the parse without the data_type part.

I can rewrite the rule as follows:

 cover_point_rewrite = [data_type identifier ':' | identifier ':' ] 'coverpoint' identifier ';' ; 

but I wonder whether:

  • this is intentional, and
  • there is a better syntax.

Is this a problem with PEG in general, or with the tool (Grako)?

1 answer

In a PEG the choice operator is ordered: to avoid the ambiguities of CFGs, it takes the first alternative that matches.

In your first example,

  [data_type]
parses an identifier, so the rule fails when it then finds : instead of another identifier. This is because [data_type] behaves like (data_type | ε): once data_type succeeds on the first identifier, the empty alternative is never tried.

In

  [data_type identifier ':' |  identifier ':']
the first alternative fails when there is no second identifier, so the parser backtracks and tries the second alternative.
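
To make the difference concrete, here is a minimal sketch of PEG-style matching in plain Python. It does not use Grako and is not how Grako is implemented; the helpers tok, seq, choice and opt are hypothetical names made up for this illustration, and the only assumption is the usual PEG semantics described above (ordered choice, and an optional that never revisits its decision once its body has matched).

```python
import re

# Each parser is a function (text, pos) -> new position, or None on failure.

def tok(pattern):
    """Match a regex after skipping leading whitespace."""
    rx = re.compile(r"\s*(?:" + pattern + ")")
    def parse(text, pos):
        m = rx.match(text, pos)
        return m.end() if m else None
    return parse

def seq(*parsers):
    """Match all parsers in order; fail as a whole if any one fails."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """Ordered choice: commit to the first alternative that succeeds."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def opt(p):
    """Optional: [e] acts like (e | ε). If e succeeds, ε is never tried."""
    def parse(text, pos):
        end = p(text, pos)
        return pos if end is None else end
    return parse

identifier = tok(r"\w+")
data_type = choice(tok(r"int\b"), tok(r"float\b"), identifier)
colon, semi, kw = tok(":"), tok(";"), tok(r"coverpoint\b")

# cover_point = [[data_type] identifier ':' ] 'coverpoint' identifier ';' ;
cover_point = seq(
    opt(seq(opt(data_type), identifier, colon)),
    kw, identifier, semi)

# cover_point_rewrite = [data_type identifier ':' | identifier ':' ] ...
cover_point_rewrite = seq(
    opt(choice(seq(data_type, identifier, colon), seq(identifier, colon))),
    kw, identifier, semi)

line = "anIdentifier: coverpoint another_identifier;"
print(cover_point(line, 0))                       # None: the parse fails
print(cover_point_rewrite(line, 0) == len(line))  # True: whole line consumed
```

In the original rule the inner optional consumes anIdentifier as a data_type and has no way to give it back, so the label is lost and 'coverpoint' is then tried at the start of the line. In the rewrite the two forms are alternatives of a single choice, so when the first one fails the second starts again from the same position.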

Source: https://habr.com/ru/post/972294/