Marpa: Can I explicitly ban keywords as identifiers?

I am implementing a new DSL in Marpa and (coming from Regexp::Grammars) I am more than satisfied. My language supports a bunch of unary and binary operators, objects with C-style identifiers, and method calls using the familiar dot notation. For instance:

    foo.has(bar == 42 AND baz == 23)

I found the precedence rules offered by Marpa's grammar description language and have come to rely on them a lot, so that I have almost only one G1 rule, Expression. Excerpt (many alternatives and semantic actions are omitted for brevity):

    Expression ::=
           NumLiteral
        |  '(' Expression ')' assoc => group
        || Expression ('.') Identifier
        || Expression ('.') Identifier Args
        |  Expression ('==') Expression
        || Expression ('AND') Expression

    Args     ::= ('(') ArgsList (')')
    ArgsList ::= Expression+ separator => [,]

    Identifier         ~ IdentifierHeadChar IdentifierBody
    IdentifierBody     ~ IdentifierBodyChar*
    IdentifierHeadChar ~ [a-zA-Z_]
    IdentifierBodyChar ~ [a-zA-Z0-9_]

    NumLiteral ~ [0-9]+

As you can see, I am using the Scanless interface (SLIF). My problem is that it also parses, for example:

    foo.AND(5)

Marpa knows that only an identifier can follow a period, so it does not even consider that AND might be a keyword. I know that I can avoid this problem with a separate lexing step that identifies AND exclusively as a keyword, but this tiny papercut is not worth the effort.

Is there a way in SLIF to restrict the Identifier rule to non-keyword identifiers only?

2 answers

I don't know how to express such a thing in the grammar itself. You can, however, introduce an intermediate non-terminal for the identifier whose semantic action checks the condition:

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Syntax::Construct qw{ // };

    use Marpa::R2;

    # Words that must not be accepted as identifiers.
    my %reserved = map { $_ => 1 } qw( AND );

    my $grammar = 'Marpa::R2::Scanless::G'->new(
        {   bless_package => 'main',
            source        => \( << '__GRAMMAR__'),

    :default ::= action => store
    :start ::= S
    S  ::= Id
         | Id NumLiteral
    Id ::= Identifier action => allowed

    Identifier         ~ IdentifierHeadChar IdentifierBody
    IdentifierBody     ~ IdentifierBodyChar*
    IdentifierHeadChar ~ [a-zA-Z_]
    IdentifierBodyChar ~ [a-zA-Z0-9_]

    NumLiteral ~ [0-9]+

    :discard ~ whitespace
    whitespace ~ [\s]+

    __GRAMMAR__
        });

    # The last input uses a reserved word, so allowed() dies on it.
    for my $input ('ABC', 'ABC 42', 'AND 1') {
        my $value = $grammar->parse(\$input, 'main');
        print $$value, "\n";
    }

    # Default action: join the identifier with its optional argument.
    sub store {
        my (undef, $id, $arg) = @_;
        $arg //= 'null';
        return "$id $arg";
    }

    # Action for Id: reject reserved keywords at evaluation time.
    sub allowed {
        my (undef, $id) = @_;
        die "Reserved keyword $id" if $reserved{$id};
        return $id
    }

You can use lexeme priorities, which are designed for exactly this kind of thing; there is an example in the Marpa::R2 test suite.

Basically, you declare an <AND keyword> ~ 'AND' lexeme and give it priority 1, so that it is preferred over Identifier. That should do the trick.

P.S. I slightly modified the script above to give an example: code, output.
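
As a rough sketch of the lexeme-priority approach (this is not the linked script; the reduced Expression rule, the sample inputs, and the `::array` default action are assumptions made for illustration), the literal 'AND' in the G1 rule can be replaced by a named lexeme that outranks Identifier:

```perl
#!/usr/bin/perl
use strict;
use warnings;

use Marpa::R2;

# Reduced, hypothetical version of the question's grammar.
my $dsl = <<'__GRAMMAR__';
:default ::= action => ::array
:start ::= Expression

Expression ::=
       NumLiteral
    |  Identifier
    || Expression ('.') Identifier
    || Expression (<AND keyword>) Expression

# Named keyword lexeme; the default lexeme priority is 0,
# so priority 1 makes it win over Identifier for the text "AND".
:lexeme ~ <AND keyword> priority => 1
<AND keyword> ~ 'AND'

Identifier         ~ IdentifierHeadChar IdentifierBody
IdentifierBody     ~ IdentifierBodyChar*
IdentifierHeadChar ~ [a-zA-Z_]
IdentifierBodyChar ~ [a-zA-Z0-9_]

NumLiteral ~ [0-9]+

:discard ~ whitespace
whitespace ~ [\s]+
__GRAMMAR__

my $grammar = Marpa::R2::Scanless::G->new( { source => \$dsl } );

# parse() throws on failure, so wrap it in eval and report the outcome.
for my $input ( 'foo.bar', 'foo AND bar', 'foo.AND' ) {
    my $value_ref = eval { $grammar->parse( \$input ) };
    printf "%-12s %s\n", $input, defined $value_ref ? 'parsed' : 'rejected';
}
```

With the default longest-token matching, both lexemes match the text AND and only the higher-priority one should be passed on to G1, so an AND after a dot is expected to be rejected. As far as I understand, this depends on not enabling `latm => 1`, because under LATM a lexeme that G1 does not expect is never matched in the first place.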


Source: https://habr.com/ru/post/1207607/

