What happened to a simple ANTLR grammar?

Question

I am writing an ANTLR grammar to parse log files and ran into a problem. I simplified my grammar to reproduce the problem as follows:

 stmt1  : '[ ' elapse ': ' stmt2 ;
 stmt2  : '[xxx' ;
 stmt3  : ': [yyy' ;
 elapse : FLOAT ;
 FLOAT  : ('0'..'9')+ '.' ('0'..'9')* ;

When I used the following line to test the grammar:

 [ 98.9: [xxx

I got an error:

 E:\work\antlr\output\__Test___input.txt line 1:9 mismatched character 'x' expecting 'y'
 E:\work\antlr\output\__Test___input.txt line 1:10 no viable alternative at character 'x'
 E:\work\antlr\output\__Test___input.txt line 1:11 no viable alternative at character 'x'
 E:\work\antlr\output\__Test___input.txt line 1:12 mismatched input '<EOF>' expecting ': '

But if I remove the rule 'stmt3', the same line is accepted.

I'm not sure what happened ...

Thanks for any advice!

Leon

Thanks to Bart. I tried to fix the grammar. I think that, as a baseline, I have to make all tokens unambiguous. I also added a WS token to simplify the rules.

 stmt1  : '[' elapse ':' stmt2 ;
 stmt2  : '[' 'xxx' ;
 stmt3  : ':' '[' 'yyy' ;
 elapse : FLOAT ;
 FLOAT  : ('0'..'9')+ '.' ('0'..'9')* ;
 WS     : (' ' | '\t' | '\n' | '\r')+ {skip();} ;

antlr

Leon Chen Oct 29 '12 at 16:40

1 answer

Bart Kiers · Accepted Answer · 2012-10-29T18:45:14+0000

ANTLR has a strict separation between lexer rules (tokens) and parser rules. Although you have defined some literals inside the parser rules, they are still tokens. This means that the following grammar is equivalent (in practice) to your example grammar:

 stmt1  : T1 elapse T2 stmt2 ;
 stmt2  : T3 ;
 stmt3  : T4 ;
 elapse : FLOAT ;
 T1     : '[ ' ;
 T2     : ': ' ;
 T3     : '[xxx' ;
 T4     : ': [yyy' ;
 FLOAT  : ('0'..'9')+ '.' ('0'..'9')* ;

Now, when the lexer tries to build tokens from the input "[ 98.9: [xxx", it successfully creates the T1 and FLOAT tokens, but when it sees ": [", it commits to building a T4 token. When the next char in the stream turns out to be "x" instead of "y", the lexer has no other token starting with ": [" to fall back on, so it emits an error:

[...] mismatched character 'x' expecting 'y'

And no, the lexer will not backtrack and "give up" the "[" from ": [" in order to match the T2 token, nor will it look ahead in the char stream to check whether a T4 token can really be created. ANTLR's LL(*) applies only to parser rules, not lexer rules!
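
The commit-without-backtracking behaviour can be mimicked with a small toy model in Python. This is only a sketch and not how ANTLR is actually implemented: the function names `tokenize` and `common_prefix_len`, the `skip_ws` flag, and the "pick the literal that shares the longest prefix with the input" rule are all assumptions made for illustration. It reproduces the first error from the question and shows that splitting the literals into small tokens avoids it.

```python
import re

# FLOAT : ('0'..'9')+ '.' ('0'..'9')* ;
FLOAT = re.compile(r"[0-9]+\.[0-9]*")


def common_prefix_len(literal, text, pos):
    """Number of leading characters of `literal` that match `text` at `pos`."""
    n = 0
    while n < len(literal) and pos + n < len(text) and literal[n] == text[pos + n]:
        n += 1
    return n


def tokenize(text, literals, skip_ws=False):
    """Toy lexer (hypothetical): predict one literal, then commit to it."""
    tokens, pos = [], 0
    while pos < len(text):
        if skip_ws and text[pos].isspace():
            pos += 1  # plays the role of the WS rule with skip()
            continue
        m = FLOAT.match(text, pos)
        if m:
            tokens.append(m.group())
            pos = m.end()
            continue
        # Prediction: choose the literal sharing the longest prefix with the input.
        best = max(literals, key=lambda lit: common_prefix_len(lit, text, pos))
        n = common_prefix_len(best, text, pos)
        if n == 0:
            raise SyntaxError(f"{pos}: no viable alternative at character {text[pos]!r}")
        if n < len(best):
            # Committed to `best`: no fallback to a shorter literal such as ': '.
            got = text[pos + n] if pos + n < len(text) else "<EOF>"
            raise SyntaxError(f"{pos + n}: mismatched character {got!r} expecting {best[n]!r}")
        tokens.append(best)
        pos += n
    return tokens


# Original literals: ': [yyy' wins the prediction at ": [" and then fails on 'x'.
try:
    tokenize("[ 98.9: [xxx", ["[ ", ": ", "[xxx", ": [yyy"])
except SyntaxError as err:
    print(err)

# Small, non-overlapping literals plus skipped whitespace tokenize cleanly.
print(tokenize("[ 98.9: [xxx", ["[", ":", "xxx", "yyy"], skip_ws=True))
```

In this model, dropping the ': [yyy' literal (the equivalent of removing stmt3) also makes the original input tokenize, because ': ' is then the only literal that matches at that position.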
