Float literal and range parameter in ANTLR

I am working on a parser for the D language, and I ran into a problem when I tried to add a slice operator rule. You can find the ANTLR grammar for it here. Basically, the problem is that when the lexer encounters input such as "1..2", it gets completely lost and produces a single float literal. As a result, the postfixExpression rule for a string like "a[10..11]" yields an ExpArrIndex object with an ExpLiteralReal argument. Can anyone explain what exactly is wrong with the numeric literals? (As far as I can tell, the problem lies somewhere around these tokens.)

2 answers

You can do this by emitting two tokens (an IntegerLiteral token and a Range token) when you see ".." inside the Float rule. To accomplish this, you need to override two methods in your lexer: emit(Token) and nextToken().

Demo with a small part of your Dee grammar:

    grammar Dee;

    @lexer::members {
      // Queue of pending tokens, so that a single lexer rule can emit several tokens.
      java.util.Queue<Token> tokens = new java.util.LinkedList<Token>();

      public void offer(int ttype, String ttext) {
        this.emit(new CommonToken(ttype, ttext));
      }

      @Override
      public void emit(Token t) {
        state.token = t;
        tokens.offer(t);
      }

      @Override
      public Token nextToken() {
        super.nextToken();
        return tokens.isEmpty() ? Token.EOF_TOKEN : tokens.poll();
      }
    }

    parse
      :  (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
      ;

    Range
      :  '..'
      ;

    IntegerLiteral
      :  Integer IntSuffix?
      ;

    FloatLiteral
      :  Float ImaginarySuffix?
      ;

    // skipping
    Space
      :  ' ' {skip();}
      ;

    // fragments
    fragment Float
      :  d=DecimalDigits ( options {greedy = true; } : FloatTypeSuffix
                         | '..' {offer(IntegerLiteral, $d.text); offer(Range, "..");}
                         | '.' DecimalDigits DecimalExponent?
                         )
      |  '.' DecimalDigits DecimalExponent?
      ;

    // parenthesized so that DecimalDigits follows every exponent prefix, not only 'E-'
    fragment DecimalExponent : ('e' | 'E' | 'e+' | 'E+' | 'e-' | 'E-') DecimalDigits;
    fragment DecimalDigits   : ('0'..'9'|'_')+ ;
    fragment FloatTypeSuffix : 'f' | 'F' | 'L';
    fragment ImaginarySuffix : 'i';
    fragment IntSuffix       : 'L'|'u'|'U'|'Lu'|'LU'|'uL'|'UL' ;
    fragment Integer         : Decimal| Binary| Octal| Hexadecimal ;
    fragment Decimal         : '0' | '1'..'9' (DecimalDigit | '_')* ;
    fragment Binary          : ('0b' | '0B') ('0' | '1' | '_')+ ;
    fragment Octal           : '0' (OctalDigit | '_')+ ;
    fragment Hexadecimal     : ('0x' | '0X') (HexDigit | '_')+;
    fragment DecimalDigit    : '0'..'9' ;
    fragment OctalDigit      : '0'..'7' ;
    fragment HexDigit        : ('0'..'9'|'a'..'f'|'A'..'F') ;

Test the grammar with this class:

    import org.antlr.runtime.*;

    public class Main {
      public static void main(String[] args) throws Exception {
        DeeLexer lexer = new DeeLexer(new ANTLRStringStream("1..2 .. 33.33 ..21.0"));
        DeeParser parser = new DeeParser(new CommonTokenStream(lexer));
        parser.parse();
      }
    }

When you run Main, the following output is printed:

    IntegerLiteral  '1'
    Range           '..'
    IntegerLiteral  '2'
    Range           '..'
    FloatLiteral    '33.33'
    Range           '..'
    FloatLiteral    '21.0'

EDIT

Yes, as you pointed out in the comments, a lexer rule by default emits only a single token. But, as you have already tried yourself, predicates can be used to make the lexer look ahead in the char stream and check that there really is a ".." after the IntegerLiteral before it tries to match a FloatLiteral. In ANTLR 3 the (...)=> form used for this is a syntactic predicate.

The following grammar will create the same tokens as the first demo.

    grammar Dee;

    parse
      :  (t=. {System.out.printf("\%-15s '\%s'\n", tokenNames[$t.type], $t.text);})* EOF
      ;

    Range
      :  '..'
      ;

    // The first alternative looks ahead for an integer followed by '..'
    Number
      :  (IntegerLiteral Range)=> IntegerLiteral {$type=IntegerLiteral;}
      |  (FloatLiteral)=>         FloatLiteral   {$type=FloatLiteral;}
      |                           IntegerLiteral {$type=IntegerLiteral;}
      ;

    // skipping
    Space
      :  ' ' {skip();}
      ;

    // fragments
    // parenthesized so that DecimalDigits follows every exponent prefix, not only 'E-'
    fragment DecimalExponent : ('e' | 'E' | 'e+' | 'E+' | 'e-' | 'E-') DecimalDigits;
    fragment DecimalDigits   : ('0'..'9'|'_')+ ;
    fragment FloatLiteral    : Float ImaginarySuffix?;
    fragment IntegerLiteral  : Integer IntSuffix?;
    fragment FloatTypeSuffix : 'f' | 'F' | 'L';
    fragment ImaginarySuffix : 'i';
    fragment IntSuffix       : 'L'|'u'|'U'|'Lu'|'LU'|'uL'|'UL' ;
    fragment Integer         : Decimal| Binary| Octal| Hexadecimal ;
    fragment Decimal         : '0' | '1'..'9' (DecimalDigit | '_')* ;
    fragment Binary          : ('0b' | '0B') ('0' | '1' | '_')+ ;
    fragment Octal           : '0' (OctalDigit | '_')+ ;
    fragment Hexadecimal     : ('0x' | '0X') (HexDigit | '_')+;
    fragment DecimalDigit    : '0'..'9' ;
    fragment OctalDigit      : '0'..'7' ;
    fragment HexDigit        : ('0'..'9'|'a'..'f'|'A'..'F') ;

    fragment Float
      :  d=DecimalDigits ( options {greedy = true; } : FloatTypeSuffix
                         | '.' DecimalDigits DecimalExponent?
                         )
      |  '.' DecimalDigits DecimalExponent?
      ;

From the D lexer documentation:

The source text is split into tokens using the maximal munch technique, i.e., the lexical analyzer tries to make the longest token it can. For example, >> is a right shift token, not two greater-than tokens. An exception to this rule is that a .. embedded inside what looks like two floating point literals, as in 1..2, is interpreted as if the .. were separated by a space from the first integer.
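The rule quoted above can be illustrated with a small hand-written scanner. This is only a sketch in plain Java, independent of ANTLR: the class name MiniDeeScanner and its scan method are hypothetical, and it handles only decimal digits, '.', '..' and spaces (no suffixes, exponents or underscores).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; not part of the ANTLR grammars in this answer.
class MiniDeeScanner {

    private static boolean isDigit(String s, int i) {
        return i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9';
    }

    private static boolean isDot(String s, int i) {
        return i < s.length() && s.charAt(i) == '.';
    }

    /** Returns tokens rendered as "Kind 'text'". */
    static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            int start = i;
            if (src.charAt(i) == ' ') {
                i++;                                  // skip spaces
            } else if (isDigit(src, i)) {
                while (isDigit(src, i)) i++;          // maximal munch over the digits
                // Exception to maximal munch: digits followed by ".." end here,
                // so the ".." is left for the Range branch below.
                if (isDot(src, i) && !isDot(src, i + 1) && isDigit(src, i + 1)) {
                    i++;                              // consume the single '.'
                    while (isDigit(src, i)) i++;
                    tokens.add("FloatLiteral '" + src.substring(start, i) + "'");
                } else {
                    tokens.add("IntegerLiteral '" + src.substring(start, i) + "'");
                }
            } else if (isDot(src, i) && isDot(src, i + 1)) {
                i += 2;
                tokens.add("Range '..'");
            } else if (isDot(src, i) && isDigit(src, i + 1)) {
                i++;                                  // float of the form ".5"
                while (isDigit(src, i)) i++;
                tokens.add("FloatLiteral '" + src.substring(start, i) + "'");
            } else {
                throw new IllegalArgumentException("unexpected character at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String t : scan("1..2 .. 33.33 ..21.0")) {
            System.out.println(t);
        }
    }
}
```

For the test string used in the demo, "1..2 .. 33.33 ..21.0", this sketch yields the same seven tokens as the ANTLR grammars: IntegerLiteral '1', Range '..', IntegerLiteral '2', Range '..', FloatLiteral '33.33', Range '..', FloatLiteral '21.0'.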

It might be possible to do a preprocessing pass over the source, before lexing, that performs the substitution s/(\d)\.\.(\d)/$1 .. $2/.
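As a rough sketch of that substitution in Java (the class name RangeSpacer and the method spaceRanges are hypothetical), String.replaceAll can apply the same pattern before the text is handed to the lexer. Note that a plain regular expression knows nothing about string literals or comments, so it would rewrite a ".." between digits inside those as well.

```java
// Hypothetical helper; applies the regex from the answer as a pre-lexing pass.
class RangeSpacer {

    /** Inserts spaces around ".." wherever it sits directly between two decimal digits. */
    static String spaceRanges(String src) {
        return src.replaceAll("(\\d)\\.\\.(\\d)", "$1 .. $2");
    }

    public static void main(String[] args) {
        System.out.println(spaceRanges("a[10..11]")); // prints a[10 .. 11]
    }
}
```

The rewritten text could then be passed to ANTLRStringStream in place of the original input.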


Source: https://habr.com/ru/post/1388070/

