How to get an ANTLR parser rule to read from both the default and the hidden channel

I use the usual approach of sending whitespace to a hidden channel, but I have one rule in which I would like to include any whitespace for later processing. Every example I have found requires very strange manual coding.

Is there a simple option for reading from several channels, for example by telling the rule up front that whitespace should be included?

For example, this is my lexer rule for whitespace:

WS : ( ' ' | '\t' | '\r' | '\n' ) {$channel=HIDDEN;} ; 

And this is the rule in which I would like to include whitespace:

 raw : '{'? (~('{'))*; 

Basically, this is a catch-all rule for any content that does not match the other rules and that should be handled by another template, so I need the original source stream.

I was hoping for a syntax along the lines of {$channel==DEFAULT || $channel==HIDDEN}, but I cannot find anything like it.

My target is C#, but I can port Java examples if necessary.

3 answers

AFAIK, this is not possible. However, you can extend UnbufferedTokenStream to change the channel during parsing. You cannot use CommonTokenStream, since it buffers a variable number of tokens (and there may be tokens in the buffer that are on the wrong channel!). Note that you need at least ANTLR 3.3: earlier versions did not include UnbufferedTokenStream.

Suppose you want to parse (and print) lowercase and uppercase letters. Uppercase letters are put on the HIDDEN channel, so by default the parser only sees lowercase letters. However, when the parser encounters a lowercase "q", we want to switch to the HIDDEN channel. Once on the HIDDEN channel, an uppercase "Q" should take us back to the DEFAULT_CHANNEL.

So, when parsing the source "aAbBcqCdDQeE", first "a", "b" and "c" are printed, then the channel changes, then "C" and "D" are printed, then the channel changes again, and finally "e" is printed to the console.

Here is the ANTLR grammar that does this:

ChannelDemo.g

    grammar ChannelDemo;

    @parser::members {
      private void handle(String letter) {
        if("Q".equals(letter)) {
          ((ChangeableChannelTokenStream)input).setChannel(Token.DEFAULT_CHANNEL);
        }
        else if("q".equals(letter)) {
          ((ChangeableChannelTokenStream)input).setChannel(HIDDEN);
        }
        else {
          System.out.println(letter);
        }
      }
    }

    parse
      :  any* EOF
      ;

    any
      :  letter=(LOWER | UPPER) {handle($letter.getText());}
      ;

    LOWER
      :  'a'..'z'
      ;

    UPPER
      :  'A'..'Z' {$channel=HIDDEN;}
      ;

And here is the custom token stream class:

ChangeableChannelTokenStream.java

    import org.antlr.runtime.*;

    public class ChangeableChannelTokenStream extends UnbufferedTokenStream {

      public ChangeableChannelTokenStream(TokenSource source) {
        super(source);
      }

      // Keep pulling tokens from the lexer until one is on the current channel.
      @Override
      public Token nextElement() {
        Token t = null;
        while(true) {
          t = super.tokenSource.nextToken();
          t.setTokenIndex(tokenIndex++);
          // Also stop at EOF, so the loop cannot spin forever while on HIDDEN.
          if(t.getChannel() == super.channel || t.getType() == Token.EOF) break;
        }
        return t;
      }

      public void setChannel(int ch) {
        super.channel = ch;
      }
    }

And a small main class for testing everything:

Main.java

    import org.antlr.runtime.*;

    public class Main {
      public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("aAbBcqCdDQeE");
        ChannelDemoLexer lexer = new ChannelDemoLexer(in);
        ChangeableChannelTokenStream tokens = new ChangeableChannelTokenStream(lexer);
        ChannelDemoParser parser = new ChannelDemoParser(tokens);
        parser.parse();
      }
    }

Finally, generate the lexer and parser (1), compile all the source files (2) and run the Main class (3):

1

  java -cp antlr-3.3.jar org.antlr.Tool ChannelDemo.g

2

  javac -cp antlr-3.3.jar *.java

3 (*nix)

  java -cp .:antlr-3.3.jar Main

3 (Windows)

  java -cp .;antlr-3.3.jar Main

which will output the following to the console:

  a
  b
  c
  C
  D
  e
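
To see the channel-switching logic on its own, without the ANTLR runtime, here is a minimal plain-Java sketch of the walk-through above. Everything in it (`Tok`, `lex`, `run`, the channel numbers) is a hypothetical stand-in, not ANTLR API: tokens are pulled one at a time, anything not on the current channel is dropped, and "q"/"Q" flip the channel the way handle() does in the grammar.

```java
import java.util.ArrayList;
import java.util.List;

class ChannelSwitchSketch {
    // Arbitrary channel ids for this sketch (assumption, not read from ANTLR).
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN = 99;

    // Hypothetical stand-in for a lexer token: one letter plus its channel.
    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    // Mimics the demo lexer: lowercase -> default channel, uppercase -> hidden.
    static List<Tok> lex(String src) {
        List<Tok> out = new ArrayList<>();
        for (char c : src.toCharArray()) {
            int ch = Character.isUpperCase(c) ? HIDDEN : DEFAULT_CHANNEL;
            out.add(new Tok(String.valueOf(c), ch));
        }
        return out;
    }

    // Visits tokens in order, keeps only those on the current channel,
    // and lets "q" / "Q" switch the channel for the tokens that follow.
    static List<String> run(String src) {
        List<String> printed = new ArrayList<>();
        int channel = DEFAULT_CHANNEL;
        for (Tok t : lex(src)) {
            if (t.channel != channel) continue;        // wrong channel: dropped
            if (t.text.equals("q")) channel = HIDDEN;
            else if (t.text.equals("Q")) channel = DEFAULT_CHANNEL;
            else printed.add(t.text);
        }
        return printed;
    }

    public static void main(String[] args) {
        for (String s : run("aAbBcqCdDQeE")) System.out.println(s);
    }
}
```

The filter is applied at the moment each token is consumed. That is the property the unbuffered stream is meant to provide: a token that was filtered ahead of time under the old channel could not be recovered after the switch.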

EDIT

You can include the class in your grammar file as follows:

    grammar ChannelDemo;

    @parser::members {
      private void handle(String letter) {
        if("Q".equals(letter)) {
          ((ChangeableChannelTokenStream)input).setChannel(Token.DEFAULT_CHANNEL);
        }
        else if("q".equals(letter)) {
          ((ChangeableChannelTokenStream)input).setChannel(HIDDEN);
        }
        else {
          System.out.println(letter);
        }
      }

      public static class ChangeableChannelTokenStream extends UnbufferedTokenStream {

        private boolean anyChannel;

        public ChangeableChannelTokenStream(TokenSource source) {
          super(source);
          anyChannel = false;
        }

        @Override
        public Token nextElement() {
          Token t = null;
          while(true) {
            t = super.tokenSource.nextToken();
            t.setTokenIndex(tokenIndex++);
            // Also stop at EOF, so the loop cannot spin forever while on HIDDEN.
            if(t.getChannel() == super.channel || anyChannel || t.getType() == Token.EOF) break;
          }
          return t;
        }

        public void setAnyChannel(boolean enable) {
          anyChannel = enable;
        }

        public void setChannel(int ch) {
          super.channel = ch;
        }
      }
    }

    parse
      :  any* EOF
      ;

    any
      :  letter=(LOWER | UPPER) {handle($letter.getText());}
      |  STAR {((ChangeableChannelTokenStream)input).setAnyChannel(true);}
      ;

    STAR
      :  '*'
      ;

    LOWER
      :  'a'..'z'
      ;

    UPPER
      :  'A'..'Z' {$channel=HIDDEN;}
      ;

A parser generated from the grammar above will read from all channels once it encounters a "*". So, when parsing "aAbB*cCdDeE":

    import org.antlr.runtime.*;

    public class Main {
      public static void main(String[] args) throws Exception {
        ANTLRStringStream in = new ANTLRStringStream("aAbB*cCdDeE");
        ChannelDemoLexer lexer = new ChannelDemoLexer(in);
        ChannelDemoParser.ChangeableChannelTokenStream tokens =
            new ChannelDemoParser.ChangeableChannelTokenStream(lexer);
        ChannelDemoParser parser = new ChannelDemoParser(tokens);
        parser.parse();
      }
    }

the following is printed:

  a
  b
  c
  C
  d
  D
  e
  E
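
Applied to the original question, the any-channel switch is what would let a catch-all rule see whitespace again. The following plain-Java sketch only illustrates that idea; `Tok`, `sample` and `visibleText` are hypothetical names and none of this is ANTLR API. It joins the text of the tokens a parser would see, first on the default channel only and then on any channel.

```java
import java.util.Arrays;
import java.util.List;

class RawTextSketch {
    // Arbitrary channel ids for this sketch (assumption).
    static final int DEFAULT_CHANNEL = 0;
    static final int HIDDEN = 99;

    // Hypothetical stand-in for a token; not an ANTLR class.
    static final class Tok {
        final String text;
        final int channel;
        Tok(String text, int channel) { this.text = text; this.channel = channel; }
    }

    // Sample input "foo bar" where the space was lexed onto the hidden channel.
    static List<Tok> sample() {
        return Arrays.asList(
            new Tok("foo", DEFAULT_CHANNEL),
            new Tok(" ", HIDDEN),
            new Tok("bar", DEFAULT_CHANNEL));
    }

    // Concatenates the text of every token the parser is allowed to see.
    static String visibleText(List<Tok> tokens, boolean anyChannel) {
        StringBuilder sb = new StringBuilder();
        for (Tok t : tokens) {
            if (anyChannel || t.channel == DEFAULT_CHANNEL) sb.append(t.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("[" + visibleText(sample(), false) + "]");
        System.out.println("[" + visibleText(sample(), true) + "]");
    }
}
```

With the default-channel filter the space is lost; with the any-channel flag set the original text comes back intact, which is what a rule that forwards raw content needs.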

Perhaps you should consider making whitespace part of your grammar. Why clutter the grammar with such unimportant information? Because it is NOT insignificant: a newline evidently has meaning in some contexts. And when you need IDE support, for example from a Visual Studio language server, you need to specify the language grammar without all the bells and whistles of low-level ANTLR settings.


In ANTLR 4 I use a simple solution. I have not tested it in ANTLR 3. This is C#, but you can easily translate it to Java.

  • Modify parser1.g4 as follows:

     parser grammar Parser1;

     options { tokenVocab=Lexer1; }

     startRule
     @init { SetWhiteSpacesAcceptence(false); }
         : (componentWithWhiteSpaces | componentWithoutWhiteSpaces)* EOF
         ;

     componentWithWhiteSpaces
         : { SetWhiteSpacesAcceptence(true); }
           component1 component2 component3
           { SetWhiteSpacesAcceptence(false); }
         ;

     componentWithoutWhiteSpaces
         : component4 component5 component6
         ;
  • Modify lexer1.g4 as follows:

     lexer grammar Lexer1;

     // Drop whitespace unless the parser has asked to receive it.
     WS : [ \t\r\n] { if( !this.IsWhiteSpacesAccepted() ) Skip(); };
  • Extend the Parser1 class as follows:

     class MyParser : Parser1
     {
         public void SetWhiteSpacesAcceptence(bool isAccept)
         {
             if (_input != null && _input.TokenSource != null)
             {
                 if (_input.TokenSource is MyLexer)
                 {
                     MyLexer lexer = _input.TokenSource as MyLexer;
                     if (lexer != null)
                         lexer.SetWhiteSpacesAcceptence(isAccept);
                 }
             }
         }

         public bool IsWhiteSpacesAccepted()
         {
             if (_input != null && _input.TokenSource != null)
             {
                 if (_input.TokenSource is MyLexer)
                 {
                     MyLexer lexer = _input.TokenSource as MyLexer;
                     if (lexer != null)
                         return lexer.IsWhiteSpacesAccepted();
                 }
             }
             return false;
         }
     }
  • Extend the Lexer1 class as follows:

     class MyLexer : Lexer1
     {
         private bool isWhiteSpacesAccepted;

         public void SetWhiteSpacesAcceptence(bool isAccept)
         {
             isWhiteSpacesAccepted = isAccept;
         }

         public bool IsWhiteSpacesAccepted()
         {
             return isWhiteSpacesAccepted;
         }
     }
  • Now the Main function is as follows:

     static void Main()
     {
         AntlrFileStream input = new AntlrFileStream("pathToInputFile");
         MyLexer lexer = new MyLexer(input);
         UnbufferedTokenStream tokens = new UnbufferedTokenStream(lexer);
         MyParser parser = new MyParser(tokens);
         parser.startRule();
     }
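
Since the answer mentions translating to Java, here is a rough plain-Java sketch of the underlying idea only: a lexer-side flag, flipped by the consumer, that decides whether whitespace is emitted or skipped. `MiniLexer`, `acceptWhitespace`, `parse` and the use of '[' and ']' as the switch points are all hypothetical; this is not the ANTLR 4 API and not a port of the classes above.

```java
import java.util.ArrayList;
import java.util.List;

class WhitespaceToggleSketch {
    // Hypothetical hand-written lexer: one token per character, and
    // whitespace is dropped unless acceptWhitespace is set.
    static final class MiniLexer {
        private final String src;
        private int pos = 0;
        boolean acceptWhitespace = false;

        MiniLexer(String src) { this.src = src; }

        // Returns the next token text, or null at end of input.
        String nextToken() {
            while (pos < src.length()) {
                char c = src.charAt(pos++);
                if (Character.isWhitespace(c) && !acceptWhitespace) continue; // skipped
                return String.valueOf(c);
            }
            return null;
        }
    }

    // Plays the parser's role: pulls tokens on demand and flips the
    // lexer flag when it sees '[' (accept) or ']' (stop accepting).
    static List<String> parse(String src) {
        MiniLexer lexer = new MiniLexer(src);
        List<String> seen = new ArrayList<>();
        for (String t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
            if (t.equals("[")) lexer.acceptWhitespace = true;
            else if (t.equals("]")) lexer.acceptWhitespace = false;
            seen.add(t);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(parse("a b[c d]e f"));
    }
}
```

The flag only has an effect because each token is produced at the moment the consumer asks for it. If the lexer had already run ahead, the whitespace after '[' would have been discarded before the flag changed, which is presumably why the Main function above uses UnbufferedTokenStream.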

Source: https://habr.com/ru/post/886374/

