How do I escape an escape character with ANTLR 4?

Many languages delimit a string with quotes, for example:

"Rob Malda is smart."

ANTLR 4 can match such a string with the lexer rule as follows:

QuotedString : '"' .*? '"';

To use certain characters inside a string, they must be escaped, possibly like this:

"Rob \"Commander Taco\" Malda is smart."

ANTLR 4 can match this string as well:

EscapedString : '"' ('\\"' | .)*? '"';

(Taken from p. 96 of The Definitive ANTLR 4 Reference.)

Here's my problem: suppose the escape character is the same character as the string delimiter. For instance:

"Rob ""Commander Taco"" Malda is smart."

(This is completely legal in PowerShell.)

Which lexer rule will match this? I would think this would work:

EscapedString : '"' ('""'|.)*? '"';

But it does not. The lexer tokenizes the escape character " as the closing string delimiter.

2 answers

Negate certain characters with the ~ operator:

 EscapedString : '"' ( '""' | ~["] )* '"'; 

or, if the string cannot contain line breaks:

 EscapedString : '"' ( '""' | ~["\r\n] )* '"'; 

You do not want the non-greedy operator here; otherwise "" will never be consumed as a unit, and "a""b" will be tokenized as "a" and "b" instead of as a single token.
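The difference can be reproduced outside ANTLR with ordinary regular expressions. The sketch below uses Python's re module as a stand-in. That is an assumption: it is not the ANTLR lexer, but its non-greedy `*?` leaves the loop at the first opportunity in a comparable way. The pattern names are made up for illustration.

```python
import re

# Regex analogues of the two lexer rules (names are hypothetical).
# NON_GREEDY mirrors:  '"' ('""' | .)*? '"'
# NEGATED mirrors:     '"' ('""' | ~["])* '"'
NON_GREEDY = re.compile(r'"(?:""|.)*?"')
NEGATED = re.compile(r'"(?:""|[^"])*"')

text = '"a""b"'

# The non-greedy loop stops at the first closing quote it can reach.
non_greedy_tokens = NON_GREEDY.findall(text)

# The greedy loop over a negated set consumes "" as a unit and keeps going.
negated_tokens = NEGATED.findall(text)

print(non_greedy_tokens)  # ['"a"', '"b"']
print(negated_tokens)     # ['"a""b"']
```

Because the negated set can never match a lone quote, the greedy loop has only one way to end: at a quote that is not the first half of a doubled pair.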


(Do not vote for this answer; vote for @Bart Kiers' answer.)

I offer this for completeness, as it is a small part of the PowerShell grammar. Combining the escape logic from p. 76 of The Definitive ANTLR 4 Reference with Bart's answer, here are the rules needed to lex escaped strings in PowerShell:

 EscapedString
     : '"' (Escape | '""' | ~["])* '"'
     | '\'' (Escape | '\'\'' | ~['])* '\''
     | '\u201C' (Escape | .)*? ('\u201D' | '\u2033') // smart quotes
     ;

 fragment Escape
     : '\u0060\'' // backtick single-quote
     | '\u0060"' // backtick double-quote
     ;

These rules handle the following four ways to escape quotes inside strings in PowerShell:

 'Rob ''Commander Taco'' Malda is smart.'
 "Rob ""Commander Taco"" Malda is smart."
 'Rob `'Commander Taco`' Malda is smart.'
 "Rob `"Commander Taco`" Malda is smart."
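As a quick sanity check of the straight-quote alternatives, here is a rough regex counterpart in Python. It is only a sketch under assumptions: `ESCAPE` and `ESCAPED_STRING` are hypothetical names, the smart-quote alternative is left out, and a backtracking regex engine is not the ANTLR lexer.

```python
import re

# Hypothetical regex counterpart of the Escape fragment:
# a backtick followed by a single or a double quote.
ESCAPE = r'`[\'"]'

# Hypothetical counterpart of EscapedString (straight quotes only).
ESCAPED_STRING = re.compile(
    rf'"(?:{ESCAPE}|""|[^"])*"'
    rf"|'(?:{ESCAPE}|''|[^'])*'"
)

samples = [
    "'Rob ''Commander Taco'' Malda is smart.'",
    '"Rob ""Commander Taco"" Malda is smart."',
    "'Rob `'Commander Taco`' Malda is smart.'",
    '"Rob `"Commander Taco`" Malda is smart."',
]

# Each sample should be matched in full as one string.
results = [ESCAPED_STRING.fullmatch(s) is not None for s in samples]
print(results)  # [True, True, True, True]
```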

Source: https://habr.com/ru/post/985686/

