How can I get the equivalent of Perl's ^ and $ regex anchors in an ANTLR4 lexer grammar, i.e. match the beginning and the end of a line without consuming any characters?
I am trying to use the ANTLR4 lexer to match the # character at the beginning of a line, but not in the middle of a line. For example, I want to isolate and throw away all C++ preprocessor directives, no matter which directive it is, while ignoring a # inside a string literal. (Normally we could tokenize C++ string literals so that a # in the middle of a string is excluded, but assume we don't.) In other words, I only want to specify '#' .*? without having to bother with #if, #ifndef, #pragma, etc.
In addition, the C++ standard allows whitespace and multi-line comments before and after the #, so that, for example,
# ifdef .....
is considered a valid preprocessor directive occupying a single line. (CRLFs inside the multi-line comments are allowed.)
This is what I am doing now:
PPLINE: '\r'? '\n' (ML_COMMENT | '\t' | '\f' | ' ')* '#' (ML_COMMENT | ~[\r\n])+ -> channel(PPDIR);
The problem is that I have to rely on a CRLF being present before the #, and that CRLF is then sent to the PPDIR channel together with the directive. I need to put a CRLF back in place of the removed directive line, so I have to make sure the directive is terminated by a CRLF.
However, this means my grammar cannot handle a directive that appears right at the beginning of the file (i.e. one not preceded by a CRLF), or one that is immediately followed by EOF without a terminating CRLF.
If Perl-style ^ and $ anchors were available, I could match start of line (SOL) and end of line (EOL) instead of explicitly matching and consuming the CRLF.
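One possible workaround, offered as a sketch rather than a verified solution: ANTLR4 lexer rules have no zero-width ^ or $ anchors, but a semantic predicate placed at the left edge of a rule is tested at the position where the token would start, so {getCharPositionInLine() == 0}? can play the role of ^. For $, it may be enough to simply not match the line break: ~[\r\n]* stops in front of CR/LF or at EOF, so the newline stays in the stream for another rule. This assumes the Java target (getCharPositionInLine() is a method of the Java Lexer class; other targets spell it differently), and the grammar name CppPreproc and the rules ML_COMMENT_F, NEWLINE and WS are hypothetical names chosen for illustration.

```antlr
// Hypothetical lexer grammar; assumes the Java target for the predicate syntax.
lexer grammar CppPreproc;

channels { PPDIR }

// "^": the predicate is at the left edge, so it is checked at the token's start column.
// Leading blanks and /* */ comments are part of the token, so an indented directive
// or one preceded by a comment still starts at column 0.
// "$": ~[\r\n] never matches a line break, so the token ends before CR/LF or at EOF.
PPLINE
    : {getCharPositionInLine() == 0}?
      [ \t\f]* (ML_COMMENT_F [ \t\f]*)* '#' (ML_COMMENT_F | ~[\r\n])*
      -> channel(PPDIR)
    ;

ML_COMMENT : ML_COMMENT_F -> channel(HIDDEN) ;
NEWLINE    : '\r'? '\n' ;   // stays on the default channel, so line structure is preserved
WS         : [ \t\f]+ -> skip ;

// ... the remaining C++ token rules would go here ...

fragment ML_COMMENT_F : '/*' .*? '*/' ;
```

With this layout, an indented directive is still matched because PPLINE starts at column 0 and, being the longer match, wins over WS. A # in the middle of a line never begins a token at column 0, so the predicate rejects it. A directive on the very first line of the file, or on a last line with no trailing newline, needs no special handling because neither the preceding nor the following CRLF is part of the token.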