In an earlier question, I asked about extracting tags from arbitrary text. The solution provided there works well, but there is one edge case that I would still like to handle. To recap, I am parsing arbitrary text entered by the user, and I want every occurrence of < or > to be part of a syntactically valid tag. If an angle bracket is not part of a valid tag, it should be escaped as &lt; or &gt;. The syntax I am looking for is <foo#123>, where foo is a name from a fixed list of entries and 123 is a number matching [0-9]+.

Parser:
parser grammar TagsParser;
options {
tokenVocab = TagsLexer;
}
parse: (tag | text)* EOF;
tag: LANGLE fixedlist GRIDLET ID RANGLE;
text: NOANGLE;
fixedlist: FOO | BAR | BAZ;
Lexer:
lexer grammar TagsLexer;
LANGLE: '<' -> pushMode(tag);
NOANGLE: ~[<>]+;
mode tag;
RANGLE: '>' -> popMode;
GRIDLET: '#';
FOO: 'foo';
BAR: 'bar';
BAZ: 'baz';
ID: [0-9]+;
OTHERTEXT: . ;
This works well and successfully parses text such as:
<foo#123>
Hi <bar#987>!
<baz#1><foo#2>anythinghere<baz#3>
if 1 &lt; 2
It also correctly fails on the following inputs when I use BailErrorStrategy:
<foo123>
<bar
<foo
<unsupported
if 1 < 2
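The last one fails because < switches the lexer into tag mode and what follows is not a supported tag. However, I would also like to reject a bare > in the text, so the following should fail as well: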
if 2 > 1
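It should have been written as if 2 &gt; 1, with the angle bracket escaped.
How can I change the grammar so that a > that is not part of a tag fails to parse?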
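
To make the intended accept/reject behaviour concrete, here is a minimal reference check written with a Python regular expression rather than ANTLR. It is only a sketch of the rule described above, not a replacement for the grammar; the names TAG_NAMES and is_valid are made up for illustration.

```python
import re

# Fixed list of tag names, mirroring the fixedlist rule in the parser grammar.
TAG_NAMES = ("foo", "bar", "baz")

# The whole input must be a sequence of pieces, where each piece is either
# one character that is not an angle bracket, or one well-formed tag
# such as <foo#123>.
_VALID = re.compile(r"(?:[^<>]|<(?:%s)#[0-9]+>)*" % "|".join(TAG_NAMES))


def is_valid(text: str) -> bool:
    """Return True if every < and > in text belongs to a valid tag."""
    return _VALID.fullmatch(text) is not None
```

With this check, is_valid("Hi <bar#987>!") and is_valid("if 2 &gt; 1") are accepted, while is_valid("if 1 < 2") and is_valid("if 2 > 1") are both rejected. The second rejection is the behaviour I would like the grammar to have too.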