ANTLR grammar for escaping angle brackets

In an earlier question, I asked about extracting tags from arbitrary text. The solution provided there works well, but there is one edge case that I would like to handle. To recap, I am parsing arbitrary text entered by the user, and I would like every occurrence of < or > to be part of a syntactically valid tag. If an angle bracket is not part of a valid tag, it should be escaped as &lt; or &gt;. The syntax I'm looking for is <foo#123>, where foo is text from a fixed list of entries and 123 is a number matching [0-9]+. Parser:

parser grammar TagsParser;

options {
    tokenVocab = TagsLexer;
}

parse: (tag | text)* EOF;
tag: LANGLE fixedlist GRIDLET ID RANGLE;
text: NOANGLE;
fixedlist: FOO | BAR | BAZ;

Lexer:

lexer grammar TagsLexer;

LANGLE: '<' -> pushMode(tag);
NOANGLE: ~[<>]+;

mode tag;

RANGLE: '>' -> popMode;
GRIDLET: '#';
FOO: 'foo';
BAR: 'bar';
BAZ: 'baz';
ID: [0-9]+;
OTHERTEXT: . ;

This works well and successfully parses text such as:

<foo#123>
Hi <bar#987>!
<baz#1><foo#2>anythinghere<baz#3>
if 1 &lt; 2

With BailErrorStrategy, it also correctly rejects the following inputs:

<foo123>
<bar#a>
<foo#123H>
<unsupported#123>
if 1 < 2
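
As a cross-check that does not depend on ANTLR, the intended language can also be written as a single regular expression. The following is only a minimal sketch in Python. The name `is_valid` and the pattern are assumptions derived from the tag syntax described in the question; they are not part of the grammar.

```python
import re

# Hypothetical helper, not part of the ANTLR grammar: text is valid when it
# consists only of tags such as <foo#123> and of characters that are not
# angle brackets.
VALID_TEXT = re.compile(r"(?:[^<>]|<(?:foo|bar|baz)#[0-9]+>)*")


def is_valid(text: str) -> bool:
    """Return True if every angle bracket in text belongs to a valid tag."""
    return VALID_TEXT.fullmatch(text) is not None
```

Under these assumptions, `is_valid("Hi <bar#987>!")` is true, while `is_valid("<foo123>")` and `is_valid("if 1 < 2")` are false.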

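The last one fails because < switches the lexer into tag mode, and what follows is not a valid tag. However, an unescaped > outside a tag is accepted:

if 2 > 1

I would like this to be rejected as well, so that it has to be written as if 2 &gt; 1.

How can I make the grammar reject a > that is not part of a tag?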


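In the default mode, no lexer rule matches >, so a > outside a tag never reaches the parser as a token. Add a rule for it to the lexer: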

lexer grammar TagsLexer;

LANGLE: '<' -> pushMode(tag);
NOANGLE: ~[<>]+;
BADRANGLE: '>';

mode tag;

RANGLE: '>' -> popMode;
...

A > outside a tag now produces a BADRANGLE token, which no parser rule accepts, so the parse fails.
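
The effect of the extra BADRANGLE rule can be illustrated without ANTLR by emulating the two lexer modes by hand. This is a rough sketch in Python, not how ANTLR is implemented: `tokenize`, `parses`, and the `with_badrangle` flag are hypothetical names, rules are tried in order instead of by longest match (which gives the same result for these particular rules), and a boolean replaces the mode stack because only one mode is ever pushed.

```python
import re

# Token names follow the grammar; everything else here is illustrative.
DEFAULT_RULES = [("LANGLE", r"<"), ("NOANGLE", r"[^<>]+")]
TAG_RULES = [
    ("RANGLE", r">"), ("GRIDLET", r"#"), ("FOO", r"foo"), ("BAR", r"bar"),
    ("BAZ", r"baz"), ("ID", r"[0-9]+"), ("OTHERTEXT", r"."),
]


def tokenize(text, with_badrangle):
    """Return the token names produced for text, with or without BADRANGLE."""
    default = DEFAULT_RULES + ([("BADRANGLE", r">")] if with_badrangle else [])
    in_tag, pos, tokens = False, 0, []
    while pos < len(text):
        for name, pattern in (TAG_RULES if in_tag else default):
            m = re.compile(pattern, re.DOTALL).match(text, pos)
            if m:
                tokens.append(name)
                pos = m.end()
                if name == "LANGLE":    # pushMode(tag)
                    in_tag = True
                elif name == "RANGLE":  # popMode
                    in_tag = False
                break
        else:
            # No rule matches: the character is dropped and never reaches
            # the parser (ANTLR would report a token recognition error).
            pos += 1
    return tokens


def parses(tokens):
    """Check the token names against parse: (tag | text)* EOF."""
    tag = r"LANGLE (?:FOO|BAR|BAZ) GRIDLET ID RANGLE "
    stream = "".join(name + " " for name in tokens)
    return re.fullmatch(rf"(?:NOANGLE |{tag})*", stream) is not None
```

In this emulation, `tokenize("if 2 > 1", False)` yields two NOANGLE tokens because the > is silently dropped, so `parses` accepts it. With `with_badrangle=True` the stream contains BADRANGLE between them, and `parses` rejects it.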


Source: https://habr.com/ru/post/1653067/

