I want to parse text comments and look for specific tags inside them. The types of tags I'm looking for look like this:
<name#1234>
Where "name" is a string matching [a-z]+ (from a fixed list) and "1234" is a number matching [0-9]+. These tags can occur in a string zero or more times and be surrounded by arbitrary other text. For example, all of the following lines are valid:
"Hello <foo#56> world!"
"<bar#1>!"
"1 &lt; 2"
"+<baz#99>+<squid#0> and also<baz#99>.\n\nBy the way, maybe <foo#9876>"
The following lines are not valid:
"1 < 2"
"<foo>"
"<bar#>"
"Hello <notinfixedlist#1234>"
The last one is invalid because "notinfixedlist" is not one of the supported names.
I can easily parse this with a simple regular expression, for example (named groups omitted):
<[a-z]+#\d+>
or directly specifying a fixed list:
<(foo|bar|baz|squid)#\d+>
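For illustration, here is a minimal sketch of the regex approach with the named groups added back in, assuming Python's `re` module. The names `FIXED_NAMES`, `TAG_RE` and `find_tags` are hypothetical, and the fixed list is simply taken from the examples above.

```python
import re

# Assumed fixed list of names, taken from the examples in this post.
FIXED_NAMES = ("foo", "bar", "baz", "squid")

# Same pattern as above, with named groups for the name and the number.
# [0-9]+ is used instead of \d+ so that only ASCII digits are accepted.
TAG_RE = re.compile(r"<(?P<name>" + "|".join(FIXED_NAMES) + r")#(?P<id>[0-9]+)>")


def find_tags(comment):
    """Return a (name, number) pair for every well-formed tag in the comment."""
    return [(m.group("name"), int(m.group("id"))) for m in TAG_RE.finditer(comment)]


if __name__ == "__main__":
    print(find_tags("Hello <foo#56> world!"))
    print(find_tags("Hello <notinfixedlist#1234>"))
```

With this sketch, `find_tags("Hello <foo#56> world!")` returns `[('foo', 56)]`, while a string such as `"Hello <notinfixedlist#1234>"` yields an empty list. Note that it only extracts tags; it does not reject strings that contain malformed ones.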
but I would like to use antlr for several reasons:
- Literal "<" and ">" characters in the text versus their escaped forms "&lt;" and "&gt;".
- Other tag syntaxes (for example "{foo + 666}" or "[[@1234]]").
- , antlr4 , , , .
antlr4? , , , , , .
Here is what I have so far:
grammar Tags;

parse
    : ( tag | text )*
    ;

tag
    : '<' fixedlist '#' ID '>'
    ;

fixedlist
    : 'foo'
    | 'bar'
    | 'baz'
    | 'squid'
    ;

text
    : ~('<' | '>')+
    ;

ID
    : [0-9]+
    ;
?
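As a point of comparison, the behavior the `parse : ( tag | text )*` rule is aiming for can be mimicked in plain Python. This is only a sketch of the intended behavior under my own assumptions, not ANTLR output and not a fix for the grammar; `FIXED_LIST`, `TOKEN_RE` and `segment` are hypothetical names.

```python
import re

# Assumed fixed list of names, taken from the examples in this post.
FIXED_LIST = ("foo", "bar", "baz", "squid")

# One alternative per grammar rule: a well-formed tag, or a run of text
# that contains neither '<' nor '>'.
TOKEN_RE = re.compile(
    r"<(?P<name>" + "|".join(FIXED_LIST) + r")#(?P<id>[0-9]+)>"
    r"|(?P<text>[^<>]+)"
)


def segment(comment):
    """Split a comment into ("tag", name, number) and ("text", s) pieces.

    Raises ValueError at the first character that fits neither rule.
    """
    pieces = []
    pos = 0
    while pos < len(comment):
        m = TOKEN_RE.match(comment, pos)
        if m is None:
            raise ValueError(f"unexpected {comment[pos]!r} at offset {pos}")
        if m.group("text") is not None:
            pieces.append(("text", m.group("text")))
        else:
            pieces.append(("tag", m.group("name"), int(m.group("id"))))
        pos = m.end()
    return pieces
```

For example, `segment("<bar#1>!")` returns `[('tag', 'bar', 1), ('text', '!')]`, whereas `segment("<foo>")` and `segment("<bar#>")` raise `ValueError` because the opening "<" does not start a well-formed tag.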