I am trying to write a tokenizer for Mustache in Perl. I can easily handle most tokens, for example:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $comment  = qr/ \G \{\{ ! (?<comment> .+? ) }} /xs;
    my $variable = qr/ \G \{\{ (?<variable> .+? ) }} /xs;
    my $text     = qr/ \G (?<text> .+? ) (?= \{\{ | \z ) /xs;
    my $tokens   = qr/ $comment | $variable | $text /x;

    my $s = do { local $/; <DATA> };

    while ($s =~ /$tokens/g) {
        my ($type) = keys %+;
        (my $contents = $+{$type}) =~ s/\n/\\n/;
        print "type [$type] contents [$contents]\n";
    }

    __DATA__
    {{!this is a comment}}
    Hi {{name}}, I like {{thing}}.

But I have a problem with the Set Delimiters directive:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $delimiters = qr/ \G \{\{ (?<start> .+? ) = [ ] = (?<end> .+?) }} /xs;
    my $comment    = qr/ \G \{\{ ! (?<comment> .+? ) }} /xs;
    my $variable   = qr/ \G \{\{ (?<variable> .+? ) }} /xs;
    my $text       = qr/ \G (?<text> .+? ) (?= \{\{ | \z ) /xs;
    my $tokens     = qr/ $comment | $delimiters | $variable | $text /x;

    my $s = do { local $/; <DATA> };

    while ($s =~ /$tokens/g) {
        for my $type (keys %+) {
            (my $contents = $+{$type}) =~ s/\n/\\n/;
            print "type [$type] contents [$contents]\n";
        }
    }

    __DATA__
    {{!this is a comment}}
    Hi {{name}}, I like {{thing}}.
    {{(= =)}}

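The failure can be reproduced in isolation. The sketch below is only an illustration: the input string is made up, and the pattern is the same $delimiters as above. Because .+? is free to run across }} and {{, an attempt that starts at an ordinary tag keeps extending until it reaches the "= =" of the directive.

```perl
use strict;
use warnings;

# Same $delimiters pattern as in the script above.
my $delimiters = qr/ \G \{\{ (?<start> .+? ) = [ ] = (?<end> .+?) }} /xs;

# Illustrative input: two ordinary tags, then the directive on its own line.
my $s = "{{name}} and {{thing}}.\n{{(= =)}}";

# Try the pattern at offset 0, i.e. at the {{name}} tag.
my ($start, $end) = ('', '');
if ($s =~ /$delimiters/g) {
    ($start, $end) = ($+{start}, $+{end});
}

# Show the newline as \n so the capture prints on one line.
(my $shown = $start) =~ s/\n/\\n/g;
print "start [$shown] end [$end]\n";
```

Here start swallows everything from name up to and including the opening parenthesis of the directive, so both variable tags and the text between them are lost.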
If I change it to

    my $delimiters = qr/ \G \{\{ (?<start> [^{]+? ) = [ ] = (?<end> .+?) }} /xs;

it works fine, but the whole point of the Set Delimiters directive is that the delimiters change, so the code will end up looking like

    my $variable = qr/ \G $start (?<variable> .+? ) $end /xs;

And it is perfectly valid to say {{{== ==}}} (that is, change the delimiters to {= and =}). What I want, but maybe not what I need, is the ability to say something like (?:not starting string)+?. I suspect I will just have to give up on being clean and embed code in the regular expression to force it to match only what I want. I am trying to avoid that for four reasons:
- I do not think it is very clean.
- It is marked as experimental.
- I am not very familiar with it (I think it comes down to (?{CODE}) and returning special values).
- I am hoping someone knows some other exotic feature I am not aware of that is better suited to the situation (e.g. (?(condition)yes-pattern|no-pattern)).
Just to make everything clear (I hope): I am trying to match a constant-length start delimiter, followed by the shortest string that allows a match and does not contain the start delimiter, followed by an equals sign, a space, and another equals sign, followed by the shortest string that allows a match and ends with the end delimiter.
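One way that description could be written down is with a negative lookahead in front of each character, which is roughly what (?:not starting string)+? would mean. This is a minimal sketch, not a finished tokenizer: delimiters_re is a hypothetical helper that does not exist in the scripts above, and it assumes the currently active delimiters are passed in as plain strings and escaped with quotemeta.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the original scripts: builds the
# Set Delimiters pattern for whichever delimiters are currently active.
sub delimiters_re {
    my ($open, $close) = @_;
    my ($o, $c) = (quotemeta $open, quotemeta $close);
    return qr/
        \G $o
        (?<start> (?: (?! $o ) . )+? )  # shortest run, never stepping onto the opener
        = [ ] =
        (?<end> .+? )
        $c
    /xs;
}

my $re = delimiters_re('{{', '}}');
my $s  = "{{name}} and {{thing}}.\n{{(= =)}}";

# At offset 0 the run would have to cross the opener of the second tag,
# so the directive pattern does not match at an ordinary tag.
pos($s) = 0;
my $at_tag = $s =~ /$re/gc ? 1 : 0;

# At the directive itself the pattern matches and captures both halves.
pos($s) = index $s, '{{(';
my ($start, $end) = $s =~ /$re/gc ? ($+{start}, $+{end}) : ();

print "matched at tag: $at_tag; start [", $start // '', "] end [", $end // '', "]\n";
```

The lookahead is tested before every character, so the start group can never contain the opening delimiter, whatever that delimiter currently is. The end group is still the original .+?, so the sketch addresses only the start half of the problem.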