How do I create something like a negated character class with a string instead of characters?

I am trying to write a tokenizer for Mustache in Perl. I can easily handle most tokens, for example:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $comment  = qr/ \G \{\{ ! (?<comment>  .+? ) }} /xs;
    my $variable = qr/ \G \{\{   (?<variable> .+? ) }} /xs;
    my $text     = qr/ \G (?<text> .+? ) (?= \{\{ | \z ) /xs;
    my $tokens   = qr/ $comment | $variable | $text /x;

    my $s = do { local $/; <DATA> };

    while ($s =~ /$tokens/g) {
        my ($type) = keys %+;
        (my $contents = $+{$type}) =~ s/\n/\\n/;
        print "type [$type] contents [$contents]\n";
    }

    __DATA__
    {{!this is a comment}}
    Hi {{name}}, I like {{thing}}.

But I have a problem with the Set Delimiters directive:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my $delimiters = qr/ \G \{\{ (?<start> .+? ) = [ ] = (?<end> .+?) }} /xs;
    my $comment    = qr/ \G \{\{ ! (?<comment>  .+? ) }} /xs;
    my $variable   = qr/ \G \{\{   (?<variable> .+? ) }} /xs;
    my $text       = qr/ \G (?<text> .+? ) (?= \{\{ | \z ) /xs;
    my $tokens     = qr/ $comment | $delimiters | $variable | $text /x;

    my $s = do { local $/; <DATA> };

    while ($s =~ /$tokens/g) {
        for my $type (keys %+) {
            (my $contents = $+{$type}) =~ s/\n/\\n/;
            print "type [$type] contents [$contents]\n";
        }
    }

    __DATA__
    {{!this is a comment}}
    Hi {{name}}, I like {{thing}}.
    {{(= =)}}

If I change it to

    my $delimiters = qr/ \G \{\{ (?<start> [^{]+? ) = [ ] = (?<end> .+?) }} /xs;

it works fine, but the whole point of the Set Delimiters directive is to change the delimiters, so the code will end up looking like

    my $variable = qr/ \G $start (?<variable> .+? ) $end /xs;

And it is perfectly valid to say {{{== ==}}} (that is, change the delimiters to {= and =}). What I want, but maybe not what I need, is the ability to say something like (?:not starting string)+?. I figure I will just have to give up on being clean about it and drop code into the regex to force it to match only what I want. I am trying to avoid that for four reasons:

  • I do not think it is very clean.
  • It is marked as experimental.
  • I am not very familiar with it (I think it comes down to (?{CODE}) and returning special values).
  • I am hoping someone knows of some other exotic feature I am not aware of that is better suited to the situation (e.g. (?(condition)yes-pattern|no-pattern)).

Just to make things clear (I hope), I am trying to match a constant-length start delimiter, followed by the shortest string that allows a match and does not contain the start delimiter, followed by a space, followed by an equals sign, followed by the shortest string that allows a match and ends with the end delimiter.

2 answers

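Use a negative lookahead assertion. Something like this:

    my $variable = qr/ \G $start (?<variable> (?: (?! $end ) . )+ ) $end /xs;

The lookahead is tested before each character is consumed, so the capture stops right before the end delimiter and $end can then match.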
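To see the lookahead idea applied to the Set Delimiters pattern from the question, here is a minimal sketch. The names `$lazy`, `$tempered`, `$not_delim`, and `captured_start` and the sample input are made up for illustration, and the delimiters are assumed to be the default `{{` and `}}`. It compares a plain lazy `.+?` with a version that refuses to consume any character at which a delimiter begins.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Current delimiters, escaped so they are safe to interpolate.
my ($start, $end) = map { quotemeta } '{{', '}}';

# Lazy version: .+? is free to run across "}}" and "{{" if that is
# what it takes to reach "= =".
my $lazy = qr/ $start (?<new_start> .+? ) = [ ] = (?<new_end> .+? ) $end /xs;

# Tempered version: a character is consumed only if neither delimiter
# begins at that position.
my $not_delim = qr/ (?: (?! $start | $end ) . ) /xs;
my $tempered  = qr/
    $start (?<new_start> $not_delim+? ) = [ ] = (?<new_end> $not_delim+? ) $end
/xs;

# Hypothetical helper: returns the captured new start delimiter, or
# nothing if the pattern does not match.
sub captured_start {
    my ($re, $text) = @_;
    return unless $text =~ $re;
    my $got = $+{new_start};    # copy before the match variables change
    return $got;
}

my $input = 'Hi {{name}}, {{(= =)}}';

printf "lazy:     [%s]\n", captured_start($lazy,     $input);
printf "tempered: [%s]\n", captured_start($tempered, $input);
```

With this input the lazy pattern captures `name}}, {{(` as the new start delimiter, while the tempered one captures just `(`. Since the delimiters go through `quotemeta` first, the same construction should keep working after they are changed to something like `{=` and `=}`.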

For the curious, here is a complete tokenizer for Mustache, written in Perl 5.10 style. Now I just need to write the parser and the renderer.

    #!/usr/bin/perl

    use 5.010_000;
    use strict;
    use warnings;

    sub gen_tokenizer {
        my ($s, $e) = @_;
        my ($start, $end) = map { quotemeta } $s, $e;

        my $unescaped = "$s $e" eq "{{ }}"
            ? qr/ \G \{{3} (?<unescaped> .+?) }{3} /xs
            : qr{ \G $start & (?<unescaped> .+? ) $end }xs;

        return qr{
            $unescaped |
            \G $start (?:
                !  (?<comment>    .+? ) |
                >  (?<partial>    .+? ) |
                \# (?<enum_start> .+? ) |
                /  (?<enum_stop>  .+? ) |
                (?<start> (?: . (?! $end ) )+? ) = [ ] = (?<end> .+? ) |
                (?<variable> .+? )
            ) $end |
            (?<text> .+? ) (?= $start | \z )
        }xs;
    }

    my $template = do { local $/; <DATA> };

    my $tokenizer = gen_tokenizer "{{", "}}";

    while ($template =~ /$tokenizer/g) {
        my @types = keys %+;
        if (@types == 1) {
            my $type = $types[0];
            (my $contents = $+{$type}) =~ s/\n/\\n/g;
            say "$type: [$contents]";
        } else {
            $tokenizer = gen_tokenizer $+{start}, $+{end};
            say "set_delim: [$+{start} $+{end}]";
        }
    }

    __DATA__
    {{!this is a comment}}
    {{{html header}}}
    Hi {{name}}, I like {{thing}}.
    {{(= =)}}
    (#optional)
    This will only print if optional is set
    (/optional)
    (&html footer)

Source: https://habr.com/ru/post/1304405/

