How do I implement a custom escape sequence when using split() in Perl?

I am trying to write a parser for the EDI data format, which is just delimited text, but where the delimiters are defined at the top of the file.

Essentially it's a bunch of split() calls based on values I read at the top of the file. The problem is that there is also a custom "escape character", which indicates that I need to ignore the delimiter that follows it.

For example, assuming * is the delimiter and ? is the escape character, I'm doing something like

use Data::Dumper;
my $delim = "*";
my $escape = "?";
my $edi = "foo*bar*baz*aster?*isk";

my @split = split("\\" . $delim, $edi);
print Dumper(\@split);

I need it to return "aster*isk" as the last item.

My initial idea was to replace each instance of the escape character plus the character after it with a placeholder sequence of unprintable ASCII characters before calling split(), and then use another regexp to switch the placeholders back to the right values afterwards.

This is doable, but it feels like a hack, and it will get pretty ugly once I do it for all 5 potential delimiters. Each delimiter is potentially a special regexp character, which leads to a lot of escaping in my own regular expressions.

Is there a way to avoid this, possibly with a special regular expression passed to my split() calls?


One way is to encode every escape sequence before splitting and decode it again afterwards:

# Process escapes to hide the following character:
$edi =~ s/\Q$escape\E(.)/sprintf '%s%d%s', $escape, ord $1, $escape/esg;

my @split = split( /\Q$delim\E/, $edi);

# Convert escape sequences into the escaped character:
s/\Q$escape\E(\d+)\Q$escape\E/chr $1/eg for @split;

Note that this encoding assumes that neither the delimiter nor the escape char is a digit.
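As a rough end-to-end sketch of the encode, split, decode steps above, the three statements can be wrapped in one sub. The name `split_escaped` and its argument order are made up for this illustration, and the same no-digits assumption applies.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the answer): split $text on $delim,
# treating $escape followed by any character as that literal character.
# Assumes neither $delim nor $escape is a digit.
sub split_escaped {
    my ($text, $delim, $escape) = @_;

    # Hide each escaped character as <escape><ordinal><escape>.
    $text =~ s/\Q$escape\E(.)/sprintf '%s%d%s', $escape, ord $1, $escape/esg;

    # No real delimiter can be escaped any more, so a plain split is safe.
    my @fields = split /\Q$delim\E/, $text;

    # Turn the hidden ordinals back into characters.
    s/\Q$escape\E(\d+)\Q$escape\E/chr $1/eg for @fields;
    return @fields;
}

print join('|', split_escaped('foo*bar*baz*aster?*isk', '*', '?')), "\n";
# prints: foo|bar|baz|aster*isk
```

Because every escaped character is hidden, an escaped escape such as `??` also survives the round trip as a single `?`.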

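Another option is a negative lookbehind, so that split ignores any delimiter directly preceded by the escape character:
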
my @split = split( /(?<!\Q$escape\E)\Q$delim\E/, $edi);

Then remove the escape characters from the resulting fields:

s/\Q$escape$delim\E/$delim/g for @split;
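A small sketch of the lookbehind approach in action may help here. The wrapper name `split_lookbehind` is invented for the example. The second call shows a case the lookbehind cannot tell apart: an escaped escape character sitting right before a real delimiter.

```perl
use strict;
use warnings;

my ($delim, $escape) = ('*', '?');

# Hypothetical wrapper: split on delimiters that are not directly
# preceded by the escape character, then unescape the delimiters.
sub split_lookbehind {
    my ($text) = @_;
    my @fields = split /(?<!\Q$escape\E)\Q$delim\E/, $text;
    s/\Q$escape$delim\E/$delim/g for @fields;
    return @fields;
}

print join('|', split_lookbehind('foo*bar*baz*aster?*isk')), "\n";
# prints: foo|bar|baz|aster*isk

# "??" is meant as a literal "?", followed by a real delimiter, but the
# lookbehind only sees "?*" and does not split:
print join('|', split_lookbehind('a??*b')), "\n";
# prints: a?*b   (a single field)
```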

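Update: if the escape character should also be able to escape itself, the lookbehind is not enough, because it cannot tell an escaped escape from an escaped delimiter. In that case, match the fields instead of splitting on the delimiters: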

my @split = $edi =~ /(?:\Q$delim\E|^)((?:\Q$escape\E.|(?!\Q$delim\E).)*+)/gs;
s/\Q$escape$delim\E/$delim/g for @split;

The *+ possessive quantifier requires perl 5.10+. On earlier versions, use an atomic group instead:

/(?:\Q$delim\E|^)((?>(?:\Q$escape\E.|(?!\Q$delim\E).)*))/gs
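The field-matching pattern above can be tried out with a short script. This is only a sketch: `match_fields` is a made-up wrapper name, and the final unescape step here strips the escape character in front of any character, which is an assumption that goes slightly beyond the delimiter-only substitution shown in the answer.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the field-matching pattern (perl 5.10+).
sub match_fields {
    my ($text, $delim, $escape) = @_;

    # Each match is a delimiter (or start of string) followed by a run of
    # "escape + any character" pairs and non-delimiter characters.
    my @fields = $text =~ /(?:\Q$delim\E|^)((?:\Q$escape\E.|(?!\Q$delim\E).)*+)/gs;

    # Assumption: the escape may precede any character, so drop it
    # everywhere, not only in front of the delimiter.
    s/\Q$escape\E(.)/$1/gs for @fields;
    return @fields;
}

print join('|', match_fields('foo*bar*baz*aster?*isk', '*', '?')), "\n";
# prints: foo|bar|baz|aster*isk

print join('|', match_fields('a??*b', '*', '?')), "\n";
# prints: a?|b
```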

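Here is a hand-written tokenizer as an alternative to ysth's regex. It takes the escape character and a list of delimiters as arguments.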

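sub split_edi {
  my ($in, %args) = @_;
  die q/Usage: split_edi($input, escape => "#", delims => [ ... ]) /
    unless defined $in and defined $args{escape} and defined $args{delims};

  # Quote the escape and every delimiter so regex metacharacters are literal.
  my $escape = quotemeta $args{escape};
  my $delims = join '|', map quotemeta, @{ $args{delims} };

  my ($cur, @ret);

  # Walk the input with \G; /c keeps pos() in place when a branch fails.
  while ($in !~ /\G\z/cg) {
    if ($in =~ /\G$escape(.)/scg) {
      # Escape plus any character (/s includes newline): keep it literally.
      $cur .= $1;
    } elsif ($in =~ /\G(?:$delims)/cg) {
      # Unescaped delimiter: finish the current field.
      push @ret, $cur;
      $cur = '';
    } elsif ($in =~ /\G((?:(?!$delims|$escape).)+)/scg) {
      # Run of ordinary characters.
      $cur .= $1;
    } else {
      # Reached when the input ends with a lone escape character.
      die "hobbs can't write parsers";
    }
  }
  push @ret, $cur if defined $cur;
  @ret;
}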

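The escape char is checked first at every position, so an escaped delimiter or an escaped escape char is always taken literally.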

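Properties of this code:

  • The escape character can escape a delimiter or itself, and the escaped character is kept without the escape.
  • Several delimiters can be given, and each one may be longer than a single character.
  • Empty fields between consecutive delimiters are preserved.
  • An escape character at the very end of the input, with nothing after it, makes the sub die.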

It is longer than ysth's regex, but each case is handled in its own branch. Example :)

use feature 'say';
say for split_edi("foo*bar;baz*aster?*isk", delims => [qw(* ;)], escape => "?");

Output:

foo
bar
baz
aster*isk

Source: https://habr.com/ru/post/1762000/

