I am trying to write a parser for the EDI data format, which is just delimited text, but where the delimiters are defined at the top of the file.
Essentially this is a bunch of split() calls based on the values I read at the top of my code. The catch is that there is also a custom "escape character", which means the delimiter that follows it must be ignored.
For example, assuming * is the delimiter and ? is the escape character, I'm doing something like:
use Data::Dumper;
my $delim = "*";
my $escape = "?";
my $edi = "foo*bar*baz*aster?*isk";
my @split = split("\\" . $delim, $edi);
print Dumper(\@split);
I need it to return "aster*isk" as the last item.
My initial idea was to replace each instance of the escape character plus the character after it with an otherwise unused, unprintable ASCII sequence before I call my split() functions, and then use another regexp afterwards to switch them back to the right values.
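To make that idea concrete, here is a rough sketch of what I have in mind. The marker byte "\x01" and the two-digit hex encoding are my own placeholders, not anything from the EDI format, and the sketch assumes the data is single-byte text in which "\x01" never occurs.

```perl
use strict;
use warnings;

my $delim  = "*";
my $escape = "?";
my $edi    = "foo*bar*baz*aster?*isk";

# Assumption: this unprintable byte never appears in the real data.
my $marker = "\x01";

# Mask each "escape + next char" pair as marker + two hex digits,
# so the escaped delimiter is no longer visible to split().
(my $masked = $edi) =~ s/\Q$escape\E(.)/$marker . sprintf("%02X", ord $1)/ges;

# \Q...\E quotes the delimiter so it is matched literally.
my @split = split /\Q$delim\E/, $masked;

# Switch the masked characters back to their real values.
s/$marker([0-9A-F]{2})/chr hex $1/ge for @split;

print join("|", @split), "\n";   # prints foo|bar|baz|aster*isk
```

It works for this input, but it needs a mask pass and a restore pass around every split, which is the part that feels like a hack.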
This is doable, but it seems like a hack, and it will get pretty ugly once I do it for all 5 different potential delimiters. Each delimiter is potentially a special regexp character, which leads to a lot of escaping in my own regular expressions.
Is there a way to avoid this, possibly with a special regular expression passed to my split() calls?
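The kind of thing I am hoping for looks roughly like the sketch below: \Q...\E (the inline form of quotemeta) so I don't have to escape each delimiter by hand, plus a negative lookbehind so a delimiter preceded by the escape character is not split on. I am not sure this is right. In particular, I assume the escape character itself is never escaped, because an input like "??*" would be handled wrongly.

```perl
use strict;
use warnings;

my $delim  = "*";
my $escape = "?";
my $edi    = "foo*bar*baz*aster?*isk";

# Split on the delimiter only when it is NOT preceded by the escape char.
# The limit of -1 keeps trailing empty fields instead of dropping them.
my @fields = split /(?<!\Q$escape\E)\Q$delim\E/, $edi, -1;

# Remove each escape character, keeping the character it protected.
s/\Q$escape\E(.)/$1/gs for @fields;

print join("|", @fields), "\n";   # prints foo|bar|baz|aster*isk
```

The same pattern could be built once per delimiter from whatever values are read at the top of the file.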