How to parse a multi-line, fixed-width file in Perl?

I have a file that I need to parse in the following format. (All delimiters are spaces):

    field name 1:            Multiple word value.
    field name 2:            Multiple word value along
                             with multiple lines.
    field name 3:            Another multiple word
                             and multiple line value.

I know how to parse a fixed-width file where each record is a single line, but I do not understand how to handle values that span multiple lines.

4 answers
    #!/usr/bin/env perl

    use strict;
    use warnings;

    my (%fields, $current_field);

    while (my $line = <DATA>) {
        next unless $line =~ /\S/;    # skip blank lines

        # A line starting with whitespace continues the current field
        if ($line =~ /^ \s+ ( \S .*? ) \s* $/x) {
            if (defined $current_field) {
                $fields{ $current_field } .= " $1";    # join with one space
            }
        }
        # Otherwise "name: value" starts a new field
        elsif ($line =~ /^ (.+?) : \s+ (.+?) \s* $/x) {
            $current_field = $1;
            $fields{ $current_field } = $2;
        }
    }

    use Data::Dumper;
    print Dumper \%fields;

    __DATA__
    field name 1:            Multiple word value.
    field name 2:            Multiple word value along
                             with multiple lines.
    field name 3:            Another multiple word
                             and multiple line value.

Fixed width says unpack to me. It is possible to parse this with regular expressions and split, but unpack should be the safer choice, since it is the right tool for fixed-width data.

I set the width of the first field to 12 and the gap between the fields to 13, which works for this data. You may need to change these numbers. The template "A12A13A*" means "take 12, then 13 ASCII characters, followed by ASCII characters of any length". unpack returns a list of these fields. Also, unpack uses $_ if no string is supplied, which is what we rely on here.

Note that if the first field is not fixed width up to the colon, as it appears to be in your sample data, you will need to merge the first two fields in the template, e.g. "A25A*", and then strip the colon.

I chose an array for storage because I do not know whether your field names are unique. A hash would overwrite fields that share a name. Another advantage of an array is that it preserves the order of the data as it appears in the file. If neither of these matters and fast lookup is the priority, use a hash instead.
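To make the template concrete, here is a minimal sketch that applies it to a single line. The sample line is an assumption built to match the layout described above (a 12-character name, a 13-character gap holding the colon and padding, and the value starting at column 25); the variable names are only for illustration.

```perl
use strict;
use warnings;

# Assumed layout: 12-character name, 13-character gap (colon plus padding),
# then the value from column 25 onwards.
my $line = "field name 1:" . (" " x 12) . "Multiple word value.\n";

# On unpack, "A" fields have trailing whitespace stripped, so the padding
# after the colon and the trailing newline are both dropped.
my ($field, $gap, $text) = unpack "A12A13A*", $line;

print "[$field] [$gap] [$text]\n";
# prints: [field name 1] [:] [Multiple word value.]
```

Because the colon falls into the 13-character gap, the first field comes back without it under this layout.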
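A rough sketch of that merged-template variant follows. The input lines are hypothetical (names of different lengths, values always starting at column 25), and `@parsed` is just an illustrative name.

```perl
use strict;
use warnings;

# Hypothetical lines: names of different lengths, values always at column 25.
my @lines = (
    sprintf("%-25s%s\n", "name:",                "Short name."),
    sprintf("%-25s%s\n", "a longer field name:", "Same value column."),
    sprintf("%-25s%s\n", "",                     "continuation line"),
);

my @parsed;
for my $line (@lines) {
    # One 25-character field holds the name, the colon and the padding
    my ($field, $text) = unpack "A25A*", $line;
    $field =~ s/:$//;    # the colon is now part of the first field, so strip it
    push @parsed, [ $field, $text ];
}

printf "%-20s => %s\n", @$_ for @parsed;
```

A continuation line still yields an empty first field, so the same "empty name means continuation" test applies.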

Code:

    use strict;
    use warnings;
    use Data::Dumper;

    my $last_text;
    my @array;

    while (<DATA>) {
        # unpack the fields and strip spaces
        my ($field, undef, $text) = unpack "A12A13A*";
        if ($field) {
            $field =~ s/:$//;                # strip the colon
            $last_text = [ $field, $text ];  # store data in anonymous array
            push @array, $last_text;         # and store that array in @array
        } else {
            # If $field is empty, we have a multi-line value:
            # add it to the previous line's data
            $last_text->[1] .= " $text";
        }
    }

    print Dumper \@array;

    __DATA__
    field name 1:            Multiple word value.
    field name 2:            Multiple word value along
                             with multiple lines.
    field name 3:            Another multiple word
                             and multiple line value
                             with a third line

Output:

    $VAR1 = [
              [
                'field name 1',
                'Multiple word value.'
              ],
              [
                'field name 2',
                'Multiple word value along with multiple lines.'
              ],
              [
                'field name 3',
                'Another multiple word and multiple line value with a third line'
              ]
            ];
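To illustrate the array-versus-hash trade-off mentioned in this answer, here is a small sketch. The records are made up for the example (note the repeated name), and `%hash` is an illustrative name, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical parsed records, in the same [ name, value ] shape as @array,
# with one field name repeated.
my @array = (
    [ 'field name 1', 'first value'  ],
    [ 'field name 2', 'second value' ],
    [ 'field name 1', 'third value'  ],
);

# Array of pairs: every record is kept, in file order.
print scalar(@array), " records in the array\n";

# Hash: a later record overwrites an earlier one with the same name.
my %hash;
$hash{ $_->[0] } = $_->[1] for @array;

print scalar(keys %hash), " keys in the hash\n";
print "field name 1 => $hash{'field name 1'}\n";
```

With unique field names the hash loses nothing except the original ordering, and lookups by name become direct.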

You could do it like this:

    #!/usr/bin/perl

    use strict;
    use warnings;

    my @fields;
    open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n";

    for (<$fh>) {
        if (/^\s/) {
            # continuation line: append to the last element
            $fields[$#fields] .= $_;
        } else {
            # new field: start a new element
            push @fields, $_;
        }
    }

    close $fh;

If the line starts with whitespace, append it to the last element of @fields; otherwise push it onto the end of the array.

Alternatively, slurp the whole file and split it using lookarounds:

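    #!/usr/bin/perl

    use strict;
    use warnings;

    $/ = undef;    # slurp mode: read the whole file at once
    open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n";

    # split after each newline that is not followed by whitespace
    my @fields = split /(?<=\n)(?!\s)/, <$fh>;

    close $fh;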

This is not a recommended approach.


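You can change the input record separator:

    open(my $fh, "<", "multi.txt") or die "Unable to open file: $!\n";

    # each record now ends where the next "field name" begins
    $/ = "\nfield name";
    while (my $record = <$fh>) {
        chomp $record;    # removes the trailing "\nfield name"
        # /s lets the value span lines; trailing whitespace is dropped
        if ($record =~ /(\d+):\s+(.+?)\s*\z/s) {
            print "Record $1 is $2\n";
        }
    }
    close $fh;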

Source: https://habr.com/ru/post/1386385/

