Why is only the last occurrence found in my Perl regex?

I have the following input for a Perl script, and I want to get the first occurrence of a NAME="..." line in each <TABLE>...</TABLE> block.

The entire file is read into a single string, and the regular expression is applied to that string.

However, the regex always returns the last occurrence of NAME="...". Can someone explain what is happening and how it can be fixed?

Input file: 
ADSDF
<TABLE>
NAME="ORDERSAA"
line1
line2
NAME="ORDERSA"
line3
NAME="ORDERSAB"
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSB"
line3
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSC"
line3
</TABLE>
<TABLE>
line1
line2
NAME="ORDERSD"
line3
line3
line3
</TABLE>
<TABLE>
line1
line2
NAME="QUOTES2"
line3
NAME="QUOTES3"
NAME="QUOTES4"
line3
NAME="QUOTES5"
line3
</TABLE>
<TABLE>
line1
line2
NAME="QUOTES6"
NAME="QUOTES7"
NAME="QUOTES8"
NAME="QUOTES9"
line3
line3
</TABLE>
<TABLE>
NAME="MyName IsKhan"
</TABLE>

This is where the Perl code starts:

use warnings;
use strict;

my $nameRegExp = '(<table>((NAME="(.+)")|(.*|\n))*</table>)';

sub extractNames($$){
 my ($ifh, $ofh) = @_;
 my $fullFile;
 read ($ifh, $fullFile, 1024);#Hardcoded to read just 1024 bytes.
 while( $fullFile =~ m#$nameRegExp#gi){
  print "found: ".$4."\n";
 }
}

sub main(){
 if( ($#ARGV + 1 )!= 1){
  die("Usage: extractNames infile\n");
 }
 my $infileName = $ARGV[0];
 my $outfileName = $ARGV[1];
 open my $inFile, "<$infileName" or die("Could not open log file $infileName");
 my $outFile;
 #open my $outFile, ">$outfileName" or die("Could not open log file $outfileName");
 extractNames( $inFile, $outFile );
 close( $inFile );
 #close( $outFile );
}

#call 
main();
+3
4 answers

Try the following:

'(?><TABLE>\n+(?:(?!</TABLE>|NAME=).*\n+)*)NAME="([^"]+)"'

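The (?:.*\n+)* part consumes one line at a time, but only after the lookahead, (?!</TABLE>|NAME=), has confirmed that the line is neither a NAME line nor the end of the TABLE. If the skipping stops at anything other than a NAME line, the atomic group, (?>...), prevents backtracking, so the attempt fails immediately.

Also, most of the groups in your regex are only used for grouping, not for capturing; for those, use non-capturing groups: (?:...).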
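To see the pattern in action, here is a minimal Perl sketch. The sample text is a shortened, made-up version of the question's input (the second table deliberately has no NAME line), and the names $text, $first_name and @names are only illustrative.

```perl
use strict;
use warnings;

# Hypothetical, shortened sample in the same shape as the question's input.
my $text = join "\n",
    'ADSDF',
    '<TABLE>', 'NAME="ORDERSAA"', 'line1', 'NAME="ORDERSA"', '</TABLE>',
    '<TABLE>', 'line1', 'line2', '</TABLE>',    # no NAME line in this table
    '<TABLE>', 'line1', 'NAME="QUOTES2"', 'NAME="QUOTES3"', '</TABLE>',
    '';

# Skip lines that start with neither NAME= nor </TABLE>, then capture the
# value of the first NAME line. The atomic group forbids backtracking.
my $first_name = qr{(?><TABLE>\n+(?:(?!</TABLE>|NAME=).*\n+)*)NAME="([^"]+)"};

# In list context, a /g match returns group 1 of every match.
my @names = $text =~ /$first_name/g;
print "found: $_\n" for @names;
# found: ORDERSAA
# found: QUOTES2
```

The table without a NAME line is skipped rather than borrowing a NAME from the table after it.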


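EDIT: As for why your regex returns the last NAME, the answer is in how it is put together. This is the relevant part: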

((NAME="(.+)")|(.*|\n))*

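Here is what happens: on each pass the group first tries to match a NAME="..." line, and if that fails it matches anything else, either the rest of the line or a newline. Because the group is controlled by *, this repeats as many times as possible. So it matches every NAME line, not only the first one.

Each time the "inner" group matches NAME="...", it overwrites whatever it captured before. The * is greedy and .* matches the </TABLE> lines too; so the match runs on to the final </TABLE>, and the last NAME (MyName IsKhan) is what ends up in group 4.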
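The overwriting can be reproduced in isolation. The following is a small sketch, not the asker's code: the three-NAME string and the variable names are made up, and the pattern is a stripped-down version of the repeated group.

```perl
use strict;
use warnings;

# Hypothetical sample: three NAME lines and one ordinary line.
my $text = qq{NAME="A"\nline\nNAME="B"\nNAME="C"\n};

# The capturing group sits inside a group repeated by *.
# Every iteration that matches a NAME line overwrites group 1.
my ($last) = $text =~ /^(?:NAME="([^"]+)"\n|.*\n)*/;

print "group 1 holds: $last\n";
# group 1 holds: C
```

Only one value survives per capturing group, no matter how many times the enclosing group repeats.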

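If you want the first NAME instead, the regex has to stop as soon as it finds one. The simplest way to do that is to make the * non-greedy: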

'<TABLE>\n+(?:.*\n+)*?NAME="([^"]+)"'

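That works for the sample data, but it is not robust; if a table has no NAME line, the match can run on into the next table.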
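A quick way to try the non-greedy version is the sketch below. It assumes a made-up two-table sample in which every table has at least one NAME line; $text, $lazy and @names are illustrative names.

```perl
use strict;
use warnings;

# Hypothetical two-table sample; every table contains a NAME line.
my $text = join "\n",
    '<TABLE>', 'NAME="ORDERSAA"', 'line1', 'NAME="ORDERSA"', '</TABLE>',
    '<TABLE>', 'line1', 'line2', 'NAME="ORDERSB"', 'line3', '</TABLE>',
    '';

# (?:.*\n+)*? skips as few lines as possible before NAME= can match.
my $lazy = qr{<TABLE>\n+(?:.*\n+)*?NAME="([^"]+)"};

# In list context, a /g match returns group 1 of every match.
my @names = $text =~ /$lazy/g;
print "found: $_\n" for @names;
# found: ORDERSAA
# found: ORDERSB
```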

+4

Try this:

my $nameRegExp = '(<table>((NAME="(.+?)")|(.*?|\n))*</table>)';

This makes the NAME match non-greedy. You get one NAME (the last one) from each <TABLE>...</TABLE> block.

If you want every NAME, use:

my $nameRegExp = 'NAME="(.+?)"';

print $1;

+1

First of all, it is a bad idea to parse XML using regular expressions. Secondly, you need to change your regular expression to the following:

my $nameRegExp = '(<table>((NAME="(.+)?")|(.*?|\n))*?</table>)';

This makes the regular expression non-greedy, so it should return the first occurrence.

+1
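use strict;
use warnings;

# Treat everything up to and including </TABLE> as one record.
$/ = '</TABLE>';
while ( my $record = <> ) {
    chomp $record;    # removes the trailing </TABLE>
    for my $line ( split /\n/, $record ) {
        # Take the first line that starts with NAME= and drop the prefix.
        if ( $line =~ s/^NAME=// ) {
            print "$line\n";
            last;
        }
    }
}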

Output

$ perl myscript.pl file
"ORDERSAA"
"ORDERSB"
"ORDERSC"
"ORDERSD"
"QUOTES2"
"QUOTES6"
"MyName IsKhan"

The idea: use </TABLE> as the record separator and a newline as the field separator. Go through the fields of each record and look for one that starts with NAME=. If one is found, strip the NAME= prefix and print what follows the = character.

+1

Source: https://habr.com/ru/post/1737476/

