Perl extract text between html tags with regex

I'm new to Perl, and I'm trying to extract the text between all the <li> </li> tags in a string and assign the pieces to an array, using a regex or split/join.

For example:

my $string = "<ul>
                  <li>hello</li>
                  <li>there</li>
                  <li>everyone</li>
              </ul>";

So that this code ...

foreach my $value (@array) {
    print "$value\n";
}

... leads to this result:

hello
there
everyone
2 answers

Note: Do not use regular expressions to parse HTML.

This first option uses HTML::TreeBuilder, one of the many HTML parsers available. Its documentation covers the details and includes further examples.

use strict;
use warnings;
use HTML::TreeBuilder;

my $str 
   = "<ul>"
   . "<li>hello</li>"
   . "<li>there</li>"
   . "<li>everyone</li>"
   . "</ul>"
   ;

# Now create a new tree to parse the HTML from String $str
my $tr = HTML::TreeBuilder->new_from_content($str);

# And now find all <li> tags and create an array with the values.
my @lists = 
      map { $_->content_list } 
      $tr->find_by_tag_name('li');

# And loop through the array returning our values.
foreach my $val (@lists) {
   print $val, "\n";
}

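If you still decide to use a regular expression here (which I do not recommend), you could do something like this: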

my $str
   = "<ul>"
   . "<li>hello</li>"
   . "<li>there</li>"
   . "<li>everyone</li>"
   . "</ul>"
   ;

my @matches;
while ($str =~/(?<=<li>)(.*?)(?=<\/li>)/g) {
  push @matches, $1;
}

foreach my $m (@matches) {
   print $m, "\n";
}

Output:

hello
there
everyone
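
To make the warning at the top of this answer concrete, here is a minimal sketch of how the same pattern quietly drops items once the markup varies. The input is hypothetical (it is not from the question): one item carries an attribute and another spans two lines. The matches are collected in list context, which is an alternative to the while loop.

```perl
use strict;
use warnings;

# Hypothetical input: the same list, but one <li> has an attribute
# and another contains a line break.
my $html = <<'HTML';
<ul>
  <li>hello</li>
  <li class="greeting">there</li>
  <li>every
one</li>
</ul>
HTML

# Same lookaround pattern as above. In list context, a /g match with one
# capture group returns all captured strings at once.
my @matches = $html =~ /(?<=<li>)(.*?)(?=<\/li>)/g;

# The attribute defeats the literal "<li>" lookbehind, and "." does not
# match a newline without the /s modifier, so only one item is found.
print scalar(@matches), " match(es): @matches\n";   # prints "1 match(es): hello"
```

Each gap can be patched by complicating the pattern, but a parser handles these cases without special treatment.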

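As already noted, use a real parser for HTML.

hwnd has already shown one HTML parser.

For an HTML parser with CSS selector support, take a look at Mojo::DOM. Mojocast episode 5 offers an 8-minute introduction.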

use strict;
use warnings;

use Mojo::DOM;

my $html = do {local $/; <DATA>};

my $dom = Mojo::DOM->new($html);

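# map('text') calls ->text on each <li> element; recent Mojolicious releases
# no longer forward method calls from the collection to its elements.
for my $li ($dom->find('li')->map('text')->each) {
    print "$li\n";
}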

__DATA__
<ul>
  <li>hello</li>
  <li>there</li>
  <li>everyone</li>
</ul>

Output:

hello
there
everyone

Source: https://habr.com/ru/post/1605624/

