Perl extract text between html tags with regex

Question

I'm new to Perl, and I'm trying to extract the text between every pair of <li> and </li> tags in a string and store the results in an array, using a regex or split/join.

For example:

my $string = "<ul>
                  <li>hello</li>
                  <li>there</li>
                  <li>everyone</li>
              </ul>";

So that this code ...

foreach my $value (@array) {
    print "$value\n";
}

... produces this result:

hello
there
everyone

html regex perl tags

user2809023 Sep 23 '13 at 23:45

2 answers

hwnd · Answer 1 · 2013-09-24T01:19:55+0000

Note: Do not use regular expressions to parse HTML.

This first option uses HTML::TreeBuilder, one of the many HTML parsers available. Its documentation on CPAN includes further examples.

use strict;
use warnings;
use HTML::TreeBuilder;

my $str 
   = "<ul>"
   . "<li>hello</li>"
   . "<li>there</li>"
   . "<li>everyone</li>"
   . "</ul>"
   ;

# Now create a new tree to parse the HTML from String $str
my $tr = HTML::TreeBuilder->new_from_content($str);

# And now find all <li> tags and create an array with the values.
my @lists = 
      map { $_->content_list } 
      $tr->find_by_tag_name('li');

# And loop through the array returning our values.
foreach my $val (@lists) {
   print $val, "\n";
}

If you insist on using a regular expression anyway (for learning purposes only), here is one way:

my $str
   = "<ul>"
   . "<li>hello</li>"
   . "<li>there</li>"
   . "<li>everyone</li>"
   . "</ul>"
   ;

my @matches;
while ($str =~/(?<=<li>)(.*?)(?=<\/li>)/g) {
  push @matches, $1;
}

foreach my $m (@matches) {
   print $m, "\n";
}

Output:

hello
there
everyone
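
To illustrate the note at the top of this answer, here is a small sketch of how the lookbehind pattern behaves once the markup drifts slightly. The input below is hypothetical (it is not from the question): one <li> carries an attribute and one item wraps onto a second line. The looser second pattern is only an assumption about how one might patch the regex. It still breaks on nested lists, comments, or a ">" inside an attribute value, which is why a parser remains the safer choice.

```perl
use strict;
use warnings;

# Hypothetical input: the first <li> has an attribute and the
# second item spans two lines.
my $html = <<'HTML';
<ul>
  <li class="first">hello</li>
  <li>there
  </li>
  <li>everyone</li>
</ul>
HTML

# Same pattern as above: only matches right after a literal "<li>",
# and "." does not cross a newline.
my @strict = $html =~ /(?<=<li>)(.*?)(?=<\/li>)/g;

# Looser pattern: allow attributes in the opening tag, ignore case,
# and let "." match newlines (/s).
my @loose = $html =~ /<li\b[^>]*>(.*?)<\/li>/gis;
s/^\s+|\s+$//g for @loose;    # trim whitespace around each item

print "strict: @strict\n";    # prints "strict: everyone"
print "loose:  @loose\n";     # prints "loose:  hello there everyone"
```

The original pattern silently drops two of the three items instead of raising an error, so this kind of failure is easy to miss.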

Miller · Answer 2 · 2014-06-15T17:12:37+0000

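Do not use regular expressions to parse HTML.

hwnd has already shown one HTML parser.

For a more modern HTML parser based on CSS selectors, take a look at Mojo::DOM. Mojocast episode 5 is an informative 8-minute introduction.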

use strict;
use warnings;

use Mojo::DOM;

my $html = do {local $/; <DATA>};

my $dom = Mojo::DOM->new($html);

# Call ->text on each matched <li> element via map
for my $li ($dom->find('li')->map('text')->each) {
    print "$li\n";
}

__DATA__
<ul>
  <li>hello</li>
  <li>there</li>
  <li>everyone</li>
</ul>

Output:

hello
there
everyone
