HTML tags are missing from the output when parsing
The snippet of HTML that I want to parse is as follows:
<ul class="authors">
<li class="author" itemprop="author" itemscope="itemscope" itemtype="http://schema.org/Person">
<a href="/search?facet-creator=%22Charles+L.+Fefferman%22" itemprop="name">Charles L. Fefferman</a>,
</li>
<li class="author" itemprop="author" itemscope="itemscope" itemtype="http://schema.org/Person">
<a href="/search?facet-creator=%22Jos%C3%A9+L.+Rodrigo%22" itemprop="name">José L. Rodrigo</a>
</li>
I want to extract the whole <a> elements, but when I parse the page with WWW::Mechanize::TreeBuilder, all I get back is the authors' names. So:
Content I expect:
<a href="/search?facet-creator=%22Charles+L.+Fefferman%22" itemprop="name">Charles L. Fefferman</a>,
<a href="/search?facet-creator=%22Jos%C3%A9+L.+Rodrigo%22" itemprop="name">José L. Rodrigo</a>
Content I get:
Charles L. Fefferman,
José L. Rodrigo
Here is the code responsible for parsing this:
use feature 'say';
use WWW::Mechanize;
use WWW::Mechanize::TreeBuilder;

my $mech = WWW::Mechanize->new();
WWW::Mechanize::TreeBuilder->meta->apply($mech);
$mech->get($addressdio);
my @authors = $mech->look_down('class', 'author');
print "Authors: <br />";
foreach my $author (@authors) {
    say $author->as_text(), "<br />";
}
I suspect this is caused by as_text(), or by the fact that the HTML printed by the CGI script is rendered as markup rather than shown as text.
1 answer
I solved it, but in a completely different way, using HTML::TagParser:
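use feature 'say';
use HTML::TagParser;

my $html = HTML::TagParser->new("overwrite.xml");
my @li = $html->getElementsByAttribute('class', 'author');
foreach my $li (@li) {
    # The first child of each <li class="author"> is the <a> element.
    # ($a is avoided as a name because sort() uses it.)
    my $anchor = $li->firstChild();
    my $link = $anchor->getAttribute('href');
    say $li->innerText;
    say $link;
}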
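For anyone who would rather stay with the original tree: by default WWW::Mechanize::TreeBuilder builds the tree with HTML::TreeBuilder, so look_down() returns HTML::Element objects, and as_text() returns only their text content. The rough sketch below serializes each <a> with as_HTML() instead. It is not the code from the question: it parses an inline, shortened copy of the markup with HTML::TreeBuilder directly so that it runs on its own, and the names $tree, $anchor and @links are made up for the example.

```perl
use strict;
use warnings;
use utf8;
use feature 'say';
use HTML::TreeBuilder;

binmode STDOUT, ':encoding(UTF-8)';

# Shortened inline copy of the markup from the question (an assumption made
# so the sketch is self-contained; the real script gets its tree from $mech).
my $html = <<'HTML';
<ul class="authors">
<li class="author" itemprop="author"><a href="/search?facet-creator=%22Charles+L.+Fefferman%22" itemprop="name">Charles L. Fefferman</a>,</li>
<li class="author" itemprop="author"><a href="/search?facet-creator=%22Jos%C3%A9+L.+Rodrigo%22" itemprop="name">José L. Rodrigo</a></li>
</ul>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

my @links;
for my $li ($tree->look_down(class => 'author')) {
    # In scalar context look_down() returns the first matching element.
    my $anchor = $li->look_down(_tag => 'a') or next;
    # as_HTML() serializes the element with its tags and attributes;
    # '<>&' limits entity encoding to those three characters.
    push @links, $anchor->as_HTML('<>&');
}

say for @links;
$tree->delete;
```

If the result is printed from a CGI script and the markup itself should be visible in the browser, it probably also needs escaping first, for example with encode_entities() from HTML::Entities. Otherwise the browser renders each element as a link and only the name appears.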