Grep and data extraction in Perl

I have HTML content stored in a variable. How can I extract the data found between a set of common tags on the page? For example, I'm interested in the data (represented by DATA) stored between a set of tags that appear one line after another:

...
<td class="jumlah">*DATA_1*</td>
<td class="ud"><a href="">*DATA_2*</a></td>
...

I would then like to keep the DATA_2 => DATA_1 mapping in a hash.

4 answers

Since this is HTML, you probably want the XPath module built for working with HTML, HTML::TreeBuilder::XPath.

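First, parse the HTML into a tree. HTML::TreeBuilder::XPath is a subclass of HTML::TreeBuilder, so it is created and fed in the same way. The code below reads from a file; if the page is already in a variable, say $content, call parse_content($content) instead:

use HTML::TreeBuilder::XPath;

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_file($file_name);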

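Next, use an XPath expression to select the nodes you want. This one selects every td that is a child of a tr, inside a table, in the body of the html element: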

my $tdNodes = $tree->findnodes('/html/body/table/tr/td');

Then loop over the nodes in the result and read the value of each one:

foreach my $node ($tdNodes->get_nodelist) {
  my $data = $node->findvalue('.');  # the text content of the node
  print "$data\n";
}

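See the HTML::TreeBuilder::XPath documentation for the methods available on the tree and on the NodeSet it returns. The w3schools XPath tutorial is a reasonable introduction to XPath itself.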

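The right path depends on how your HTML is actually structured, so adjust the XPath expression to match your page. If the structure of the HTML changes, the XPath has to be updated with it.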
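To tie this back to the question, here is a minimal sketch of building the DATA_2 => DATA_1 hash. It assumes both cells sit in the same tr and that the HTML is in a string; the sample markup, the cell values, and the name %jumlah_for are made up for illustration. It selects cells by class attribute rather than by an absolute path, which is one possible choice and not the only one.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;

# Hypothetical sample input, shaped like the rows in the question.
my $content = <<'HTML';
<html><body><table>
<tr><td class="jumlah">10</td><td class="ud"><a href="">apples</a></td></tr>
<tr><td class="jumlah">25</td><td class="ud"><a href="">pears</a></td></tr>
</table></body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new;
$tree->parse_content($content);    # parse a string rather than a file

my %jumlah_for;                    # DATA_2 => DATA_1

# Visit every row that has a td with class "ud" (assumed row layout).
for my $row ( $tree->findnodes('//tr[td[@class="ud"]]') ) {
    my $key   = $row->findvalue('./td[@class="ud"]/a');      # DATA_2
    my $value = $row->findvalue('./td[@class="jumlah"]');    # DATA_1
    $jumlah_for{$key} = $value;
}

$tree->delete;                     # release the parse tree

print "$_ => $jumlah_for{$_}\n" for sort keys %jumlah_for;
```

If the two cells turn out not to share a row in your page, the row-based XPath above will not match and the selection has to be reworked around whatever element does group them.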


If you are parsing HTML, as in this question, use HTML::TreeBuilder or HTML::Parser.

As has been said many times on SO, parsing HTML with a regex is a bad idea unless you are 100% sure of the form the HTML will take.


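Another option: HTML::TreeBuilder::XPath. From its description:

This module adds typical XPath methods to HTML::TreeBuilder, to make it easy to query a document.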


Source: https://habr.com/ru/post/1746640/

