How can I extract HTML table data using Perl?

I need to get some data from a web page. After analyzing the page's HTML, I found that the data I need is in a table with a unique id attribute. I'm not sure whether that is an HTML rule, but either way it should make parsing much easier.

The table is laid out as follows (I omitted various attributes and tags to show the structure clearly):

<table .... id = "tablename" .... >
    <tr>
         <td .... >field1</td>
             ....
         <td .... >fieldN</td>
    </tr>
    <!-- several more <tr> rows here -->
    <tr>
         <td .... >field1</td>
             ....
         <td .... >fieldN</td>
    </tr>
</table>

So my question is: how can I use a Perl HTML parser to extract this data?

Thanks in advance.

4 answers

Look at Ken MacFarlane's article on parsing HTML with HTML::Parser in The Perl Journal. I'm not sure whether that is the parser you have in mind, but it looks like it can do what you want, or at least point you in the right direction.
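As a minimal sketch of the HTML::Parser approach, assuming its version-3 handler API (`start_h`, `text_h`, `end_h` with the `tagname`, `attr` and `dtext` argspecs): `table_rows_by_id` is a hypothetical helper, not code from the article. It also assumes the target table contains no nested tables and that every cell has an explicit closing tag.

```perl
use strict;
use warnings;
use HTML::Parser;    # from the CPAN HTML-Parser distribution (not core Perl)

# Hypothetical helper: return the text of each cell in the table whose
# id attribute equals $id, as a list of array refs (one per <tr>).
# Assumes no nested tables and explicit </td>/</th> closing tags.
sub table_rows_by_id {
    my ($html, $id) = @_;
    my @rows;
    my $in_table = 0;    # true while inside the target <table>
    my $cell;            # text of the cell being read; undef outside a cell

    my $parser = HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;    # tag names and attr keys are lowercased
                if ($tag eq 'table' && defined $attr->{id} && $attr->{id} eq $id) {
                    $in_table = 1;
                }
                return unless $in_table;
                if ($tag eq 'tr') {
                    push @rows, [];    # start a new row
                }
                elsif ($tag eq 'td' || $tag eq 'th') {
                    push @rows, [] unless @rows;    # tolerate a cell before any <tr>
                    $cell = '';
                }
            },
            'tagname, attr',
        ],
        text_h => [
            # Text may arrive in several chunks, so append rather than assign.
            sub { $cell .= $_[0] if defined $cell },
            'dtext',    # text with HTML entities decoded
        ],
        end_h => [
            sub {
                my ($tag) = @_;
                return unless $in_table;
                if (($tag eq 'td' || $tag eq 'th') && defined $cell) {
                    $cell =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
                    push @{ $rows[-1] }, $cell;
                    undef $cell;
                }
                elsif ($tag eq 'table') {
                    $in_table = 0;    # left the target table
                }
            },
            'tagname',
        ],
    );
    $parser->parse($html);
    $parser->eof;    # flush any buffered text
    return @rows;
}

my $html = <<'HTML';
<table class="data" id = "tablename">
  <tr><td>field1</td><td>field &amp; more</td></tr>
  <tr><td> a </td><td><b>b</b></td></tr>
</table>
HTML

print join(' | ', @$_), "\n" for table_rows_by_id($html, 'tablename');
```

Because the parser reports tags as events, attributes in any order or with extra whitespace (such as `id = "tablename"`) are handled, and inline markup inside a cell, like `<b>`, is ignored while its text is kept.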


You can try something like this:

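my $html = '<html code....';    # the page's HTML source

# Keep only the target table; the non-greedy .*? stops at the first </table>
# after the opening tag rather than the last one in the page.
$html =~ s/^.*?(<table[^>]*\sid\s*=\s*"tablename"[^>]*>.*?<\/table>).*$/$1/s;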
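The substitution only isolates the table's HTML. As a rough, core-Perl-only sketch, the same regex idea can be taken one step further to pull out the cell text row by row; `table_cells_by_regex` is a hypothetical helper, not part of the original answer. It assumes the id value is quoted, that no attribute value contains `>`, and that there are no nested tables; it also leaves HTML entities undecoded, so a real parser is the safer choice for arbitrary pages.

```perl
use strict;
use warnings;

# Hypothetical helper: extract cell text from the table with the given id
# using regular expressions only. Fragile by design: assumes a quoted id,
# no '>' inside attribute values, and no nested tables.
sub table_cells_by_regex {
    my ($html, $id) = @_;

    # Group 1 is the quote character, group 2 the table body.
    # \sid avoids matching attributes such as data-id.
    my $table_re = qr{<table\b[^>]*?\sid\s*=\s*(["'])\Q$id\E\1[^>]*>(.*?)</table>}is;
    my (undef, $body) = $html =~ $table_re
        or return;    # no such table

    my @rows;
    while ($body =~ m{<tr\b[^>]*>(.*?)</tr>}gis) {
        my $row_html = $1;
        my @cells;
        while ($row_html =~ m{<t[dh]\b[^>]*>(.*?)</t[dh]>}gis) {
            my $text = $1;
            $text =~ s/<[^>]*>//g;      # strip tags inside the cell
            $text =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
            push @cells, $text;
        }
        push @rows, \@cells;
    }
    return @rows;
}

my $html = <<'HTML';
<html><body>
<table class="data" id="tablename">
  <tr><td>field1</td><td><b>field2</b></td></tr>
  <tr><td> a </td><td>b</td></tr>
</table>
</body></html>
HTML

print join(',', @$_), "\n" for table_cells_by_regex($html, 'tablename');
```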

Source: https://habr.com/ru/post/1726208/

