How can I extract HTML table data using Perl?

Question

I need to get some data from a web page. After analyzing the page's HTML, I found that the data I need is in a table with a unique table identifier. I don't know whether this is an HTML rule, but it should make parsing easier.

The data in the table is arranged as follows (various attributes and tags are omitted to show the data structure clearly):

<table .... id = "tablename" .... >
    <tr>
         <td .... >field1</td>
             ....
         <td .... >fieldn</td>
    </tr>
         #several "trs" here
    <tr>
         <td .... >field1</td>
             ....
         <td .... >fieldn</td>
    </tr>
</table>

So my question is: how can I use a Perl HTML parser to extract this data?

Thanks in advance.

+3

html perl

Haiyuan zhang Dec 21 '09 at 5:50

4 answers

HTML:: .

+2

Pradeep Dec 21 '09 at 11:30

Take a look at Ken MacFarlane's article on parsing HTML with HTML::Parser in The Perl Journal. I'm not sure whether this is the parser you mean, but it looks like it can do what you want, or at least point you in the right direction.
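For illustration, here is a minimal sketch of how HTML::Parser's version 3 handler interface might be applied to the table in the question. It is not taken from the article; the sample markup, the variable names and the `@rows` structure are assumptions made up for this example, and nested tables are not handled.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample markup standing in for the real page.
my $html = <<'HTML';
<html><body>
<table class="data" id="tablename">
  <tr><td>a1</td><td>b1</td></tr>
  <tr><td>a2</td><td>b2</td></tr>
</table>
</body></html>
HTML

my ( $in_table, $in_cell ) = ( 0, 0 );
my @rows;    # one array reference of cell texts per <tr>

my $parser = HTML::Parser->new(
    api_version => 3,
    start_h     => [
        sub {
            my ( $tag, $attr ) = @_;
            # Start collecting once the table with the wanted id opens.
            if ( $tag eq 'table' && ( $attr->{id} // '' ) eq 'tablename' ) {
                $in_table = 1;
            }
            return unless $in_table;
            if ( $tag eq 'tr' ) {
                push @rows, [];
            }
            elsif ( $tag eq 'td' ) {
                push @rows, [] unless @rows;    # tolerate a <td> with no <tr>
                push @{ $rows[-1] }, '';
                $in_cell = 1;
            }
        },
        'tagname, attr',
    ],
    text_h => [
        sub {
            my ($text) = @_;
            # Text may arrive in several chunks, so append to the current cell.
            $rows[-1][-1] .= $text if $in_cell;
        },
        'dtext',
    ],
    end_h => [
        sub {
            my ($tag) = @_;
            return unless $in_table;
            if    ( $tag eq 'td' )    { $in_cell  = 0 }
            elsif ( $tag eq 'table' ) { $in_table = 0 }
        },
        'tagname',
    ],
);

$parser->parse($html);
$parser->eof;

print join( "\t", @$_ ), "\n" for @rows;
```

The handlers only track whether the parser is currently inside the target table and inside a cell, so the same pattern should carry over to a real page as long as the table id is unique.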

-1

Chris thompson Dec 21 '09 at 5:55

You can try something like this:

my $html = '<html code....';

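# Keep only the table whose id is "tablename"; other attributes in the tag are allowed
$html =~ s/^.*(<table\b[^>]*\bid\s*=\s*"tablename"[^>]*>.*?<\/table>).*/$1/si;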

-4

sitemap Dec 21 '09 at 6:32

Leon Timmermans · Accepted Answer · 2009-12-21T07:33:19+0000

You could use HTML::TableExtract for this.
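A minimal sketch of how HTML::TableExtract might be used for the table in the question, selecting it by its id attribute. The sample markup and the `@rows` variable are assumptions made up for this example; in practice `$html` would hold the fetched page.

```perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample markup standing in for the real page.
my $html = <<'HTML';
<html><body>
<table class="data" id="tablename">
  <tr><td>a1</td><td>b1</td></tr>
  <tr><td>a2</td><td>b2</td></tr>
</table>
</body></html>
HTML

# Match only tables whose id attribute equals "tablename".
my $te = HTML::TableExtract->new( attribs => { id => 'tablename' } );
$te->parse($html);

# Copy each extracted row into a plain array of cell texts.
my @rows;
for my $table ( $te->tables ) {
    push @rows, [@$_] for $table->rows;
}

# Print one tab-separated line per row; empty cells become empty strings.
print join( "\t", map { defined $_ ? $_ : '' } @$_ ), "\n" for @rows;
```

If the page has column headings, the module's `headers` option could be used instead of `attribs` to pick the table and columns by heading text.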
