How can I reliably parse a QuakeLive player profile using Perl?

I am working on a Perl script that collects data from the QuakeLive website. Everything went well until I reached one particular set of data.

I use regular expressions for this, and they work for everything except my favorite arena, weapon, and game type. I just need to capture the names of these three items in $1 for further processing.

I tried matching on the image that precedes each favorite, but without success. In case it is useful, I already use WWW::Mechanize in the script.

I think the problem may be related to the class attribute on the paragraph that contains these items; the paragraph I parsed earlier had no class.

You can find an example profile HERE.

Note that for the previous part of the page, this code worked:

$content =~ /<b>Wins:<\/b> (.*?)<br \/>/;
$wins = $1;
print "Wins: $wins\n";

The relevant part of the page source looks like this:

<p class="prf_faves">
<img src="http://cdn.quakelive.com/web/2010092807/images/profile/none_v2010092807.0.gif" 
     width="17" height="17" alt="" class="fl fivepxhr" />
                <b>Arena:</b> Campgrounds
                <div class="cl"></div>
            </p>

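Here the name is not followed by <br />, so a pattern that ends in <br \/> cannot match. The robust fix is to use an HTML parser. For a quick-and-dirty regex (as is):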

my ($favarena) = $content =~ m{<b>Arena:</b> ([^<]+)};

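This captures everything up to the next <, so the whitespace before <div> ends up in $favarena. If the names contained no spaces, you could use the following instead: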

my ($favarena) = $content =~ m{<b>Arena:</b> (\S+)};

Note that \S+ stops at the first space, so it would truncate a multi-word name such as Clan Arena.
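As a minimal sketch of the regex route, the following pulls all three favorites in one pass and copes with multi-word names. The helper name extract_faves is hypothetical, and the pattern assumes the class attribute is written exactly as class="prf_faves" and that each value is plain text ending at the next tag. It is still a quick-and-dirty approach, not a substitute for a real parser.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original script): collects every
# "<b>Label:</b> value" pair that follows a <p class="prf_faves"> tag.
sub extract_faves {
    my ($content) = @_;
    my %fav;
    # /s lets .*? cross newlines; /g walks through all matching paragraphs.
    while ( $content =~ m{<p\s+class="prf_faves">.*?<b>([^<:]+):</b>\s*([^<]+)}gs ) {
        my ( $label, $value ) = ( $1, $2 );
        $value =~ s/\s+\z//;    # drop the whitespace before the next tag
        $fav{$label} = $value;
    }
    return \%fav;
}

# Usage, assuming $content holds the page source:
#   my $fav = extract_faves($content);
#   print "$_: $fav->{$_}\n" for sort keys %$fav;
```

Keying the result by label keeps the later processing independent of the order in which the paragraphs appear on the page.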

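Keep in mind, though, that regular expressions know nothing about HTML structure, so patterns like these are fragile. For example, consider what happens if the page contains a commented-out label: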

<p class="prf_faves">
<img src="http://cdn.quakelive.com/web/2010092807/images/profile/none_v2010092807.0.gif" 
     width="17" height="17" alt="" class="fl fivepxhr" />
<!-- <b>Arena: </b> here -->
                <b>Arena:</b> Campgrounds
                <div class="cl"></div>
            </p>

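A script built on such patterns can easily break or pick up the wrong text, whereas an HTML parser handles this correctly.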

Using HTML::TokeParser::Simple:

#!/usr/bin/perl

use strict; use warnings;

use HTML::TokeParser::Simple;

my $p = HTML::TokeParser::Simple->new( 'martianbuddy.html' );

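# Walk every <p> start tag and keep only those with class "prf_faves".
while ( my $tag = $p->get_tag('p') ) {
    next unless $tag->is_start_tag;
    next unless defined (my $class = $tag->get_attr('class'));
    next unless grep { /^prf_faves\z/ } split ' ', $class;

    $p->get_tag('b');                  # advance to the <b> label
    my $type = $p->get_text('/b');     # label text, e.g. "Arena:"
    my $value = $p->get_text('/p');    # remaining text up to </p>
    $value =~ s/\s+\z//;               # trim trailing whitespace

    print "$type $value\n";
}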

Output:

Arena:  Campgrounds
Game Type:  Clan Arena
Weapon:  Rocket Launcher
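Since the script in the question already fetches pages with WWW::Mechanize, the same token-based approach can be applied to the fetched content directly instead of a saved file. The sketch below assumes the page source is available as a string (for example from $mech->content); the helper name parse_faves and the returned hash layout are illustrative choices, not part of the original answer.

```perl
use strict;
use warnings;
use HTML::TokeParser::Simple;

# Hypothetical helper: returns a hash ref such as { Arena => 'Campgrounds' }.
sub parse_faves {
    my ($content) = @_;
    my $p = HTML::TokeParser::Simple->new( string => $content );
    my %fav;
    while ( my $tag = $p->get_tag('p') ) {
        my $class = $tag->get_attr('class');
        next unless defined $class
            && grep { $_ eq 'prf_faves' } split ' ', $class;

        $p->get_tag('b');                  # advance to the <b> label
        my $type  = $p->get_text('/b');    # e.g. "Arena:"
        my $value = $p->get_text('/p');    # text up to the closing </p>
        s/^\s+|\s+\z//g for $type, $value; # trim both ends
        $type =~ s/:\z//;                  # "Arena:" becomes "Arena"
        $fav{$type} = $value;
    }
    return \%fav;
}

# Usage, assuming $mech is a WWW::Mechanize object that has fetched the profile:
#   my $fav = parse_faves( $mech->content );
#   print "Arena: $fav->{Arena}\n";
```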

Alternatively, using HTML::TreeBuilder:

#!/usr/bin/perl

use strict; use warnings;

use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;
$tree->parse_file('martianbuddy.html');

my @p = $tree->look_down(_tag => 'p', sub {
        return unless defined (my $class = $_[0]->attr('class'));
        return unless grep { /^prf_faves\z/ } split ' ', $class;
        return 1;
    }
);

for my $p ( @p ) {
    my $text = $p->as_text;
    $text =~ s/^\s+//;                            # trim leading whitespace
    my ($type, $value) = split ': ', $text, 2;    # split at the first ": " only
    print "$type: $value\n";
}

Output:

Arena: Campgrounds 
Game Type: Clan Arena 
Weapon: Rocket Launcher
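The tree-based version can also work on an in-memory string and return the values for further processing rather than printing them. This is a sketch under the same assumptions as above: the helper name faves_from_tree is hypothetical, and each paragraph is assumed to read "Label: value" once its text is extracted.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: builds a tree from a string and returns a hash ref
# mapping each label (e.g. "Arena") to its value.
sub faves_from_tree {
    my ($content) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($content);
    my %fav;

    # Match <p> elements whose class list contains the word prf_faves.
    my @p = $tree->look_down(
        _tag  => 'p',
        class => qr/(?:^|\s)prf_faves(?:\s|\z)/,
    );
    for my $p (@p) {
        # as_trimmed_text collapses whitespace and trims both ends.
        my ( $type, $value ) = split /:\s*/, $p->as_trimmed_text, 2;
        $fav{$type} = $value if defined $value;
    }

    $tree->delete;    # release the tree explicitly
    return \%fav;
}
```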

Both modules are built on HTML::Parser, so they cope with real-world HTML that a strict XML parser would reject.


Regular expressions are not a reliable way to parse HTML. Have you looked at an HTML parser such as HTML::TreeBuilder?


Source: https://habr.com/ru/post/1768064/

