Querying a Website with Perl LWP::Simple to Process Online Prices

Question

In my free time, I've been trying to improve my Perl abilities by working on a script that uses LWP::Simple to poll individual product pages of a website and check their prices (I'm somewhat of a Perl noob). The script also keeps a very simple backlog of the last price seen for each product (since the prices change frequently).

I was wondering if I could automate the script further, so that I don't have to explicitly add each page's URL to the initial hash (i.e. keep an array of key terms and run an Amazon search query to find the page or price?). Is there a way to do this that doesn't involve just copying Amazon's search URL and substituting in my keywords? (I know that processing HTML with regex is generally bad form; I only used it because I need one tiny piece of data.)


#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

my %oldPrice;
my %nameURL = (
    "Archer Season 1" => "http://rads.stackoverflow.com/amzn/click/B00475B0G2",
    "Code Complete" => "http://rads.stackoverflow.com/amzn/click/0735619670",
    "Intermediate Perl" => "http://rads.stackoverflow.com/amzn/click/0596102062",
    "Inglorious Basterds (2-Disc)" => "http://rads.stackoverflow.com/amzn/click/B002T9H2LK"
);

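# Load the prices recorded on the previous run, one "name: price" pair per line.
if (-e "backlog.txt") {
    open(my $log, '<', "backlog.txt") or die "Cannot read backlog.txt: $!";
    while (<$log>) {
        chomp;
        my ($name, $price) = split /:\s/;
        $oldPrice{$name} = $price;
    }
    close($log);
}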

print "\nChecking Daily Amazon Prices:\n";
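open(my $log, '>', "backlog.txt") or die "Cannot write backlog.txt: $!";
foreach my $key (sort keys %nameURL) {
    my $content = get($nameURL{$key}) or die "Could not fetch $nameURL{$key}";
    # Capture the first dollar amount; the dot is escaped so it matches only a literal "."
    $content =~ m{\s*\$(\d+\.\d+)} or die "No price found for $key";
    if (exists $oldPrice{$key} && $oldPrice{$key} != $1) {
        print "$key: \$$1 (Was $oldPrice{$key})\n";
    }
    else {
        print "\n$key: $1\n";
    }
    # Record the current price for the next run
    print $log "$key: $1\n";
}
close($log);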

regex perl lwp

Cooper Feb 18 '11 at 16:51

2 answers

, . , - , :

URL . .
.
Learn XPath and use it to extract data from HTML; it's easy if you are already familiar with CSS selectors.
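To make the XPath tip concrete, here is a minimal sketch. It assumes the CPAN module HTML::TreeBuilder::XPath (not mentioned in the original tip) and runs against a small, hypothetical HTML snippet rather than a live page; the `result_N` ids and the `productTitle` / `newPrice` class names are borrowed from the accepted answer's code and may not match what Amazon serves today.

```perl
use strict;
use warnings;
use HTML::TreeBuilder::XPath;   # assumption: installed from CPAN

# Hypothetical, simplified markup; real result pages differ and change often.
my $sample_html = <<'HTML';
<html><body>
  <div id="result_0">
    <a class="productTitle">Archer Season 1</a>
    <div class="newPrice"><span>$19.99</span></div>
  </div>
  <div id="result_1">
    <a class="productTitle">Code Complete</a>
    <div class="newPrice"><span>$31.49</span></div>
  </div>
</body></html>
HTML

my $tree = HTML::TreeBuilder::XPath->new_from_content($sample_html);

my @results;
# Select every div whose id starts with "result_", then query relative to it.
for my $item ($tree->findnodes('//div[starts-with(@id, "result_")]')) {
    my $title = $item->findvalue('.//a[@class="productTitle"]');
    my $price = $item->findvalue('.//div[@class="newPrice"]/span');
    push @results, [$title, $price];
    print "$title: $price\n";
}

$tree->delete;   # free the parse tree
```

Each XPath expression states where the data lives in the document tree, so it tends to survive surrounding markup changes better than a regex over the raw page text.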

Other Stackers, if you want to amend my post with a rationale for each tip, go ahead and edit it.

daxim Feb 18 '11 at 17:44

bvr · Accepted Answer · 2011-02-18T17:37:36+0000

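Here is a simple script that demonstrates automating an Amazon search. The search URL is built by appending the escaped search term. The rest of the code is simple parsing with HTML::TreeBuilder. The structure of the HTML can be examined with the dump method (see the commented-out line).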

use strict; use warnings;

use LWP::Simple;
use URI::Escape;
use HTML::TreeBuilder;
use Try::Tiny;

my $look_for = "Archer Season 1";

my $contents
  = get("http://www.amazon.com/s/ref=nb_sb_noss?url=search-alias%3Daps&field-keywords="
        . uri_escape($look_for))
  or die "Could not fetch search results for '$look_for'";

my $html = HTML::TreeBuilder->new_from_content($contents);
for my $item ($html->look_down(id => qr/result_\d+/)) {
    # $item->dump;      # find out structure of HTML
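    # try {} without a catch block yields undef when an element is missing
    my $title = try { $item->look_down(class => 'productTitle')->as_trimmed_text };
    my $price = try { $item->look_down(class => 'newPrice')->find('span')->as_text };
    next unless defined $title;              # no title found in this result
    $price = 'n/a' unless defined $price;    # no new-price element found

    print "$title\n$price\n\n";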
}
$html->delete;
