HTML parsing with Mojolicious User Agent

Question

I have HTML something like this:

 <h1>My heading</h1>

 <p class="class1">
 <strong>SOMETHING</strong> INTERESTING (maybe not).
 </p>

 <div class="mydiv">
 <p class="class2">
 <a href="http://www.link.com">interesting link</a> </p>

 <h2>Some other heading</h2>

The content between h1 and h2 changes. I know that I can use CSS selectors in Mojo::DOM to, say, select the contents of h1, h2, or p tags, but how do I select everything between h1 and h2? Or, in general, everything between any two given tags?

+3

perl mojolicious

user1849286 Dec 10 '12 at 21:48

1 answer

memowe · Accepted Answer · 2012-12-11T00:51:52+0000

It is pretty simple. You can select all the interesting elements into a Mojo::Collection object (this is what Mojo::DOM's children method returns, for example) and do a state-machine-like match while iterating over that collection.
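As a minimal sketch of that state-machine idea, assuming a recent Mojolicious (one where Mojo::DOM provides matches and tag): the helper name between, the #content id, and the sample markup below are made up for illustration and are not from the question.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Hypothetical helper: keep the elements of a Mojo::Collection from the
# first one matching $start up to the next one matching $end (inclusive).
sub between {
    my ($collection, $start, $end) = @_;
    my $inside = 0;    # state: are we currently inside the range?
    return $collection->grep(sub {
        my $e = shift;
        $inside = 1 if !$inside && $e->matches($start);
        return 0 unless $inside;
        $inside = 0 if $e->matches($end);    # the end element is still kept
        return 1;
    });
}

# Made-up sample markup
my $dom = Mojo::DOM->new(<<'HTML');
<div id="content">
  <p>before</p>
  <h1>Heading</h1>
  <p class="class1">first</p>
  <p class="class2">second</p>
  <h2>Other heading</h2>
  <p>after</p>
</div>
HTML

my $range = between($dom->at('#content')->children, 'h1', 'h2');
say $range->map('tag')->join(' ');
```

Because the start and end tests are CSS selectors, the same helper should work for any two element types, for example between($children, 'p.class1', 'div.mydiv').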

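Probably the most magical way to do this is to use Perl's range operator .. in scalar context. The perlop documentation describes it roughly as follows:

In scalar context, ".." returns a boolean value. The operator is bistable, like a flip-flop, and emulates the line-range (comma) operator of sed, awk, and various editors. Each ".." operator maintains its own boolean state, even across calls to a subroutine that contains it. It is false as long as its left operand is false. Once the left operand is true, the range operator stays true until the right operand is true, after which the range operator becomes false again. It doesn't become false till the next time the range operator is evaluated.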
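To see the flip-flop behaviour on its own, here is a small sketch in core Perl with no Mojolicious involved. The word list and the START/END markers are invented for the demonstration.

```perl
use strict;
use warnings;

# Invented sample data: everything from START through END is "interesting"
my @items = qw(intro START alpha beta END outro);

# grep evaluates its block in scalar (boolean) context, so '..' acts as a
# flip-flop: it turns true on the item matching the left test and stays
# true through the item matching the right test, inclusive.
my @selected = grep { $_ eq 'START' .. $_ eq 'END' } @items;

print "@selected\n";    # prints: START alpha beta END
```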

Here is a simple example using Mojo::DOM:

#!/usr/bin/env perl

use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# slurp all DATA lines
my $dom = Mojo::DOM->new(do { local $/; <DATA> });

# select all children of <div id="yay"> into a Mojo::Collection
my $yay = $dom->at('#yay')->children;

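# select interesting ('..' operator in scalar context: flip-flop)
# note: recent Mojolicious releases return the tag name from tag();
# type() now returns the node type (e.g. 'tag' or 'text')
my $interesting = $yay->grep(sub { my $e = shift;
    $e->tag eq 'h1' .. $e->tag eq 'h2';
});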

say $interesting->join("\n");

__DATA__
<div id="yay">
    <span>This isn't interesting</span>
    <h1>INTERESTING STARTS HERE</h1>
    <strong>SOMETHING INTERESTING</strong>
    <span>INTERESTING TOO</span>
    <h2>END OF INTERESTING</h2>
    <span>This isn't interesting</span>
</div>

Output:

<h1>INTERESTING STARTS HERE</h1>
<strong>SOMETHING INTERESTING</strong>
<span>INTERESTING TOO</span>
<h2>END OF INTERESTING</h2>

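Here I use Mojo::Collection's grep to filter the collection object $yay. Since grep looks for truth, it creates a scalar context for the callback's return value, so the .. operator acts like a flip-flop. It becomes true once it has seen an h1 element and becomes false again after it has seen an h2 element, so you get everything between the two headings, including the headings themselves.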
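If the headings themselves are not wanted, one possible tweak relies on the value the flip-flop returns: perlop documents that it is a sequence number starting at 1, with "E0" appended to the last one in the range. The sketch below assumes a recent Mojolicious (tag and text on Mojo::DOM); the #content id and the markup are made up for illustration.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';
use Mojo::DOM;

# Made-up sample markup, not the HTML from the question
my $dom = Mojo::DOM->new(<<'HTML');
<div id="content">
  <p>before</p>
  <h1>Heading</h1>
  <p class="class1">first</p>
  <p class="class2">second</p>
  <h2>Other heading</h2>
  <p>after</p>
</div>
HTML

my $inner = $dom->at('#content')->children->grep(sub {
    my $e = shift;
    # assigning to a scalar keeps '..' in flip-flop mode
    my $seq = ($e->tag eq 'h1') .. ($e->tag eq 'h2');
    # drop false values, the first element (1) and the last one (ends in E0)
    return $seq && $seq != 1 && $seq !~ /E0\z/;
});

say $inner->map('text')->join(', ');
```

With this markup only the two paragraphs between the headings should remain in $inner.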

Since you probably know some Perl, and .. works with arbitrary tests, I hope this helps to solve your problem!
