Extract specific string from curl'd result

Given this curl command: curl --user-agent "fogent" --silent -o page.html "http://www.google.com/search?q=insansiate"

* The spelling is intentionally incorrect. I want to get the suggestion as the result.

I want to be able to either grep in the page.html file, possibly using grep -oE, or pipe it directly from curl and never store the file.

The result should be: 'instantiate'

All I need is the word "instantiate", or the phrase; whatever Google autocorrects to is what I want.

Here is the basic html that is returned:

<span class=spell style="color:#cc0000">Did you mean: </span><a href="/search?hl=en&amp;ie=UTF-8&amp;&amp;sa=X&amp;ei=VEMUTMDqGoOINraK3NwL&amp;ved=0CB0QBSgA&amp;q=instantiate&amp;spell=1"class=spell><b><i>instantiate</i></b></a>&nbsp;&nbsp;<span class=std>Top 2 results shown</span>

So perhaps match from/to the string below, which I hope is unique enough to cover all my bases.

class=spell><b><i>instantiate</i></b></a>&nbsp;&nbsp;

I have tried grep and an HTML prettify tool, with no luck so far. A solution in bash would be fine, and so would Perl.

Any suggestions?


A quick way with grep and sed, working on the saved page.html:

grep -o 'Did you mean:\([^>]*>\)\{5\}' page.html | sed 's/.*<i>\([^<]*\)<.*/\1/'

Or, piping straight from curl without storing the file:

curl --user-agent "fogent" --silent "http://www.google.com/search?q=insansiate" | grep -o 'Did you mean:\([^>]*>\)\{5\}' | sed 's/.*<i>\([^<]*\)<.*/\1/'

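The grep pattern matches "Did you mean:" followed by five runs of "anything that is not >, then >", which ends at "</i>". sed then keeps only the text between "<i>" and the next "<".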
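
To check the grep and sed pipeline without hitting the network, it can be run against the sample HTML from the question. This is only a sketch: extract_suggestion is a hypothetical helper name, and the markup Google actually returns may differ from the sample.

```shell
#!/bin/sh
# Sample markup copied from the question; live Google output may differ.
html='<span class=spell style="color:#cc0000">Did you mean: </span><a href="/search?hl=en&amp;ie=UTF-8&amp;&amp;sa=X&amp;ei=VEMUTMDqGoOINraK3NwL&amp;ved=0CB0QBSgA&amp;q=instantiate&amp;spell=1"class=spell><b><i>instantiate</i></b></a>&nbsp;&nbsp;<span class=std>Top 2 results shown</span>'

# extract_suggestion (hypothetical name) reads HTML on stdin and prints
# the text inside the <i>...</i> that follows "Did you mean:".
extract_suggestion() {
  grep -o 'Did you mean:\([^>]*>\)\{5\}' | sed 's/.*<i>\([^<]*\)<.*/\1/'
}

suggestion=$(printf '%s\n' "$html" | extract_suggestion)
echo "$suggestion"
```

With the live page, the same function can sit at the end of the curl pipe: curl ... | extract_suggestion.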

Also, do you really need Google for this?

You could use ispell or aspell instead, for example:

echo insansiate | ispell -a

In -a mode it reports near-miss suggestions for each misspelled word.
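
If only the first suggestion is wanted, the -a output can be filtered with sed. The sketch below assumes the documented "& word count offset: miss, miss, ..." line format; first_suggestion is a hypothetical helper name, and the sample input is hand-written in that format, not real ispell output.

```shell
#!/bin/sh
# first_suggestion (hypothetical name) reads ispell -a style output on stdin.
# It keeps only "&" lines (misspelled word with near misses), strips
# everything up to ": ", and cuts the list at the first comma.
first_suggestion() {
  sed -n 's/^& [^:]*: \([^,]*\).*/\1/p' | head -n 1
}

# Hand-written sample in the -a format (an assumption, not captured output).
sample='@(#) sample banner line
& insansiate 2 0: instantiate, insatiate'

first=$(printf '%s\n' "$sample" | first_suggestion)
echo "$first"

# Intended real usage, if ispell is installed:
#   echo insansiate | ispell -a | first_suggestion
```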


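xidel is a command-line tool made for this kind of task (it supports CSS selectors, XPath).

The CSS selector a.spell picks out the suggestion link: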

xidel --user-agent "fogent" "http://google.com/search?q=insansiate" -e 'a.spell'

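Note that xidel downloads the page itself, so curl is not needed.

If you still want to use curl, pipe its output into xidel (the - makes it read from stdin):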

curl --user-agent "fogent" --silent "http://google.com/search?q=insansiate" |
xidel - -e 'a.spell'

curl | tidy -asxml | xmlstarlet sel


Edit: Sorry, I did not see your note about Perl.

#!/usr/bin/perl
use strict;
use LWP::UserAgent;

my $arg = shift // 'insansiate';

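my $lwp = LWP::UserAgent->new(agent => 'Mozilla');
# get() always returns a response object, so check is_success instead of "or die".
my $c = $lwp->get("http://www.google.com/search?q=$arg");
die $c->status_line unless $c->is_success;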

my @content = split(/:/, $c->content);

for(@content) {
  if(m;<b><i>(.+)</i></b>;) {
    print "$1\n";
    exit;
    }
}

Running:

 > perl google.pl 
    instantiate
 > perl google.pl disconect
    disconnect

Source: https://habr.com/ru/post/1749777/

