Regular expression not capturing text from my site

Perl beginner with a question about regular expressions.

The code below successfully downloads the contents of a web page from my site. Then I try to match the pattern "Search Type: [Dir or Geo]". (What I just wrote is not real regular expression code, just text to show what I want to match.)

Here's an excerpt of what the get method actually retrieves (sorry, I don't have enough reputation points to post images):

what: movers<br/> where: toronto<br/> search type: Dir <br/> 

There are tabs and spaces between "search type:" and "Dir", and there is also the paragraph mark (¶) you see in Word documents right after the word "type:".
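To see why that invisible line break matters, here is a small self-contained check. The sample string is hypothetical, modeled on the excerpt above with a tab, a space and a newline after "search type:". It compares the pattern from the question with one that uses \s+ instead of .+.

```perl
use strict;
use warnings;

# Hypothetical sample text modeled on the excerpt above: a tab, a space and a
# newline separate "search type:" from "Dir".
my $text = "what: movers<br/> where: toronto<br/> search type:\t \n Dir <br/>";

# Without /s, "." refuses to match "\n", so ".+" cannot reach "Dir".
my $dot_matches = $text =~ /search type.+([A-Z][a-z][a-z])/ ? 1 : 0;

# "\s" matches tabs, spaces and newlines alike, so "\s+" bridges the gap.
my $space_matches = $text =~ /search type:\s+([A-Z][a-z][a-z])/ ? 1 : 0;

print "dot: $dot_matches, whitespace: $space_matches\n";   # dot: 0, whitespace: 1
```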

Below is my code.

    use strict;
    use warnings;
    use WWW::Mechanize;

    my $searchtype = "nothing yet";
    my $mech = WWW::Mechanize->new();
    my $webpage;

    $mech->credentials('user', 'password');

    # qw() needs surrounding parentheses in a foreach list (required since Perl 5.18)
    foreach my $keyword (qw(movers)) {
        print "\$keyword = $keyword\n";
        my $url = "http://myurl";
        $mech->get($url);
        $webpage = $mech->content();

        if ($webpage =~ /search type.+([A-Z][a-z][a-z])/) {
            $searchtype = $1;
            print "$searchtype\n";
        }
    }

So why doesn't my regex $webpage =~ /search type.+([A-Z][a-z][a-z])/ put "Dir" into the match variable $1?

Driving me nuts.

Louie

1 answer

/./ matches any character except a newline unless you add the /s modifier (/./s). Since your match has to cross a newline, you need to add /s:

    /search type.+([A-Z][a-z][a-z])/s
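A quick way to see the catch with the /s version is to run it on text that contains a second capitalized three-letter word later on. The page string below is hypothetical, with "Geo" placed after "Dir" purely to illustrate the greedy behaviour.

```perl
use strict;
use warnings;

# Hypothetical page text: after "Dir" the page continues and later contains
# another capitalized three-letter word ("Geo").
my $page = "search type:\t\n Dir <br/>\nfooter: see Geo too\n";

my $captured = '';
# With /s, "." also matches "\n", so ".+" can cross lines. Because ".+" is
# greedy, it first runs to the end of the string and then backtracks only as
# far as needed, so $1 ends up holding the LAST [A-Z][a-z][a-z] match.
if ($page =~ /search type.+([A-Z][a-z][a-z])/s) {
    $captured = $1;
}
print "$captured\n";   # prints "Geo", not "Dir"
```

Making the quantifier non-greedy (.+?) would stop at the first match instead, but it still accepts any characters between the label and the value.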

But because .+ is greedy, that will capture the last [A-Z][a-z][a-z] match in the whole document. What you probably really want is:

    /search type:\s+([A-Z][a-z][a-z])/
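The same idea can be wrapped in a small helper. This is a sketch, not part of the original answer: the sub name extract_search_type is hypothetical, the (Dir|Geo) alternation assumes those are the only two values (as the question describes), and /i is an optional extra in case the label's capitalization varies. The [A-Z][a-z][a-z] class from the answer works just as well if other values can appear.

```perl
use strict;
use warnings;

# Hypothetical helper: pulls the search type out of the page text.
# Assumes the value is either "Dir" or "Geo" and that any mix of tabs,
# spaces and newlines may follow the colon.
sub extract_search_type {
    my ($html) = @_;
    # \s+ matches spaces, tabs and newlines, so no /s modifier is needed.
    # /i (optional) also accepts "Search Type:" with different capitalization.
    return $html =~ /search type:\s+(Dir|Geo)\b/i ? $1 : undef;
}

my $sample = "what: movers<br/> where: toronto<br/> search type:\t \n Dir <br/>";
my $type   = extract_search_type($sample);
print defined $type ? "$type\n" : "no match\n";   # prints "Dir"
```

In the question's loop, this would replace the if block: $searchtype = extract_search_type($webpage) returns undef when the label is missing, which is easy to test with defined.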

Source: https://habr.com/ru/post/1446367/

