How to extract URLs from plain text in Perl?

Question

I have seen some similar posts, but none of them does quite what I want.

How can I extract URL links from plain text and then remove them from that text?

Example:

"Hello!!, I love http://www.google.es".

I want to extract "http://www.google.es", save it to a variable, and then remove it from my text.

Finally, the text should be like this:

"Hello!!, I love".

URLs are usually the last "word" of text, but not always.

+3

url text perl extract

Cristian Oct 18 '10 at 10:16

4 answers

brian d foy · Answer 1 · 2010-10-18T16:54:17+0000

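You can use URI::Find, which knows how to find URIs in plain text. If you only want to remove the URLs, have the callback return an empty string in place of each URI: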

use URI::Find;

my $string = do { local $/; <DATA> };

my $finder = URI::Find->new( sub { '' } );
$finder->find(\$string );

print $string;

__END__
This has a mailto:joe@example.com
Go to http://www.google.com
Pay at https://paypal.com
From ftp://ftp.cpan.org download a file
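
The question also asks to keep each removed URL in a variable. A minimal sketch of that, assuming URI::Find is installed from CPAN: the callback is passed the URI object and the original matched text, and whatever it returns replaces the URI in the string. The helper name extract_and_strip is hypothetical, not part of the module.

```perl
use strict;
use warnings;
use URI::Find;

# Hypothetical helper: returns the cleaned text followed by the URIs found.
sub extract_and_strip {
    my ($text) = @_;
    my @found;

    my $finder = URI::Find->new(sub {
        my ($uri, $original) = @_;   # URI object, text as it appeared
        push @found, $original;      # save the URI before dropping it
        return '';                   # replace the URI with nothing
    });
    $finder->find(\$text);           # modifies $text in place

    return ($text, @found);
}

my ($clean, @urls) = extract_and_strip('Hello!!, I love http://www.google.es');
print "text: $clean\n";
print "url:  $_\n" for @urls;
```

Removing a URL leaves the surrounding whitespace behind, so trim the result if the trailing space matters.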

Nikhil Jain · Answer 2 · 2010-10-18T10:46:56+0000

URI::Find can find URLs in plain text.

Regexp::Common::URI also provides patterns for matching URIs.

use strict;
use warnings;
use Regexp::Common qw/URI/;
my $str = "Hello!!, I love http://www.google.es";
my ($uri) = $str =~ /$RE{URI}{-keep}/;
print "$uri\n"; #output: http://www.google.es

JAR.JAR.beans · Answer 3 · 2015-01-01T08:05:12+0000

The following regex is not perfect, but it should cover about 99% of cases:

/((?<=[^a-zA-Z0-9])(?:https?\:\/\/|[a-zA-Z0-9]{1,}\.{1}|\b)(?:\w{1,}\.{1}){1,5}(?:com|org|edu|gov|uk|net|ca|de|jp|fr|au|us|ru|ch|it|nl|se|no|es|mil|iq|io|ac|ly|sm){1}(?:\/[a-zA-Z0-9]{1,})*)/mg

https://regex101.com/r/fO6mX3/2

ghostdog74 · Answer 4 · 2010-10-18T10:23:28+0000

If you are not restricted to Perl, awk can do this:

$ cat  file
"Hello!!, I love http://www.google.es".
this is another link http://www.somewhere.com
this if ftp link ftp://www.anywhere.com the end

$ awk '{gsub(/(http|ftp):\/\/.[^" ]*/,"") }1'  file
"Hello!!, I love ".
this is another link
this if ftp link  the end

The same pattern can be used in Perl as well.
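
For comparison, a rough Perl sketch of the same idea as the awk one-liner, using only core Perl. It is an illustration, not a robust URL parser: the pattern is an assumption modelled on the awk regex (with https added), and the helper name strip_urls is hypothetical. Unlike the awk version, it saves each URL and also trims the blanks left at the end of a line.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the cleaned text followed by the URLs removed.
sub strip_urls {
    my ($text) = @_;
    my @urls;

    # Scheme, "://", then everything up to the next whitespace or double quote.
    # The /e flag runs the replacement as code: save the match, substitute ''.
    $text =~ s{((?:https?|ftp)://[^\s"]+)}{ push @urls, $1; '' }ge;

    # Trim spaces or tabs left at the end of each line by the removal.
    $text =~ s/[ \t]+$//mg;

    return ($text, @urls);
}

my ($clean, @urls) = strip_urls('Hello!!, I love http://www.google.es');
print "$clean\n";          # prints: Hello!!, I love
print "$_\n" for @urls;    # prints: http://www.google.es
```

A URL in the middle of a line leaves two adjacent spaces behind, just as in the awk output above.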
