I have a wget-like script that downloads a page and then fetches all the files referenced by the IMG tags on that page.
Given the URL of the source page and the link extracted from an IMG tag on that page, I need to build the absolute URL of the image file I want to fetch. Currently I use this function that I wrote:
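```perl
# Build the absolute URL of $path (a link taken from an IMG tag)
# relative to $base (the URL of the page it was found on).
sub build_url {
    my ( $base, $path ) = @_;

    # Root-relative link ("/img/x.png"): keep only the scheme and host of $base.
    if ( $path =~ m{^/} ) {
        my ($root) = $base =~ m{^((?:https?://)?[^/]+)};
        return "$root$path";
    }

    my @base = split '/', $base;
    my @path = split '/', $path;

    # Drop the last component of $base if it looks like a file name (page.html).
    pop @base if $base =~ /[[:alnum:]]+\/\w+\.\w+$/;

    # Count the "../" segments. "() =" forces list context; assigning m//g
    # directly to a scalar only gives true/false, not the number of matches.
    my $relcount = () = $path =~ m{\.\./}g;
    while ( $relcount-- ) {
        pop @base;
        shift @path;
    }
    return join '/', @base, @path;
}
```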
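On the `../` counting step: in scalar context `m//g` only reports whether there is a (next) match, so assigning its result directly to a scalar gives 1 (or the empty string), never the number of `../` segments. The usual Perl idiom for counting matches is a list assignment `() = ...` evaluated in scalar context. A small self-contained check (the `$link` value is just an example):

```perl
use strict;
use warnings;

my $link = '../../img/logo.png';

# List assignment in scalar context: m//g runs in list context and the
# assignment yields the number of elements returned, i.e. the match count.
my $count = () = $link =~ m{\.\./}g;

# Direct scalar assignment: m//g in scalar context only says whether a
# match was found (matched on a copy so pos() of $link is untouched).
my $flag = ( my $copy = $link ) =~ m{\.\./}g;

print "count=$count flag=$flag\n";    # prints "count=2 flag=1"
```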
Of course, I am not the first person to run into this problem, and since it is such a common one, I suppose there must be a better, more standard way of handling it, using either a core module or something from CPAN (a core module would be preferable). I thought about File::Spec, but I was not sure it has all the functions I need.
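For reference, the operation being reimplemented here is relative-reference resolution as specified in RFC 3986, section 5.2; File::Spec deals with filesystem paths rather than URLs. The CPAN module URI exposes this as `URI->new_abs($link, $base)`, but URI is not a core module. Below is a core-only sketch of the RFC algorithm, assuming `$base` is an absolute, well-formed URL; the names `resolve_url`, `merge_paths` and `remove_dot_segments` are hypothetical, and the code does no validation or percent-encoding normalization.

```perl
use strict;
use warnings;

# Sketch only: assumes $base is an absolute URL; no validation or
# percent-encoding normalization. Function names are hypothetical.

# Regex from RFC 3986, Appendix B: splits a URI reference into
# scheme, authority, path, query and fragment (undef when absent).
my $URI_RE = qr{^(?:([^:/?#]+):)?(?://([^/?#]*))?([^?#]*)(?:\?([^#]*))?(?:#(.*))?\z}s;

# RFC 3986, section 5.2.4: collapse "." and ".." segments in a path.
sub remove_dot_segments {
    my ($in) = @_;
    my $out = '';
    while ( length $in ) {
        if    ( $in =~ s{^\.\.?/}{} )          { }    # A: leading "../" or "./"
        elsif ( $in =~ s{^/\.(?:/|\z)}{/} )    { }    # B: "/./" or final "/."
        elsif ( $in =~ s{^/\.\.(?:/|\z)}{/} ) {       # C: "/../" or final "/.."
            $out =~ s{/?[^/]*\z}{};                   #    drop last output segment
        }
        elsif ( $in =~ /^\.\.?\z/ ) { $in = '' }      # D: lone "." or ".."
        else {                                        # E: move one segment to output
            $in =~ s{^(/?[^/]*)}{};
            $out .= $1;
        }
    }
    return $out;
}

# RFC 3986, section 5.2.3: merge a relative path with the base path.
sub merge_paths {
    my ( $base_auth, $base_path, $ref_path ) = @_;
    return "/$ref_path" if defined $base_auth && $base_path eq '';
    ( my $dir = $base_path ) =~ s{[^/]*\z}{};    # keep everything up to the last "/"
    return $dir . $ref_path;
}

# RFC 3986, section 5.2.2: resolve $ref against the absolute URL $base.
sub resolve_url {
    my ( $base, $ref ) = @_;
    my ( $bs, $ba, $bp, $bq )      = $base =~ $URI_RE;
    my ( $rs, $ra, $rp, $rq, $rf ) = $ref =~ $URI_RE;
    my ( $ts, $ta, $tp, $tq );

    if ( defined $rs ) {                       # reference is already absolute
        ( $ts, $ta, $tp, $tq ) = ( $rs, $ra, remove_dot_segments($rp), $rq );
    }
    elsif ( defined $ra ) {                    # network-path reference ("//host/...")
        ( $ts, $ta, $tp, $tq ) = ( $bs, $ra, remove_dot_segments($rp), $rq );
    }
    elsif ( $rp eq '' ) {                      # only a query and/or fragment
        ( $ts, $ta, $tp, $tq ) = ( $bs, $ba, $bp, defined $rq ? $rq : $bq );
    }
    else {                                     # absolute or relative path
        $tp = $rp =~ m{^/} ? $rp : merge_paths( $ba, $bp, $rp );
        ( $ts, $ta, $tp, $tq ) = ( $bs, $ba, remove_dot_segments($tp), $rq );
    }

    # Section 5.3: recompose the target URI from its components.
    my $t = '';
    $t .= $ts . ':'  if defined $ts;
    $t .= '//' . $ta if defined $ta;
    $t .= $tp;
    $t .= '?' . $tq  if defined $tq;
    $t .= '#' . $rf  if defined $rf;
    return $t;
}

my $page = 'http://example.com/dir/page.html';
print resolve_url( $page, '../img/logo.png' ), "\n";    # http://example.com/img/logo.png
print resolve_url( $page, '/img/logo.png' ), "\n";      # http://example.com/img/logo.png
print resolve_url( $page, 'thumbs/a.jpg' ), "\n";       # http://example.com/dir/thumbs/a.jpg
```

Unlike splitting on `/` and counting `../`, this handles `./`, `..` in the middle of a path, query strings and protocol-relative links (`//host/...`) the way browsers do.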